Optimization With R - Tips and Tricks
Goals of this talk:
• Overview of the (large, rapidly changing, still incomplete) set of tools for solving optimization problems in R
• Appreciation of the types of problems and the types of methods to solve them
• Advice on setting up problems and solvers
• Suggestions for interpreting results
• Some almost real-world examples

Unfortunately, there is no time to talk about the new and exciting developments in convex optimization and optimization modelling languages.

Typical application areas:
• Maximum Likelihood
• Parameter estimation
• Quantile and density estimation
• LASSO estimation
• Robust regression
• Nonlinear equations
• Geometric programming problems
• Deep Learning / Support Vector Machines
• Engineering and Design, e.g. optimal control
• Operations Research, e.g. network flow problems
• Economics, e.g. portfolio optimization
Univariate (1-dim.) Minimization

Methods / Algorithms:
• SANN - Simulated Annealing [don't use!]
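For one-dimensional minimization, base R provides optimize(), which combines golden section search with successive parabolic interpolation. A minimal sketch (the test function and interval are chosen here purely for illustration, they are not from the slides):

# Sketch: 1-dim. minimization with base R's optimize();
# test function and search interval chosen for illustration only.
f   <- function(x) x * sin(4 * x) + 1.1    # a univariate test function
opt <- optimize(f, interval = c(0, 2))     # golden section + parabolic interpolation
opt$minimum; opt$objective                 # location and value of the minimum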
The Rosenbrock Function

The global minimum obviously is (1, ..., 1) with value 0.

fnRosenbrock <- function(x) {
    n <- length(x)
    x1 <- x[2:n]; x2 <- x[1:(n - 1)]
    sum(100 * (x1 - x2^2)^2 + (1 - x2)^2)
}

Available in package adagio as fnRosenbrock(), with exact gradient grRosenbrock().

fn <- adagio::fnRosenbrock; gr <- adagio::grRosenbrock
sol <- optim(rep(0, 10), fn, gr,
             control = list(reltol = 1e-12, maxit = 10000))
sol$par; sol$counts

## [1] 0.487650105 0.218747555 0.074772474 0.008069353 0.007936313
## [6] 0.037545739 0.013695922 0.027284322 0.023147646 0.043194172
## function gradient
##     9707       NA
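Note that optim's default method is Nelder-Mead, which ignores the supplied gradient (hence the NA gradient count) and is still far from (1, ..., 1) here. For comparison (this run is not from the original slides), the same call with method = "BFGS" actually uses the exact gradient:

# Comparison sketch (not from the slides): same problem with method = "BFGS",
# which uses the exact gradient gr defined above.
sol <- optim(rep(0, 10), fn, gr, method = "BFGS",
             control = list(reltol = 1e-12, maxit = 10000))
sol$par      # should be much closer to (1, ..., 1)
sol$counts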
Nelder-Mead Solvers

• dfoptim
  nmk(par, fn, control = list(), ...)
  nmkb(par, fn, lower = -Inf, upper = Inf, control = list(), ...)

• adagio
  neldermead(fn, x0, ..., adapt = TRUE,
             tol = 1e-10, maxfeval = 10000,
             step = rep(1.0, length(x0)))

• pracma [new]
  anms(fn, x0, ...,
       tol = 1e-10, maxfeval = NULL)

Adaptive Nelder-Mead

anms in pracma implements a new (Gao and Han, 2012) adaptive Nelder-Mead algorithm, adapting to the size of the problem (i.e., the dimension of the objective function).

fn <- adagio::fnRosenbrock
pracma::anms(fn, rep(0, 20), tol = 1e-12, maxfeval = 25000)

## $xmin
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##
## $fmin
## [1] 5.073655e-25
##
## $nfeval
## [1] 22628
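A bounded variant can be called through dfoptim::nmkb; a minimal sketch (the bounds, starting point, and control settings are chosen here for illustration):

# Sketch: bounded Nelder-Mead on the 10-dim Rosenbrock function with dfoptim::nmkb;
# the start value must lie strictly inside the bounds.
fn  <- adagio::fnRosenbrock
sol <- dfoptim::nmkb(rep(0.25, 10), fn,
                     lower = rep(0, 10), upper = rep(0.5, 10),
                     control = list(tol = 1e-10, maxfeval = 25000))
sol$par; sol$value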
Gradient-based Methods

Exploiting the direction of "steepest descent" as computed by the negative gradient −∇f(x) of a multivariate function.

• Steepest descent
  d_k = −∇f(x_k)

• Conjugate Gradient (CG)
  d_k = −∇f(x_k) + β_k d_{k−1},  d_0 = −∇f(x_0),
  e.g. β_k = ||∇f(x_k)||^2 / ||∇f(x_{k−1})||^2  (Fletcher and Reeves)

• BFGS and L-BFGS-B
  d_k = −H_f(x_k)^{−1} ∇f(x_k),  with H_f(x) the Hessian of f at x.

Line Search

Given a function f: R^n → R and a direction d ∈ R^n, a line search method approximately minimizes f along the line {x + t d | t ∈ R}.

Armijo-Goldstein inequality: 0 < c, ν < 1

  f(x_0 + ν^k d) ≤ f(x_0) + c ν^k f'(x_0; d),  k = 0, 1, 2, ...

(accept the first k for which this holds)

(Weak) Wolfe conditions: 0 < c_1 < c_2 < 1

  f(x_k + t_k d_k) ≤ f(x_k) + c_1 t_k f'(x_k; d_k)
  c_2 f'(x_k; d_k) ≤ f'(x_k + t_k d_k; d_k)
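To make the backtracking idea concrete, here is a minimal sketch (my own illustration, not code from the slides) of steepest descent with an Armijo backtracking line search on the 2-dim Rosenbrock function:

# Sketch (not from the slides): steepest descent with Armijo backtracking.
fn <- adagio::fnRosenbrock; gr <- adagio::grRosenbrock
x <- c(0, 0); c1 <- 1e-4; nu <- 0.5
for (iter in 1:5000) {
    g <- gr(x)
    if (sqrt(sum(g^2)) < 1e-6) break         # stop when the gradient is small
    d <- -g                                  # steepest descent direction
    t <- 1
    while (fn(x + t * d) > fn(x) + c1 * t * sum(g * d))
        t <- nu * t                          # backtracking: t = nu^k
    x <- x + t * d
}
x; fn(x)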
Rosenbrock with Line Search

[Figure: steepest descent direction vs. BFGS direction on the Rosenbrock function, with a Wolfe line search along these two directions]

BFGS and L-BFGS-B
Central-difference Formula

  ∇f(x) = (∂f(x)/∂x_1, ..., ∂f(x)/∂x_n)

with each partial derivative approximated numerically by

  df(x)/dx ≈ (f(x + h) − f(x − h)) / (2·h)

Numerical gradients via central differences are available as pracma::grad.
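A quick check (my own illustration) comparing the numerical gradient with the exact one:

# Sketch: compare pracma::grad (central differences) with the exact gradient.
fn <- adagio::fnRosenbrock; gr <- adagio::grRosenbrock
x0 <- rep(0.25, 5)
max(abs(pracma::grad(fn, x0) - gr(x0)))   # should be close to zero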
Example: Rosenbrock with Bound Constraints

Minimize the 10-dim. Rosenbrock function subject to 0 ≤ x_i ≤ 0.5: adagio::transfinite provides a transformation between the box and all of R^n, so the problem can be handed to the unconstrained solver lbfgs::lbfgs.

Tf <- adagio::transfinite(0, 0.5, 10)
h <- Tf$h; hinv <- Tf$hinv
p0 <- rep(0.25, 10)
f <- function(x) fn(hinv(x))          # f: R^n --> R
g <- function(x) pracma::grad(f, x)   # numerical gradient
sol <- lbfgs::lbfgs(f, g, p0, epsilon = 1e-10, invisible = 1)
hinv(sol$par); sol$value

## [1] 0.5000000000 0.2630659827 0.0800311137 0.0165742342
## [6] 0.0102120052 0.0102084108 0.0102042121 0.0100040850
## [1] 7.594813

constrOptim

constrOptim(theta, f, grad, ui, ci, mu = 1e-04, control = list(),
            method = if(is.null(grad)) "Nelder-Mead" else "BFGS",
            outer.iterations = 100, outer.eps = 1e-05, ...,
            hessian = FALSE)

• ui %*% theta - ci >= 0 corresponds to linear constraints A x ≥ b
• Bounds have to be formulated as linear constraints (even x_i ≥ 0)
• theta must be in the interior of the feasible region
• The inner iteration still calls optim

Recommendation: Do not use constrOptim. Instead, use an 'augmented Lagrangian' solver, e.g. alabama::auglag.
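For illustration only (this call is not from the slides), the bound constraints 0 ≤ x_i ≤ 0.5 can be encoded as linear constraints ui %*% x - ci >= 0 for constrOptim as follows:

# Sketch (not from the slides): 2-dim Rosenbrock with 0 <= x_i <= 0.5,
# the bounds written as linear constraints ui %*% x - ci >= 0.
fn <- adagio::fnRosenbrock; gr <- adagio::grRosenbrock
ui <- rbind(diag(2), -diag(2))   # x >= 0 and -x >= -0.5 (i.e. x <= 0.5)
ci <- c(0, 0, -0.5, -0.5)
sol <- constrOptim(c(0.25, 0.25), fn, gr, ui = ui, ci = ci)
sol$par; sol$value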
Trick: Linear Equality Constraints

Task: min! f(x_1, ..., x_n)  s.t.  A x = b

Let b_1, ..., b_m be a basis of the nullspace of A, i.e. A b_i = 0, and x_0 a special solution, A x_0 = b. Define a new function g(s_1, ..., s_m) = f(x_0 + s_1 b_1 + ... + s_m b_m) and solve this as a minimization problem without constraints:

  s = argmin g(s_1, ..., s_m)

Then xmin = x_0 + s_1 b_1 + ... + s_m b_m is a (local) minimum.

xmin <- lineqOptim(rep(0, 3), fnRosenbrock, grRosenbrock,
                   Aeq = c(1, 1, 1), beq = 1)
xmin

[1] 0.5713651 0.3263519 0.1022830

Example: Linear Equality

A <- matrix(1, 1, 10)                 # x1 + ... + xn = 1
N <- pracma::nullspace(A)             # size 10 x 9
x0 <- qr.solve(A, 1)                  # A x = 1
fun <- function(x) fn(x0 + N %*% x)   # length(x) = 9
sol <- ucminf::ucminf(rep(0, 9), fun)
xmin <- c(x0 + N %*% sol$par)
xmin; sum(xmin)

## [1] 0.559312323 0.314864715 0.102103618 0.013695782
## [6] 0.003318010 0.003316801 0.003316309 0.003252102
## [1] 1

fn(xmin)

## [1] 7.421543
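lineqOptim() is not defined in this section; a possible implementation of the trick above could look like the following sketch (my own assumption, not necessarily the author's code):

# Sketch of a lineqOptim()-like helper (an assumption, not the author's code):
# minimize f subject to Aeq %*% x = beq by optimizing over the nullspace of Aeq.
lineqOptim <- function(x0, f, g = NULL, Aeq, beq) {
    Aeq <- matrix(Aeq, ncol = length(x0))
    N   <- pracma::nullspace(Aeq)             # basis of the nullspace of Aeq
    xp  <- qr.solve(Aeq, beq)                 # particular solution Aeq xp = beq
    s0  <- qr.solve(N, x0 - xp)               # start value in nullspace coordinates
    fun <- function(s) f(c(xp + N %*% s))     # unconstrained objective in s
    gun <- if (is.null(g)) NULL else
               function(s) c(t(N) %*% g(c(xp + N %*% s)))   # chain rule
    sol <- optim(s0, fun, gun, method = "BFGS",
                 control = list(reltol = 1e-12, maxit = 10000))
    c(xp + N %*% sol$par)
}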
Augmented Lagrangian (package alabama)

Define the augmented Lagrangian function L as

  L(x, λ; µ) = f(x) − Σ_j λ_j h_j(x) + 1/(2µ) Σ_j h_j(x)^2

auglag(par, fn, gr, hin, hin.jac, heq, heq.jac,
       control.outer = list(), control.optim = list(), ...)

fheq <- function(x) sum(x) - 1   # equality constraint: sum(x) = 1
fhin <- function(x) c(x)         # inequality constraints: x >= 0 (componentwise)
sol <- alabama::auglag(rep(0, 10), fn, gr, heq = fheq, hin = fhin,
                       control.outer = list(trace = FALSE, method = "nlminb"))
print(sol$par, digits = 5)

## [1] 5.5707e-01 3.1236e-01 1.0052e-01 1.3367e-02 3.4742e-03
## [6] 3.3082e-03 3.3071e-03 3.3069e-03 3.2854e-03 -7.6289e-09

sum(sol$par)

## [1] 1

NLoptr

• cobyla
  cobyla(x0, fn, lower = NULL, upper = NULL, hin = NULL,
         nl.info = FALSE, control = list(), ...)

• slsqp (Sequential Quadratic Programming, SQP)
  slsqp(x0, fn, gr = NULL, lower = NULL, upper = NULL,
        hin = NULL, hinjac = NULL, heq = NULL, heqjac = NULL,
        nl.info = FALSE, control = list(), ...)

• auglag (Augmented Lagrangian)
  auglag(x0, fn, gr = NULL, lower = NULL, upper = NULL,
         hin = NULL, hinjac = NULL, heq = NULL, heqjac = NULL,
         localsolver = c("COBYLA", "LBFGS", "MMA", "SLSQP"),
         nl.info = FALSE, control = list(), ...)
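The same constrained problem (sum(x) = 1, x_i ≥ 0) can also be handed to nloptr::slsqp, for instance (a minimal sketch; the starting point is my own choice):

# Sketch (not from the slides): the same problem with nloptr::slsqp,
# using bounds for x >= 0 and heq for sum(x) = 1.
fn <- adagio::fnRosenbrock
sol <- nloptr::slsqp(rep(0.1, 10), fn,
                     lower = rep(0, 10),
                     heq = function(x) sum(x) - 1)
sol$par; sol$value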
Quadratic Programming

  Minimize  1/2 x^T Q x + c^T x
  s.t.      A x ≤ b

where Q is a symmetric, positive (semi-)definite n × n-matrix, c an n-dim. vector, A an m × n-matrix, and b an m-dim. vector.
For some solvers, linear equality constraints are also allowed.

Example: The enclosing ball problem

Quadratic Solvers
• quantreg
• pracma
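As a minimal illustration (the quadprog package is my own choice here and is not among the solvers listed above), a small QP in the form above can be solved with quadprog::solve.QP; note that solve.QP expects constraints in the form t(Amat) %*% x >= bvec, so A x ≤ b has to be negated:

# Sketch: minimize 1/2 x'Qx + c'x  s.t.  A x <= b, with quadprog::solve.QP
# (package choice is an assumption, not from the slides).
Q    <- diag(2);  cvec <- c(-2, -5)
A    <- rbind(c(1, 1), c(-1, 2));  b <- c(3, 4)
# solve.QP minimizes -d'x + 1/2 x'Dx  s.t.  t(Amat) %*% x >= bvec,
# so pass d = -cvec, Amat = -t(A), bvec = -b to match the form above.
sol <- quadprog::solve.QP(Dmat = Q, dvec = -cvec, Amat = -t(A), bvec = -b)
sol$solution; sol$value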
Differential Evolution (DE) is a relatively simple genetic algorithm variant, specialized for real-valued functions (10-20 dims).
• DEoptim

Covariance Matrix Adaptation – Evolution Strategy (CMA-ES) is an evolutionary algorithm for continuous optimization problems (adapting the covariance matrix). It is quite difficult to implement, but is applicable to dimensions up to 50 or more.
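A minimal DEoptim call on the 10-dim Rosenbrock function might look as follows (a sketch; the box bounds and control settings are my own choice):

# Sketch (bounds and settings chosen for illustration): Differential Evolution
# on the 10-dim Rosenbrock function with DEoptim.
fn <- adagio::fnRosenbrock
set.seed(1)
sol <- DEoptim::DEoptim(fn, lower = rep(-2, 10), upper = rep(2, 10),
                        control = DEoptim::DEoptim.control(NP = 200, itermax = 500,
                                                           trace = FALSE))
sol$optim$bestmem; sol$optim$bestval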