Optimization With R - Tips and Tricks
Goals of this talk:
• Overview of the (large, rapidly changing, still incomplete) set of tools for solving optimization problems in R
• Appreciation of the types of problems and the types of methods to solve them
• Advice on setting up problems and solvers
• Suggestions for interpreting results
• Some almost real-world examples

Unfortunately, there is no time to talk about the new and exciting developments in convex optimization and optimization modelling languages.

Typical application areas:
• Maximum Likelihood
• Parameter estimation
• Quantile and density estimation
• LASSO estimation
• Robust regression
• Nonlinear equations
• Geometric programming problems
• Deep Learning / Support Vector Machines
• Engineering and Design, e.g. optimal control
• Operations Research, e.g. network flow problems
• Economics, e.g. portfolio optimization
Univariate (1-dim.) Minimization

Methods / Algorithms:
• SANN - Simulated Annealing [don't use!]
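For one-dimensional minimization, base R provides optimize(), which combines golden section search with successive parabolic interpolation. A minimal sketch (the test function and interval are chosen here purely for illustration, they are not from the slides):

# Sketch: 1-dim. minimization with base R's optimize();
# test function and search interval chosen for illustration only.
f   <- function(x) x * sin(4 * x) + 1.1    # a univariate test function
opt <- optimize(f, interval = c(0, 2))     # golden section + parabolic interpolation
opt$minimum; opt$objective                 # location and value of the minimum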
The Rosenbrock Function

The global minimum obviously is (1, ..., 1) with value 0.

fnRosenbrock <- function(x) {
    n <- length(x)
    x1 <- x[2:n]; x2 <- x[1:(n - 1)]
    sum(100 * (x1 - x2^2)^2 + (1 - x2)^2)
}

Available in package adagio as fnRosenbrock(), with exact gradient grRosenbrock().

fn <- adagio::fnRosenbrock; gr <- adagio::grRosenbrock
sol <- optim(rep(0, 10), fn, gr,
             control = list(reltol = 1e-12, maxit = 10000))
sol$par; sol$counts

## [1] 0.487650105 0.218747555 0.074772474 0.008069353 0.007936313
## [6] 0.037545739 0.013695922 0.027284322 0.023147646 0.043194172
## function gradient
##     9707       NA
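Note that optim's default method is Nelder-Mead, which ignores the supplied gradient (hence the NA gradient count) and is still far from (1, ..., 1) here. For comparison (this run is not from the original slides), the same call with method = "BFGS" actually uses the exact gradient:

# Comparison sketch (not from the slides): same problem with method = "BFGS",
# which uses the exact gradient gr defined above.
sol <- optim(rep(0, 10), fn, gr, method = "BFGS",
             control = list(reltol = 1e-12, maxit = 10000))
sol$par      # should be much closer to (1, ..., 1)
sol$counts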
Nelder-Mead Solvers

• dfoptim
  nmk(par, fn, control = list(), ...)
  nmkb(par, fn, lower = -Inf, upper = Inf, control = list(), ...)

• adagio
  neldermead(fn, x0, ..., adapt = TRUE,
             tol = 1e-10, maxfeval = 10000,
             step = rep(1.0, length(x0)))

• pracma [new]
  anms(fn, x0, ...,
       tol = 1e-10, maxfeval = NULL)

Adaptive Nelder-Mead

anms in pracma implements a new (Gao and Han, 2012) adaptive Nelder-Mead algorithm, adapting to the size of the problem (i.e., the dimension of the objective function).

fn <- adagio::fnRosenbrock
pracma::anms(fn, rep(0, 20), tol = 1e-12, maxfeval = 25000)

## $xmin
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##
## $fmin
## [1] 5.073655e-25
##
## $nfeval
## [1] 22628
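A bounded variant can be called through dfoptim::nmkb; a minimal sketch (the bounds, starting point, and control settings are chosen here for illustration):

# Sketch: bounded Nelder-Mead on the 10-dim Rosenbrock function with dfoptim::nmkb;
# the start value must lie strictly inside the bounds.
fn  <- adagio::fnRosenbrock
sol <- dfoptim::nmkb(rep(0.25, 10), fn,
                     lower = rep(0, 10), upper = rep(0.5, 10),
                     control = list(tol = 1e-10, maxfeval = 25000))
sol$par; sol$value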
Gradient-based Methods

Exploiting the direction of "steepest descent" as computed by the negative gradient −∇f(x) of a multivariate function.

• Steepest descent
  d_k = −∇f(x_k)

• Conjugate Gradient (CG)
  d_k = −∇f(x_k) + β_k d_{k−1},  d_0 = −∇f(x_0),
  e.g. β_k = ||∇f(x_k)||^2 / ||∇f(x_{k−1})||^2  (Fletcher and Reeves)

• BFGS and L-BFGS-B
  d_k = −H_f(x_k)^{−1} ∇f(x_k),  with H_f(x) the Hessian of f at x.

Line Search

Given a function f: R^n → R and a direction d ∈ R^n, a line search method approximately minimizes f along the line {x + t d | t ∈ R}.

Armijo-Goldstein inequality: 0 < c, ν < 1

  f(x_0 + ν^k d) ≤ f(x_0) + c ν^k f'(x_0; d),  k = 0, 1, 2, ...

(accept the first k for which this holds)

(Weak) Wolfe conditions: 0 < c_1 < c_2 < 1

  f(x_k + t_k d_k) ≤ f(x_k) + c_1 t_k f'(x_k; d_k)
  c_2 f'(x_k; d_k) ≤ f'(x_k + t_k d_k; d_k)
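To make the backtracking idea concrete, here is a minimal sketch (my own illustration, not code from the slides) of steepest descent with an Armijo backtracking line search on the 2-dim Rosenbrock function:

# Sketch (not from the slides): steepest descent with Armijo backtracking.
fn <- adagio::fnRosenbrock; gr <- adagio::grRosenbrock
x <- c(0, 0); c1 <- 1e-4; nu <- 0.5
for (iter in 1:5000) {
    g <- gr(x)
    if (sqrt(sum(g^2)) < 1e-6) break         # stop when the gradient is small
    d <- -g                                  # steepest descent direction
    t <- 1
    while (fn(x + t * d) > fn(x) + c1 * t * sum(g * d))
        t <- nu * t                          # backtracking: t = nu^k
    x <- x + t * d
}
x; fn(x)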
Rosenbrock with Line Search

[Figure: steepest descent direction vs. BFGS direction on the Rosenbrock function, with a Wolfe line search along these two directions]

BFGS and L-BFGS-B
Central-difference Formula

  ∇f(x) = (∂f(x)/∂x_1, ..., ∂f(x)/∂x_n)

with each partial derivative approximated numerically by

  df(x)/dx ≈ (f(x + h) − f(x − h)) / (2·h)

Numerical gradients via central differences are available as pracma::grad.
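A quick check (my own illustration) comparing the numerical gradient with the exact one:

# Sketch: compare pracma::grad (central differences) with the exact gradient.
fn <- adagio::fnRosenbrock; gr <- adagio::grRosenbrock
x0 <- rep(0.25, 5)
max(abs(pracma::grad(fn, x0) - gr(x0)))   # should be close to zero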
Example: Rosenbrock with Bound Constraints

Minimize the 10-dim. Rosenbrock function subject to 0 ≤ x_i ≤ 0.5: adagio::transfinite provides a transformation between the box and all of R^n, so the problem can be handed to the unconstrained solver lbfgs::lbfgs.

Tf <- adagio::transfinite(0, 0.5, 10)
h <- Tf$h; hinv <- Tf$hinv
p0 <- rep(0.25, 10)
f <- function(x) fn(hinv(x))          # f: R^n --> R
g <- function(x) pracma::grad(f, x)   # numerical gradient
sol <- lbfgs::lbfgs(f, g, p0, epsilon = 1e-10, invisible = 1)
hinv(sol$par); sol$value

## [1] 0.5000000000 0.2630659827 0.0800311137 0.0165742342
## [6] 0.0102120052 0.0102084108 0.0102042121 0.0100040850
## [1] 7.594813

constrOptim

constrOptim(theta, f, grad, ui, ci, mu = 1e-04, control = list(),
            method = if(is.null(grad)) "Nelder-Mead" else "BFGS",
            outer.iterations = 100, outer.eps = 1e-05, ...,
            hessian = FALSE)

• ui %*% theta - ci >= 0 corresponds to linear constraints A x ≥ b
• Bounds have to be formulated as linear constraints (even x_i ≥ 0)
• theta must be in the interior of the feasible region
• The inner iteration still calls optim

Recommendation: Do not use constrOptim. Instead, use an 'augmented Lagrangian' solver, e.g. alabama::auglag.
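For illustration only (this call is not from the slides), the bound constraints 0 ≤ x_i ≤ 0.5 can be encoded as linear constraints ui %*% x - ci >= 0 for constrOptim as follows:

# Sketch (not from the slides): 2-dim Rosenbrock with 0 <= x_i <= 0.5,
# the bounds written as linear constraints ui %*% x - ci >= 0.
fn <- adagio::fnRosenbrock; gr <- adagio::grRosenbrock
ui <- rbind(diag(2), -diag(2))   # x >= 0 and -x >= -0.5 (i.e. x <= 0.5)
ci <- c(0, 0, -0.5, -0.5)
sol <- constrOptim(c(0.25, 0.25), fn, gr, ui = ui, ci = ci)
sol$par; sol$value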
Trick: Linear Equality Constraints

Task: min! f(x_1, ..., x_n)  s.t.  A x = b

Let b_1, ..., b_m be a basis of the nullspace of A, i.e. A b_i = 0, and x_0 a special solution, A x_0 = b. Define a new function g(s_1, ..., s_m) = f(x_0 + s_1 b_1 + ... + s_m b_m) and solve this as a minimization problem without constraints:

  s = argmin g(s_1, ..., s_m)

Then xmin = x_0 + s_1 b_1 + ... + s_m b_m is a (local) minimum.

xmin <- lineqOptim(rep(0, 3), fnRosenbrock, grRosenbrock,
                   Aeq = c(1, 1, 1), beq = 1)
xmin

[1] 0.5713651 0.3263519 0.1022830

Example: Linear Equality

A <- matrix(1, 1, 10)                 # x1 + ... + xn = 1
N <- pracma::nullspace(A)             # size 10 x 9
x0 <- qr.solve(A, 1)                  # A x = 1
fun <- function(x) fn(x0 + N %*% x)   # length(x) = 9
sol <- ucminf::ucminf(rep(0, 9), fun)
xmin <- c(x0 + N %*% sol$par)
xmin; sum(xmin)

## [1] 0.559312323 0.314864715 0.102103618 0.013695782
## [6] 0.003318010 0.003316801 0.003316309 0.003252102
## [1] 1

fn(xmin)

## [1] 7.421543
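lineqOptim() is not defined in this section; a possible implementation of the trick above could look like the following sketch (my own assumption, not necessarily the author's code):

# Sketch of a lineqOptim()-like helper (an assumption, not the author's code):
# minimize f subject to Aeq %*% x = beq by optimizing over the nullspace of Aeq.
lineqOptim <- function(x0, f, g = NULL, Aeq, beq) {
    Aeq <- matrix(Aeq, ncol = length(x0))
    N   <- pracma::nullspace(Aeq)             # basis of the nullspace of Aeq
    xp  <- qr.solve(Aeq, beq)                 # particular solution Aeq xp = beq
    s0  <- qr.solve(N, x0 - xp)               # start value in nullspace coordinates
    fun <- function(s) f(c(xp + N %*% s))     # unconstrained objective in s
    gun <- if (is.null(g)) NULL else
               function(s) c(t(N) %*% g(c(xp + N %*% s)))   # chain rule
    sol <- optim(s0, fun, gun, method = "BFGS",
                 control = list(reltol = 1e-12, maxit = 10000))
    c(xp + N %*% sol$par)
}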
Augmented Lagrangian (package alabama)

Define the augmented Lagrangian function L as

  L(x, λ; µ) = f(x) − Σ_j λ_j h_j(x) + 1/(2µ) Σ_j h_j(x)^2

auglag(par, fn, gr, hin, hin.jac, heq, heq.jac,
       control.outer = list(), control.optim = list(), ...)

fheq <- function(x) sum(x) - 1   # equality constraint: sum(x) = 1
fhin <- function(x) c(x)         # inequality constraints: x >= 0 (componentwise)
sol <- alabama::auglag(rep(0, 10), fn, gr, heq = fheq, hin = fhin,
                       control.outer = list(trace = FALSE, method = "nlminb"))
print(sol$par, digits = 5)

## [1] 5.5707e-01 3.1236e-01 1.0052e-01 1.3367e-02 3.4742e-03
## [6] 3.3082e-03 3.3071e-03 3.3069e-03 3.2854e-03 -7.6289e-09

sum(sol$par)

## [1] 1

NLoptr

• cobyla
  cobyla(x0, fn, lower = NULL, upper = NULL, hin = NULL,
         nl.info = FALSE, control = list(), ...)

• slsqp (Sequential Quadratic Programming, SQP)
  slsqp(x0, fn, gr = NULL, lower = NULL, upper = NULL,
        hin = NULL, hinjac = NULL, heq = NULL, heqjac = NULL,
        nl.info = FALSE, control = list(), ...)

• auglag (Augmented Lagrangian)
  auglag(x0, fn, gr = NULL, lower = NULL, upper = NULL,
         hin = NULL, hinjac = NULL, heq = NULL, heqjac = NULL,
         localsolver = c("COBYLA", "LBFGS", "MMA", "SLSQP"),
         nl.info = FALSE, control = list(), ...)
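The same constrained problem (sum(x) = 1, x_i ≥ 0) can also be handed to nloptr::slsqp, for instance (a minimal sketch; the starting point is my own choice):

# Sketch (not from the slides): the same problem with nloptr::slsqp,
# using bounds for x >= 0 and heq for sum(x) = 1.
fn <- adagio::fnRosenbrock
sol <- nloptr::slsqp(rep(0.1, 10), fn,
                     lower = rep(0, 10),
                     heq = function(x) sum(x) - 1)
sol$par; sol$value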
Quadratic Programming

  Minimize  1/2 x^T Q x + c^T x
  s.t.      A x ≤ b

where Q is a symmetric, positive (semi-)definite n × n-matrix, c an n-dim. vector, A an m × n-matrix, and b an m-dim. vector.
For some solvers, linear equality constraints are also allowed.

Example: The enclosing ball problem

Quadratic Solvers
• quantreg
• pracma
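As a minimal illustration (the quadprog package is my own choice here and is not among the solvers listed above), a small QP in the form above can be solved with quadprog::solve.QP; note that solve.QP expects constraints in the form t(Amat) %*% x >= bvec, so A x ≤ b has to be negated:

# Sketch: minimize 1/2 x'Qx + c'x  s.t.  A x <= b, with quadprog::solve.QP
# (package choice is an assumption, not from the slides).
Q    <- diag(2);  cvec <- c(-2, -5)
A    <- rbind(c(1, 1), c(-1, 2));  b <- c(3, 4)
# solve.QP minimizes -d'x + 1/2 x'Dx  s.t.  t(Amat) %*% x >= bvec,
# so pass d = -cvec, Amat = -t(A), bvec = -b to match the form above.
sol <- quadprog::solve.QP(Dmat = Q, dvec = -cvec, Amat = -t(A), bvec = -b)
sol$solution; sol$value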
Differential Evolution (DE) is a relatively simple genetic algorithm variant, specialized for real-valued functions (10-20 dims).
• DEoptim

Covariance Matrix Adaptation – Evolution Strategy (CMA-ES) is an evolutionary algorithm for continuous optimization problems (adapting the covariance matrix). It is quite difficult to implement, but is applicable to dimensions up to 50 or more.
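A minimal DEoptim call on the 10-dim Rosenbrock function might look as follows (a sketch; the box bounds and control settings are my own choice):

# Sketch (bounds and settings chosen for illustration): Differential Evolution
# on the 10-dim Rosenbrock function with DEoptim.
fn <- adagio::fnRosenbrock
set.seed(1)
sol <- DEoptim::DEoptim(fn, lower = rep(-2, 10), upper = rep(2, 10),
                        control = DEoptim::DEoptim.control(NP = 200, itermax = 500,
                                                           trace = FALSE))
sol$optim$bestmem; sol$optim$bestval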