
Lecture Notes on Mathematical Programming

Tutors: Prof. Dr. Tanka Nath Dhamala & Mr. Ram Chandra Dhungana

SEMESTER I: May 10, 2020, Kathmandu, Nepal

Remark: This lecture note is not intended to substitute for the textbooks. The recommended references and textbooks should be consulted for complete coverage of the curriculum.

Chapter 1

Problem Formulations

1.1 Nonlinear programming (NLP)

Formulation 1. Nonlinear programming problem (NLP)

min f (x)
subject to gi (x) ≤ 0, i = 1, 2, . . . , q
hj (x) = 0, j = 1, 2, . . . , p
x ∈ X

where, f, gi , hj are general nonlinear functions of x = (x1 , x2 , . . . , xn ) ∈ X ⊆ Rn .

Here, f(x) is the objective function or criterion function, and gi and hj are the constraint functions. Each of the constraints gi(x) ≤ 0 is an inequality constraint, whereas each of the constraints hj(x) = 0 is an equality constraint. The condition x ∈ X imposes any further restrictions on the variables demanded by the problem requirements.

Definition 1. The set of all feasible points

F = {x ∈ X | gi (x) ≤ 0, hj (x) = 0, for i = 1, 2, . . . , q; j = 1, 2, . . . , p}

of a formulated problem (P ) is called the feasible set.

Note that the techniques for solving such problems are generally iterative and their convergence is studied using real analysis. The optimization problem is to choose a 'best' configuration or set of parameters to achieve its goal. An optimal solution x̄ ∈ F is one such that f(x̄) ≤ f(x) for all x ∈ F. If there is more than one optimal solution, they are called alternate optima.
One may also formulate a maximization problem, with inequality constraints of the form gi(x) ≥ 0 and all the rest as they are.

Example 1. Consider the following nonlinear programming problem

min (x1 − 3)² + (x2 − 2)²
subject to x1² − x2 − 3 ≤ 0
x2 − 1 ≤ 0
−x1 ≤ 0
x ∈ R²

The feasible region is given by the set of points in R² as follows:

F = {(x1, x2) ∈ R² | x1² − x2 − 3 ≤ 0, x2 − 1 ≤ 0, −x1 ≤ 0}

We have to find the minimal value of c = (x1 − 3)² + (x2 − 2)² over the set F. Different values of c = (x1 − 3)² + (x2 − 2)² give a family of circles of radius √c with center (3, 2) (i.e., contours). An optimal solution corresponds to the smallest c for which the circle touches the feasible region F. The minimum is attained at (2, 1) with objective value c = 2.
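As a quick numerical cross-check of this example, the following sketch feeds the same problem to a general-purpose NLP solver; the choice of scipy's SLSQP method is an assumption (any solver accepting inequality constraints would do).

import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 3)**2 + (x[1] - 2)**2
# scipy expects inequality constraints in the form g(x) >= 0, so signs are flipped
cons = [{'type': 'ineq', 'fun': lambda x: -(x[0]**2 - x[1] - 3)},
        {'type': 'ineq', 'fun': lambda x: -(x[1] - 1)},
        {'type': 'ineq', 'fun': lambda x: x[0]}]

res = minimize(f, x0=np.array([0.0, 0.0]), method='SLSQP', constraints=cons)
print(res.x, res.fun)   # should be close to (2, 1) with value 2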

CW 1. Obtain a maximum solution of the following nonlinear programming problem by the graphical method.

max{(x1 − 3)² + (x2 − 2)² | x1² − x2 − 3 ≤ 0, x2 − 1 ≤ 0, −x1 ≤ 0}

CW 2. Solve the following nonlinear programming problem by the graphical method.

max{x1 + x2 | x1² + x2² ≥ 1, x1² + x2² ≤ 2, x1 ≥ 0, x2 ≥ 0}      (1.1)

Here, the objective function is f(x1, x2) = x1 + x2. The constraint functions are g1(x1, x2) = x1² + x2² − 1, g2(x1, x2) = x1² + x2² − 2, g3(x1, x2) = x1 and g4(x1, x2) = x2.

Remark 1. Note that this graphical approach for solving a nonlinear programming problem is applicable only to two-variable problems in which the objective and constraint functions are of simple type. In general, we have to resort to other methods.

Optimization Branches. There are variants of optimization problems (either maximization or min-
imization), for example, nonlinear programming, convex optimization, integer programming (pure or
mixed), linear programming, combinatorial optimization, scheduling, multi-objective optimization,
multi-level optimization, network optimization, location optimization, non-smooth optimization, sys-
tem optimization, design optimization, geometric programming, and numerical optimization, to name
some.

Optimization Domains. The domains of optimization problems come from mathematics, engineering, management science, science and technology, applied sciences, medicine, economics, and also the social sciences.

Optimization Applications. Optimization methods are applied to many real-life problems, for example, traffic management, communication and production networks, optimal shaping and design, hospital scheduling, planning (capital budgeting, facility location, portfolio analysis), vehicle assignment, evacuation, circuit design of automated production systems, data analysis and reliability, molecular biology, high-energy physics, division of regions into election districts, optimal control, resource allocation, and so on.

1.2 Linear programming (LP)

When all the functions f, gi, hj are real-valued linear functions of x ∈ Rn, Formulation 1 reduces to the linear programming problem given in Formulation 2.

Formulation 2. Linear programming problem (LPP)

min c0 x (1.2)
subject to gi (x) ≤ 0, i = 1, 2, . . . , q (1.3)
hj (x) = 0, j = 1, 2, . . . , p (1.4)

where gi, hj are linear functions from Rn to R and c ∈ Rn is a cost vector.

Systems of linear equations and linear inequalities, for example Ax = b or Cy ≥ d, are typical examples of linear constraints.

1.2.1 Variants of LP formulations

Formulation 3. The general LP form


Let f (x1 , x2 , . . . , xn ) be a linear function of a variable (x1 , x2 , . . . xn ) ∈ X ⊆ Rn , and let c =
(c1 , c2 , . . . cn ) be a given cost vector. Then, the general form of linear programming (LP) can be
formulated as follows

min f(x) = ⟨c, x⟩      (1.5)
         = c1x1 + c2x2 + · · · + cnxn      (1.6)

subject to

a11x1 + · · · + a1rxr + a1,r+1xr+1 + · · · + a1nxn = b1
a21x1 + · · · + a2rxr + a2,r+1xr+1 + · · · + a2nxn = b2
⋮
aq1x1 + · · · + aqrxr + aq,r+1xr+1 + · · · + aqnxn = bq
aq+1,1x1 + · · · + aq+1,rxr + aq+1,r+1xr+1 + · · · + aq+1,nxn ≥ bq+1
⋮
am1x1 + · · · + amrxr + am,r+1xr+1 + · · · + amnxn ≥ bm
x1, x2, . . . , xr ≥ 0
xr+1, xr+2, . . . , xn free

where the first q of the m constraints are equalities and the remaining m − q are inequalities, while the first r of the n variables are required to be nonnegative and the remaining n − r variables are free.

In matrix form the above system can be written as

\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
\vdots &        &        & \vdots\\
a_{q1} & a_{q2} & \cdots & a_{qn}\\
a_{q+1,1} & a_{q+1,2} & \cdots & a_{q+1,n}\\
\vdots &        &        & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix}
\;\begin{matrix} =\\ \vdots\\ =\\ \ge\\ \vdots\\ \ge \end{matrix}\;
\begin{pmatrix} b_1\\ \vdots\\ b_q\\ b_{q+1}\\ \vdots\\ b_m \end{pmatrix}
\]

where the first q rows carry equality signs and the remaining m − q rows carry ≥ signs.

This system can be written in compact form as

min c0 x (1.7)

subject to

∑_{j=1}^{n} aij xj = bi   for i = 1, 2, · · · , q      (1.8)
∑_{j=1}^{n} aij xj ≥ bi   for i = q + 1, q + 2, · · · , m      (1.9)
xj ≥ 0   for j = 1, 2, · · · , r      (1.10)
xj free   for j = r + 1, r + 2, · · · , n      (1.11)

where ai = (ai1, ai2, . . . , ain), i = 1, 2, . . . , m, is the ith row of the m × n matrix A = (aij).

Formulation 4. The canonical LP form


The LP in canonical form can be formulated as follows

min c0 x (1.12)

subject to

∑_{j=1}^{n} aij xj ≥ bi   for i = 1, 2, · · · , m      (1.13)
xj ≥ 0   for j = 1, 2, · · · , n      (1.14)

Formulation 5. The standard LP form


The LP in standard form can be formulated as follows

min c0 x

subject to

∑_{j=1}^{n} aij xj = bi   for i = 1, 2, · · · , m
xj ≥ 0   for j = 1, 2, · · · , n

Theorem 1. The three forms (standard, canonical and general) of LP are equivalent.

Proof. We prove that the canonical, standard and general forms of LP are equivalent.

• The canonical and standard forms are special cases of the general form, so no additional work is needed to reduce them to the general form.

• Conversely, a general form can be reduced to the canonical form as follows. Consider an equality constraint

∑_{j=1}^{n} aij xj = bi

of the general form. It can be written as the two inequality constraints

∑_{j=1}^{n} aij xj ≥ bi   and   ∑_{j=1}^{n} aij xj ≤ bi,

and the latter can be written as

∑_{j=1}^{n} (−aij) xj ≥ −bi.

The free variables xj can be expressed as xj = xj⁺ − xj⁻ with xj⁺ ≥ 0 and xj⁻ ≥ 0.

• Similarly, a general form can be reduced to the standard form as follows. Consider an inequality constraint

∑_{j=1}^{n} aij xj ≥ bi

of the general form. We use a surplus variable si ≥ 0 such that

∑_{j=1}^{n} aij xj − si = bi

in the standard form. The free variables xj can again be expressed as xj = xj⁺ − xj⁻ with xj⁺ ≥ 0 and xj⁻ ≥ 0.

Remark 2. Note that if the general form contains inequality constraints of the form

∑_{j=1}^{n} aij xj ≤ bi,

we use slack variables yi ≥ 0 such that

∑_{j=1}^{n} aij xj + yi = bi

in the standard form. Similarly, the free variables xj can be expressed as xj = xj⁺ − xj⁻ with xj⁺ ≥ 0 and xj⁻ ≥ 0.

Example 2. Modeling problem (the diet problem)


Let there be n food items, each containing some of m nutrients. Suppose that aij denotes the amount of the ith nutrient in a unit of the jth food for i = 1, 2, . . . , m and j = 1, 2, . . . , n. Let ri be the yearly requirement of the ith nutrient and cj the cost per unit of the jth food. Let xj be the yearly consumption of the jth food. The following LP model meets the nutrient requirements for food consumption at minimum cost.

min c0 x
such that Ax ≥ r
x≥0
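A hedged numerical illustration of this model follows; the data below are made up purely for demonstration, and only the structure min c'x subject to Ax ≥ r, x ≥ 0 comes from the notes.

import numpy as np
from scipy.optimize import linprog

A = np.array([[2.0, 1.0, 0.0],     # (hypothetical) nutrient 1 per unit of foods 1..3
              [1.0, 3.0, 2.0]])    # (hypothetical) nutrient 2 per unit of foods 1..3
r = np.array([8.0, 12.0])          # yearly requirements
c = np.array([1.5, 2.0, 1.0])      # cost per unit of each food

# linprog minimizes c'x subject to A_ub x <= b_ub, so Ax >= r becomes -Ax <= -r
res = linprog(c, A_ub=-A, b_ub=-r, bounds=[(0, None)] * 3)
print(res.x, res.fun)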

Example 3. (A modeling problem) A company produces both interior and exterior paints from two
raw materials M1 and M2 . Consider the basic data of the problem as follows.
Raw material M1 is required at 6 units per ton of exterior paint and 4 units per ton of interior paint. Likewise, M2 is required at 1 unit per ton of exterior paint and 2 units per ton of interior paint. The maximum amounts of raw materials M1 and M2 available daily are 24 and 6 tons, respectively. The profits per ton of exterior and interior paint are 5 and 4 units (in USD 1000), respectively.
The daily demand for interior paint cannot exceed that for exterior paint by more than 1 ton. Also, the maximum daily demand for interior paint is 2 tons. Find the optimum (best) production mix of interior and exterior paints that maximizes the total daily profit.
Let x1 be the tons of exterior paint and x2 the tons of interior paint produced per day. We have the following LP model.

max 5x1 + 4x2

such that 6x1 + 4x2 ≤ 24
x1 + 2x2 ≤ 6
−x1 + x2 ≤ 1
x2 ≤ 2
x1 , x2 ≥ 0

The optimal solution to this problem occurs at the point (3, 1.5) with optimal value 21 units.
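The result can be cross-checked with an LP solver; the sketch below uses scipy.optimize.linprog, which minimizes, so the profit 5x1 + 4x2 is maximized by minimizing its negative.

import numpy as np
from scipy.optimize import linprog

c = np.array([-5.0, -4.0])
A_ub = np.array([[6.0, 4.0],
                 [1.0, 2.0],
                 [-1.0, 1.0],
                 [0.0, 1.0]])
b_ub = np.array([24.0, 6.0, 1.0, 2.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
print(res.x, -res.fun)   # expected (3, 1.5) with daily profit 21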

1.3 Integer programming (IP)

We give an integer programming formulation in standard form as follows.

min c0 x

subject to

Ax = b
x ≥0
x integer

CW 3. Which problem can be more difficult to solve, an IP or an LP? Give your reasoning.

Remark 3. Are all three forms of IP equivalent?

Example 4. We illustrate an LP whose solution does not yield an IP solution (c.f. Figure 1.1).

max y
subject to
3x + 2y ≤ 12
−x + y ≤ 1
2x + 3y ≤ 12
x, y ≥ 0

The LP optimum is attained at (x, y) = (1.8, 2.8) with value 2.8, whereas the best integer feasible points are (1, 2) and (2, 2), both with value 2.
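The following sketch contrasts the LP relaxation of this example with its integer optimum; the brute-force integer enumeration is only for illustration and is not part of the notes.

import numpy as np
from scipy.optimize import linprog

c = np.array([0.0, -1.0])                      # maximize y == minimize -y
A_ub = np.array([[3.0, 2.0], [-1.0, 1.0], [2.0, 3.0]])
b_ub = np.array([12.0, 1.0, 12.0])

lp = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
print("LP optimum:", lp.x, -lp.fun)            # fractional point (1.8, 2.8)

best = max(((x, y) for x in range(5) for y in range(5)
            if np.all(A_ub @ np.array([x, y]) <= b_ub)),
           key=lambda p: p[1])
print("an IP optimum:", best)                  # (1, 2) or (2, 2), value 2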

Figure 1.1: Illustration of IP optima differing from LP optima (a single LP optimum but multiple IP optima).

Example 5. We consider the following nonlinear integer programming problem:

optimize x1 − x2
subject to x1 + x2 ≥ 2
x1² + x2² ≥ 4
x1² + x2² ≤ 9
x1 ≥ 0, x2 ≥ 0
x ∈ Z²≥0

Then draw the feasible region and find an optimal integer point.

1.4 Optimization problems

Definition 2. An instance of an optimization problem (P) is a pair (F, c), where F is the set of feasible points and c : F → R is the cost function.

The global minimization problem is to find an xo ∈ F such that c(xo ) ≤ c(x) for all x ∈ F .

Definition 3. An optimization problem is a set I of all instances of a given problem.

Consider an instance (F, c) of LP with the feasible set F = {x ∈ Rn | Ax = b, x ≥ 0} and the cost function c : x → c'x, where b ∈ Z^m, c ∈ Z^n, and A = (aij) is an m × n matrix with aij ∈ Z. The set of all instances is obtained by varying the given data.
This problem can be considered a continuous optimization problem, as all variables are continuous.

Figure 1.2: Illustration of IP optima in nonlinear IP, Example 5.

Example 6. Let A = (1, 1, 1) and b = (2) be the given matrix and the vector, respectively. The
feasible set is then, F = {x ∈ R3 | x1 + x2 + x3 = 2, x ≥ 0}. The feasible region is represented by
the intersection of a plane with the first octant in R3 . Consider the objective function c0 x = c1 x1 +
c2 x2 + c3 x3 that has to be minimized. The minimum occurs at one of the corner points A, B or C, a finite set of points (c.f. Figure 1.3). In this sense, we may consider this optimization problem also a combinatorial one.

Figure 1.3: The feasible set F for an instance of LP

Note that a continuous optimization problem has continuous variables, such as real numbers or functions. A discrete optimization problem, in contrast, has variables from finite or countably infinite sets such as integers, sets, permutations or graphs. The linear programming problem lies on the boundary between continuous and discrete optimization.

1.4.1 Neighborhoods

Definition 4. Given an optimization problem with an instance (F, c), a neighborhood is a mapping

N : F → 2F

defined for each instance.

Example 7. Let F = Rn; then the set of all points within a fixed Euclidean distance of a given point is a neighborhood of that point.

Definition 5. Let n be a positive integer representing the number of cities in a network, and let [dij]n×n, with dij ∈ Z+, be a symmetric distance matrix. A tour is a closed path that visits every city exactly once. Let π be a tour such that π(j) is the city visited after city j. Let us denote the set of all feasible elements for this instance by

F = {all cyclic permutations π on n objects}.

The total length of the tour π is defined as

cost(π) = ∑_{j=1}^{n} d_{jπ(j)}.

Problem 1. The traveling salesman problem (TSP) is to find a tour π with minimum total length ∑_{j=1}^{n} d_{jπ(j)}.

Example 8. Consider the traveling salesman problem (TSP) and define

N2(x) = {g ∈ F | g can be obtained from x by removing 2 edges of the tour and replacing them by 2 other edges}.

This neighborhood is called a 2-change (or 2-exchange) neighborhood for the TSP (c.f. Figure 1.4). It is a discrete neighborhood.
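A minimal sketch of a 2-change move follows, under the common assumption that a tour is stored as a list of cities and that reversing the segment between two cut points corresponds to removing two edges and reconnecting them (the classical 2-opt move); the distance matrix is made up.

import itertools

def two_change_neighbors(tour):
    """Yield all 2-change neighbors of a tour given as a list of cities."""
    n = len(tour)
    for i, k in itertools.combinations(range(n), 2):
        if k - i < 2 or (i == 0 and k == n - 1):
            continue                     # these moves recreate the same tour
        yield tour[:i + 1] + tour[i + 1:k + 1][::-1] + tour[k + 1:]

def tour_length(tour, d):
    return sum(d[tour[j]][tour[(j + 1) % len(tour)]] for j in range(len(tour)))

d = [[0, 2, 9, 10],                      # a made-up symmetric distance matrix
     [2, 0, 6, 4],
     [9, 6, 0, 8],
     [10, 4, 8, 0]]
t = [0, 1, 2, 3]
best = min(two_change_neighbors(t), key=lambda s: tour_length(s, d))
print(best, tour_length(best, d))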

Definition 6. Given an instance (F, c) of an optimization problem and a neighborhood N, a feasible solution x ∈ F is called a local minimum with respect to the neighborhood N if

c(x) ≤ c(y) for all y ∈ N(x).

It is called a global minimum if c(x) ≤ c(y) for all y ∈ F, that is, if it is a minimum with respect to every neighborhood N.

Local and global maximum points are defined similarly.

Figure 1.4: (a) An instance of a TSP and a tour. (b) Another tour which is a 2-change of the tour in (a).

Figure 1.5: A 1-dimensional Euclidean optimization problem

Example 9. Let F = [0.2, 1] ⊆ R¹. The points A, B, C, D are local minima, whereas A is the global minimum, attained at the point (0.8, 0.0) (c.f. Figure 1.5).
Here, Nε(x) = {y ∈ [0.2, 1] : |x − y| ≤ ε}, where ε > 0 is very small.

Definition 7. A neighborhood N is called exact if every x ∈ F that is locally optimal with respect to the neighborhood N is also globally optimal.

Example 10. Consider the TSP with the k-change neighborhood Nk and the following improvement rule on solutions: replace the tour t by some s ∈ Nk(t) with c(s) < c(t) if such an s exists, and leave t unaltered if no such s exists in the neighborhood Nk of t.
The 2-change neighborhood N2 is not exact; however, the n-change neighborhood, with n the total number of cities, is exact.

Chapter 2

Convexity and Optimization

2.1 Basic terminology

A set S in Rn is said to be convex if for all x1 , x2 ∈ S, it holds that

λx1 + (1 − λ)x2 ∈ S, for all λ ∈ [0, 1]

That is, the line segment joining any two points of the set is contained in it.
The point λx1 + (1 − λ)x2, for λ ∈ [0, 1], is called a convex combination of x1 and x2.

Definition 8. For points x1, . . . , xk ∈ S, the combination

∑_{j=1}^{k} λj xj,   where ∑_{j=1}^{k} λj = 1 and λj ≥ 0 for all j,

is called a convex combination of the points x1, . . . , xk.

Figure 2.1 illustrates a convex set (left) and a non-convex set (right).
For any convex set S, ∑_{j=1}^{k} λj xj ∈ S whenever ∑_{j=1}^{k} λj = 1 and λj ≥ 0 for j = 1, 2, . . . , k.

Definition 9. The combination ∑_{j=1}^{k} λj xj with ∑_{j=1}^{k} λj = 1 is called an affine combination, and the combination ∑_{j=1}^{k} λj xj with λj ∈ R is called a linear combination.

Example 11. Let S = {x ∈ R³ | a1x1 + a2x2 + a3x3 = d}, the equation of a plane in space. Then the set S is convex in R³.

Definition 10. The set S = {x ∈ Rn | pt x = α} is called a hyperplane in Rn , where p ∈ Rn is a


non-zero vector called a gradient or a normal to the hyperplane and α is a scalar.

Figure 2.1: Illustration of convexity of sets

If x̄ ∈ S and pt x̄ = α, we can write

S = {x | pt (x − x̄) = 0}

The vector p is orthogonal to all vectors x − x̄ for x ∈ S. Therefore, the vector p is perpendicular to
the surface of the hyperplane S.

Example 12. Consider the following examples.

1. The half-space S = {x ∈ Rn | pt x ≤ α} is convex in Rn .

2. The set S = {x ∈ Rn | Ax ≤ b}, the intersection of m half-spaces, is also convex in Rn. Such a set is called a polyhedral set.

Lemma 1. The intersection of any number of convex sets is also convex.

Proof. Let the sets Si be convex and consider the set S = ∩Si of intersections. Let x1 , x2 ∈ S. Then,
x1 , x2 ∈ Si for all i. As each Si is a convex set, the line segment λx1 + (1 − λ)x2 ∈ Si , λ ∈ [0, 1] for
all i. Therefore, λx1 + (1 − λ)x2 ∈ S. This implies that the set S is convex.

Definition 11. Let S be an arbitrary set in Rn. The convex hull of S, denoted by H(S) or conv(S), is the collection of all convex combinations of points of S; that is, x ∈ H(S) if and only if

x = ∑_{j=1}^{k} λj xj,   where ∑_{j=1}^{k} λj = 1, λj ≥ 0 for j = 1, 2, . . . , k,

with xj ∈ S and k ∈ Z+.

Figure 2.2(b) represents different convex hulls H(S) of different convex/nonconvex sets S in Fig-
ure 2.2(a).
The affine hull and the linear hull of an arbitrary set S in Rn are defined similarly.

Figure 2.2: (a) Given sets S. (b) Corresponding convex hulls conv(S) of S in (a).

Definition 12. The hull H(x1, x2, . . . , xk+1) is called a polytope. The hull H(x1, x2, . . . , xk+1) is called a simplex if x1, x2, . . . , xk+1 are affinely independent, that is, if x2 − x1, x3 − x1, . . . , xk+1 − x1 are linearly independent.

Theorem 2. Let S be an arbitrary set in Rn. If x ∈ H(S), then x ∈ H(x1, x2, . . . , xn+1), where xj ∈ S for j = 1, 2, . . . , n + 1. That is,

x = ∑_{j=1}^{n+1} λj xj,   where ∑_{j=1}^{n+1} λj = 1 with λj ≥ 0

for j = 1, 2, . . . , n + 1, and xj ∈ S for all j.


Proof. Let x ∈ H(S). Then x = ∑_{j=1}^{k} λj xj with ∑_{j=1}^{k} λj = 1, xj ∈ S and λj > 0 for j = 1, 2, . . . , k. If k ≤ n + 1, the result is established. So suppose that k > n + 1.
By the relation between basic feasible solutions and extreme points, at an extreme point of the set

{λ | ∑_{j=1}^{k} λj xj = x, ∑_{j=1}^{k} λj = 1, λj ≥ 0}

no more than n + 1 components of λ are positive. Therefore, x can be expressed as a convex combination of at most n + 1 points of S.

Definition 13. Let x ∈ Rn and consider the neighborhood Nε(x) = {y : ||y − x|| ≤ ε}. Let S be an arbitrary set in Rn.

1. x ∈ Cl(S) if S ∩ Nε(x) ≠ ∅ for all ε > 0.

2. If S = Cl(S), then S is closed.

3. x ∈ Int(S) if Nε(x) ⊂ S for some ε > 0.

4. S ⊆ Rn is a solid set if Int(S) ≠ ∅.

5. If S = Int(S), then S is open.

6. x is in the boundary of S, that is, x ∈ bd(S), if Nε(x) contains at least one point in S and at least one point not in S for all ε > 0.

7. A set S is called bounded if it is contained in a ball Nε(x) of sufficiently large radius ε.

8. S is called compact if it is closed and bounded.

9. A set S is closed if and only if every convergent sequence {xk} in S with limit x̄ satisfies x̄ ∈ S.

Recall the relations and equivalences between these definitions (see page 38 of the textbook).

Theorem 3. Let S be a convex set in Rn with nonempty interior. Let x1 ∈ Cl(S) and x2 ∈ Int(S).
Then, λx1 + (1 − λ)x2 ∈ Int(S) for each λ ∈ (0, 1).

Corollary 1. Let S be a convex set in Rn with a nonempty interior, then

1. The set Int(S) is convex.

2. The set Cl(S) is convex.

3. The set Cl(IntS) = Cl(S).

4. The set Int(Cl(S)) = Int(S).

2.2 Minimization problem

Problem 2. (Minimization problem) Let S be an arbitrary set in Rn . Consider

min{f (x) | x ∈ S} (2.1)

If x̄ ∈ S and f(x̄) ≤ f(x) for all x ∈ S, we say that x̄ is a minimum solution of Problem (2.1).
We write α = inf{f(x) | x ∈ S} if α ≤ f(x) for all x ∈ S and no ᾱ > α exists such that ᾱ ≤ f(x) for all x ∈ S; in this case α is the greatest lower bound of f on S.

Theorem 4. Let S be a nonempty and compact set and let f : S → R1 be a continuous function on
S. Then the problem min{f (x) | x ∈ S} attains its minimum on S. That is, there exists a minimizing
solution to this problem.

Proof. As f : S → R¹ is continuous on the closed and bounded set S, f is bounded from below on S. Since S is nonempty, the greatest lower bound α = inf{f(x) | x ∈ S} exists. Let 0 < ε < 1 and define Sk = {x ∈ S | α ≤ f(x) ≤ α + ε^k} for k = 1, 2, . . . . By the definition of an infimum, Sk ≠ ∅ for each k, so we may pick xk ∈ Sk for each k. As S is bounded, the sequence {xk} has a convergent subsequence {xk}K → x̄, and x̄ ∈ S since S is also closed. Then α = lim_{k→∞, k∈K} f(xk) = f(x̄), since ε^k → 0 and f is continuous. Thus there exists x̄ ∈ S such that f(x̄) = α = inf{f(x) | x ∈ S}, showing the existence of a minimum of f on S.

The concepts of supporting hyperplanes and separation of disjoint convex sets are quite important in optimization for dealing with optimality conditions and duality relationships. We explore these concepts in the following. Figure 2.3 illustrates that the sum of the squared norms of the diagonals of a parallelogram equals the sum of the squared norms of its (four) sides.

Figure 2.3: Parallelogram law

That is,

||a + b||² = ||a||² + ||b||² + 2aᵗb
||a − b||² = ||a||² + ||b||² − 2aᵗb,

which together give

||a + b||² + ||a − b||² = 2||a||² + 2||b||².

Theorem 5. Let S be a nonempty, closed convex set in Rn and let y 6∈ S. Then, there exists a unique
point x̄ ∈ S with minimum distance from y. Furthermore, x̄ is the minimizing point if and only if
(y − x̄)0 (x − x̄) ≤ 0 for all x ∈ S. (c.f. Figure 2.4).

Figure 2.4: Minimum distance to a closed convex set

Definition 14. Let S1 and S2 be two nonempty sets in Rn. A hyperplane H = {x | pᵗx = α} is said to separate S1 and S2 if pᵗx ≥ α for each x ∈ S1 and pᵗx ≤ α for each x ∈ S2. If, in addition, S1 ∪ S2 is not entirely contained in H, then H is said to properly separate S1 and S2. The hyperplane H is said to strictly separate S1 and S2 if pᵗx > α for each x ∈ S1 and pᵗx < α for each x ∈ S2. The hyperplane H is said to strongly separate S1 and S2 if pᵗx ≥ α + ε for each x ∈ S1 and pᵗx ≤ α for each x ∈ S2, where ε is a positive scalar.

Figure 2.5 (a-d) represents the improper, proper, strict and strong separations, respectively.

Figure 2.5: Different types of separations of the sets S1 and S2.

Theorem 6. Let S be a nonempty closed convex set in Rn and y 6∈ S. Then there exists a nonzero
vector p and a scalar α such that pt y > α and pt x ≤ α for all x ∈ S.

Proof. The set S is nonempty, closed and convex and y 6∈ S. Then, by Theorem 5, there exists a
minimizing point x̄ ∈ S such that

(x − x̄)t (y − x̄) ≤ 0 (2.2)

for all x ∈ S. Let p = y − x̄ ≠ 0 and α = x̄ᵗ(y − x̄) = pᵗx̄.


Then we can verify the following.

pt y − α = (y − x̄)t y − pt x̄
= (y − x̄)t y − (y − x̄)t x̄
= (y − x̄)t (y − x̄)
= ||y − x̄||2

Therefore, pt y > α.
Also, pt x ≤ α for each x ∈ S by Theorem 5, since (y − x̄)t x ≤ (y − x̄)t x̄ = α.

Corollary 2. pt x̄ = max{pt x | x ∈ S}.

Proof. Since for each x ∈ S, pt (x̄ − x) = (y − x̄)t (x̄ − x) ≥ 0, it holds, (y − x̄)t x̄ ≥ (y − x̄)t x. That
is, pt x̄ ≥ pt x for all x ∈ S.

Definition 15. A nonempty set C in Rn is called a cone with vertex zero if x ∈ C implies that λx ∈ C for all λ ≥ 0. If, in addition, the set C is convex, then it is called a convex cone.

Figure 2.6: (a). Convex cone and (b). Non-convex cone.

Theorem 7. (Farkas' Theorem) Let A be an m × n matrix and c an n-vector. Then exactly one of the following systems has a solution.

System I Ax ≤ 0 and c0 x > 0 for some x ∈ Rn .

System II At y = c and y ≥ 0 for some y ∈ Rm .

Proof. Suppose that System II has a solution; that is, there exists y ≥ 0 with Aᵗy = c. Let x be such that Ax ≤ 0. Then c'x = (Aᵗy)ᵗx = yᵗAx ≤ 0, so System I has no solution.
Now suppose that System II has no solution. Consider the set S = {x | x = Aᵗy, y ≥ 0}. Then S is closed and convex, and c ∉ S. Hence there exist a vector p ∈ Rn and a scalar α such that pᵗc > α and pᵗx ≤ α for all x ∈ S. Since 0 ∈ S, we have α ≥ 0, so pᵗc > 0. Also, α ≥ pᵗx = pᵗAᵗy = yᵗAp for all y ≥ 0. Since the components of y ≥ 0 can be made arbitrarily large, this forces Ap ≤ 0. Thus Ap ≤ 0 and cᵗp > 0, which means that System I has the solution p ∈ Rn.
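The alternative in Farkas' theorem can be explored numerically; the sketch below checks the feasibility of System II with a Phase-I style LP and, for this particular (assumed) data, exhibits a solution of System I when System II fails.

import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 0.0], [0.0, 1.0]])
c = np.array([1.0, -1.0])        # not a nonnegative combination of A's rows

# Feasibility check for System II: minimize 0 subject to A'y = c, y >= 0
res = linprog(np.zeros(A.shape[0]), A_eq=A.T, b_eq=c,
              bounds=[(0, None)] * A.shape[0])
if res.success:
    print("System II solvable, y =", res.x)
else:
    # For this data x = (0, -1)' satisfies Ax <= 0 and c'x = 1 > 0 (System I)
    print("System II infeasible; System I has a solution, e.g. x = (0, -1)")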

Chapter 3

Linear Programming

Definition 16. A set S in En is called a polyhedral set if it is the intersection of finite number of closed
half-spaces; that is, S = {x : pi 0 x ≤ αi for i = 1, · · · , m}, where pi is a nonzero vector and αi is a
scalar for i = 1, · · · , m.

A polyhedral set is a closed convex set that can be represented by a finite number of inequalities and/or equations, since an equation can be represented by two inequalities. The following are some examples of polyhedral sets, where A is an m × n matrix and b is an m vector: S = {x : Ax ≤ b, x ≥ 0}, S = {x : Ax = b, x ≥ 0}, S = {x : Ax ≥ b, x ≥ 0}.

Example 13. In Figure 3.1, the shaded region, S = {(x1 , x2 ) : −x1 + x2 ≤ 2, x2 ≤ 4, x1 ≥ 0, x2 ≥


0, x1 ≤ 3} is the polyhedral set.

Definition 17. Let S be a nonempty convex set in En . A vector x ∈ S is called an extreme point of S
if x = λx1 + (1 − λ)x2 with x1 , x2 ∈ S, and λ ∈ (0, 1) implies x = x1 = x2 .

The following are some examples of extreme points of convex sets. We denote the extreme points
by E.

1. S = {(x1, x2) : x1² + x2² ≤ 1},   E = {(x1, x2) : x1² + x2² = 1}

2. S = {(x1, x2) : x1 + x2 ≤ 2, −x1 + 2x2 ≤ 2, x1, x2 ≥ 0},   E = {(0, 0)ᵗ, (0, 1)ᵗ, (2/3, 4/3)ᵗ, (2, 0)ᵗ}

Figure 3.1: A polyhedral set.

Theorem 8. (Characterization of Extreme Points) Let S = {x : Ax = b, x ≥ 0}, where A is an m × n matrix of rank m, and b is an m vector. A point x ∈ S is an extreme point of S if and only if A can be decomposed into [B, N] such that

x = (xB, xN)ᵗ = (B⁻¹b, 0)ᵗ      (3.1)

where B is an m × m invertible matrix satisfying B⁻¹b ≥ 0. Any such solution is called a basic feasible solution (BFS) of S.
Proof. Suppose that A can be decomposed into [B, N] with x = (B⁻¹b, 0)ᵗ and B⁻¹b ≥ 0. Clearly x ∈ S. Suppose x = λx1 + (1 − λ)x2 with x1, x2 ∈ S for some λ ∈ (0, 1), and partition x1ᵗ = (x11ᵗ, x12ᵗ) and x2ᵗ = (x21ᵗ, x22ᵗ) accordingly. Then

(B⁻¹b, 0)ᵗ = x = λx1 + (1 − λ)x2 = λ(x11, x12)ᵗ + (1 − λ)(x21, x22)ᵗ.

Since x12, x22 ≥ 0 and λ ∈ (0, 1), it follows that x12 = x22 = 0. But this implies that x11 = x21 = B⁻¹b and, hence, x = x1 = x2. This shows that x is an extreme point of S.
Conversely, suppose that x is an extreme point of S. Without loss of generality, suppose that x = (x1, x2, · · · , xk, 0, · · · , 0)ᵗ, where x1, · · · , xk are positive. It suffices to show that the columns a1, a2, · · · , ak are linearly independent. On the contrary, suppose that there exist scalars λ1, λ2, · · · , λk, not all zero, such that ∑_{j=1}^{k} λj aj = 0. Let λ = (λ1, λ2, · · · , λk, 0, · · · , 0)ᵗ and construct the two vectors x1 = x + αλ and x2 = x − αλ, where α > 0 is chosen such that x1, x2 ≥ 0. Note that Ax1 = Ax + αAλ = Ax + α∑_{j=1}^{k} λj aj = b and, similarly, Ax2 = b. Therefore, x1, x2 ∈ S and, since α > 0 and λ ≠ 0, x1 and x2 are distinct. Moreover, x = ½x1 + ½x2, which contradicts the fact that x is an extreme point. Thus, a1, a2, · · · , ak are linearly independent. Since A has rank m, m − k columns out of the last n − k columns may be chosen so that, together with the first k columns, they are linearly independent. To simplify the notation, suppose that these columns are ak+1, · · · , am. Thus, A can be written as A = [B, N], where B = [a1 · · · am] is of full rank. Furthermore, B⁻¹b = (x1, x2, · · · , xk, 0, · · · , 0)ᵗ, and since xj ≥ 0 for j = 1, · · · , k, we have B⁻¹b ≥ 0. Hence x = (B⁻¹b, 0)ᵗ with B⁻¹b ≥ 0.

Corollary 3. The number of extreme points of S is finite.


Proof. The number of extreme points is less than or equal to

C(n, m) = n! / (m!(n − m)!),

the number of possible ways to choose m columns of A to form B.
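Corollary 3 suggests a brute-force procedure: enumerate the m-column submatrices of A, keep the invertible ones B with B⁻¹b ≥ 0, and read off the basic feasible solutions. The sketch below does this for the small illustrative system obtained by adding slacks to the constraints of the second extreme-point example above.

import itertools
import numpy as np

A = np.array([[1.0, 1.0, 1.0, 0.0],      # x1 + x2 + s1 = 2
              [-1.0, 2.0, 0.0, 1.0]])    # -x1 + 2x2 + s2 = 2
b = np.array([2.0, 2.0])
m, n = A.shape

extreme_points = []
for cols in itertools.combinations(range(n), m):
    B = A[:, cols]
    if abs(np.linalg.det(B)) < 1e-12:
        continue                          # not a basis
    xB = np.linalg.solve(B, b)
    if np.all(xB >= -1e-12):              # basic feasible solution
        x = np.zeros(n)
        x[list(cols)] = xB
        extreme_points.append(x)

print(np.array(extreme_points))           # at most C(4, 2) = 6 of them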

Theorem 9. (Existence of Extreme Points) Let S = {x : Ax = b, x ≥ 0} be nonempty, where A is an m × n matrix of rank m and b is an m vector. Then S has at least one extreme point.

Proof. Let x ∈ S and, without loss of generality, suppose that x = (x1, x2, · · · , xk, 0, · · · , 0)ᵗ, where xj > 0 for j = 1, 2, · · · , k. If a1, · · · , ak are linearly independent, then k ≤ m and x is an extreme point. Otherwise, there exist scalars λ1, · · · , λk, with at least one positive component, such that ∑_{j=1}^{k} λj aj = 0. Define α > 0 as follows:

α = min_{1≤j≤k} { xj / λj : λj > 0 } = xi / λi.

Consider the point x′ whose jth component x′j is given by

x′j = xj − αλj for j = 1, · · · , k,   and   x′j = 0 for j = k + 1, · · · , n.

Here x′j ≥ 0 for j = 1, · · · , k and x′j = 0 for j = k + 1, . . . , n. Moreover, x′i = 0, and

∑_{j=1}^{n} aj x′j = ∑_{j=1}^{k} aj (xj − αλj) = ∑_{j=1}^{k} aj xj − α ∑_{j=1}^{k} aj λj = b − 0 = b.

Thus we have constructed a new point x′ with at most k − 1 positive components. This process is continued until the positive components correspond to linearly independent columns, which results in an extreme point. Hence, S has at least one extreme point.

Definition 18. Let S be a nonempty, closed set in En . A nonzero vector d in En is called a direction,
or a recession direction, of S if for each x ∈ S, x + λd ∈ S for all λ ≥ 0. Two directions d1 and d2 of
S are called distinct if d1 ≠ αd2 for any α > 0. A direction d of S is called an extreme direction if it cannot be written as a positive linear combination of two distinct directions; that is, if d = λ1d1 + λ2d2 for λ1, λ2 > 0, then d1 = αd2 for some α > 0.

Let S = {x : Ax = b, x ≥ 0} ≠ ∅, where A is an m × n matrix of rank m. Then, by definition, a nonzero vector d is a direction of S if x + λd ∈ S for each x ∈ S and each λ ≥ 0; equivalently, if Ad = 0 and d ≥ 0.

Theorem 10. (Characterization of Extreme Directions) Let S = {x : Ax = b, x ≥ 0} ≠ ∅, where A is an m × n matrix of rank m, and b is an m vector. A vector d̄ is an extreme direction of S if and only if A can be decomposed into [B, N] such that B⁻¹aj ≤ 0 for some column aj of N, and d̄ is a positive multiple of

d = (−B⁻¹aj, ej)ᵗ,

where ej is an (n − m)-vector of zeros except for a 1 in position j.
Corollary 4. The number of extreme directions of S is finite.

3.0.1 Representation Theorem

A polyhedral set is the intersection of a finite number of half-spaces. This representation may be thought of as an outer representation. A polyhedral set can also be described fully by an inner representation by means of its extreme points and extreme directions. Suppose S is a nonempty polyhedral set of the form {x : Ax = b, x ≥ 0}. Then any point in S can be represented as a convex combination of its extreme points plus a nonnegative linear combination of its extreme directions. If S is bounded, then it contains no directions, and so any point in S can be described as a convex combination of its extreme points.

Theorem 11. (Representation Theorem) Let S be a nonempty polyhedral set in En of the form {x : Ax = b and x ≥ 0}, where A is an m × n matrix with rank m. Let x1, · · · , xk be the extreme points of S and d1, · · · , dl the extreme directions of S. Then, x ∈ S if and only if x can be written as

x = ∑_{j=1}^{k} λj xj + ∑_{j=1}^{l} µj dj      (3.2)
∑_{j=1}^{k} λj = 1      (3.3)
λj ≥ 0 for j = 1, · · · , k      (3.4)
µj ≥ 0 for j = 1, · · · , l.      (3.5)

Proof. Left as an exercise.

Corollary 5. Let S be a nonempty polyhedral set of the form {x : Ax = b, x ≥ 0}, where A is m × n


matrix with rank m. Then, S has at least one extreme direction if and only if S is unbounded.

3.1 The Simplex Method

A linear programming problem is the minimization or the maximization of a linear function over a
polyhedral set. Many problems can be formulated as, or approximated by, linear programs. Also, linear programming is often used in the process of solving nonlinear and discrete problems. The simplex method is mainly based on exploiting the extreme points and extreme directions of the polyhedral set defining the problem.

3.1.1 Optimality Condition

Consider the following linear programming problem: minimize c′x subject to x ∈ S, where S is a


polyhedral set in En . The set S is called the constraint set, or feasible region, and the linear function
c0 x is called the objective function. The optimum objective function value of a linear programming
problem may be finite or unbounded. The following theorem gives a necessary and sufficient condition
for a finite optimal solution.

Theorem 12. Consider the following linear programming problem: Minimize c0 x, subject to Ax = b,
x ≥ 0. Here, c is an n vector , A is an m × n matrix of rank m, and b is an m vector. Suppose that
the feasible region is not empty, and let x1 , · · · , xk be the extreme points and d1 , · · · , dl be the extreme
directions of the feasible region. A necessary and sufficient condition for a finite optimal solution is
that c0 dj ≥ 0 for j = 1, · · · , l. If this condition holds true, then there exists an extreme point xi that
solves the problem.

Proof. By Theorem 11, Ax = b and x ≥ 0 if and only if

x = ∑_{j=1}^{k} λj xj + ∑_{j=1}^{l} µj dj,   ∑_{j=1}^{k} λj = 1,   λj ≥ 0 for j = 1, · · · , k,   µj ≥ 0 for j = 1, · · · , l.

Therefore, the linear programming problem can be stated as follows:

Minimize c′( ∑_{j=1}^{k} λj xj + ∑_{j=1}^{l} µj dj )
subject to ∑_{j=1}^{k} λj = 1
λj ≥ 0 for j = 1, · · · , k
µj ≥ 0 for j = 1, · · · , l.

If c′dj < 0 for some j, then µj can be chosen arbitrarily large, leading to an unbounded objective value. This shows that a necessary and sufficient condition for a finite optimum is c′dj ≥ 0 for j = 1, · · · , l. If this condition holds true, then in order to minimize the objective function we may choose µj = 0 for j = 1, · · · , l, and the problem reduces to minimizing c′(∑_{j=1}^{k} λj xj) subject to ∑_{j=1}^{k} λj = 1 and λj ≥ 0 for j = 1, · · · , k. The optimal solution of the latter problem is finite and is found by letting λi = 1 and λj = 0 for j ≠ i, where the index i is given by c′xi = min_{1≤j≤k} c′xj. Thus there exists an optimal extreme point.

3.1.2 The Simplex Algorithm

The simplex method is a systematic procedure for solving a linear programming problem by moving from an extreme point to an extreme point with a better (at least not worse) objective function value. This process continues until an optimal extreme point is reached and recognized, or else until an extreme direction d with c′d < 0 is found. In the latter case, we conclude that the objective value is unbounded and declare the problem to be "unbounded". Note that unboundedness of the feasible region is a necessary, but not sufficient, condition for the problem to be unbounded.
Any polyhedral set can be put in the standard format. For example, an inequality of the form ∑_{j=1}^{n} aij xj ≤ bi can be transformed into an equation by adding a nonnegative slack variable si, so that ∑_{j=1}^{n} aij xj + si = bi. Also, an unrestricted variable xj can be replaced by the difference of two nonnegative variables. We shall assume for the time being that the set admits at least one feasible point and that the rank of A is equal to m.
In the case of a finite optimal solution, it suffices to concentrate on extreme points. Suppose that we have an extreme point x̄. By the characterization theorem of extreme points, this point is characterized by a decomposition of A into [B, N], where B = [aB1, · · · , aBm] is an m × m matrix of full rank called the basis, and N is the m × (n − m) matrix of nonbasic columns. The variables corresponding to the basis B are called basic variables and are denoted by xB1, · · · , xBm, whereas the variables corresponding to N are called nonbasic variables. Now let us consider a point x satisfying Ax = b, x ≥ 0. Decompose xᵗ into (xBᵗ, xNᵗ) and note that xB, xN ≥ 0. Also, Ax = b can be written as BxB + NxN = b. Hence

xB = B⁻¹b − B⁻¹NxN.

Then

c′x = cB′xB + cN′xN      (3.6)
    = cB′B⁻¹b + (cN′ − cB′B⁻¹N)xN      (3.7)
    = c′x̄ + (cN′ − cB′B⁻¹N)xN      (3.8)

Hence, if cN′ − cB′B⁻¹N ≥ 0, then since xN ≥ 0 we have c′x ≥ c′x̄, so that x̄ is an optimal extreme point. On the other hand, suppose that cN′ − cB′B⁻¹N ≱ 0; in particular, suppose that the jth component cj − cB′B⁻¹aj is negative. Consider x̂ = x̄ + λdj, where

dj = (−B⁻¹aj, ej)ᵗ

and ej is an (n − m)-vector of zeros with a 1 at position j. Then, from Equation (3.6),

c′x̂ = c′x̄ + λ(cj − cB′B⁻¹aj)      (3.9)

and we get c′x̂ < c′x̄ for λ > 0, since cj − cB′B⁻¹aj < 0. Two cases can arise, where yj = B⁻¹aj.
Case 1: yj ≤ 0.
Note that Adj = 0, and since Ax̄ = b, we have Ax̂ = b for x̂ = x̄ + λdj and all values of λ. Hence, x̂ is feasible if and only if x̂ ≥ 0, which obviously holds for all λ ≥ 0 when yj ≤ 0. Thus, from Equation (3.9), the objective function value is unbounded. By Theorems 10 and 11, we have found an extreme direction dj with c′dj = cj − cB′B⁻¹aj < 0.
Case 2: yj ≰ 0 (at least one component of yj is positive).
Let B⁻¹b = b̄, and let λ be defined by

λ = min_{1≤i≤m} { b̄i / yij : yij > 0 } = b̄r / yrj ≥ 0,      (3.10)

where yij is the ith component of yj. In this case, the components of x̂ = x̄ + λdj are given by

x̂Bi = b̄i − (b̄r / yrj) yij   for i = 1, · · · , m,      (3.11)
x̂j = b̄r / yrj,      (3.12)

and all other components of x̂ are equal to zero. The positive components of x̂ can only be x̂B1, · · · , x̂B,r−1, x̂B,r+1, · · · , x̂Bm and x̂j. Hence at most m components of x̂ are positive, and their corresponding columns in A are linearly independent. Therefore, by Theorem 8, the point x̂ is itself an extreme point. In this case we say that the basic variable xBr left the basis and the nonbasic variable xj entered the basis in exchange.
Thus, given an extreme point, we can either check its optimality and stop, find an extreme direction leading to an unbounded solution, or find an extreme point with a better objective value. The process is repeated. We can summarize the above algorithm as follows:

Algorithm 1. Simplex Procedure

begin
  opt := "no", unbounded := "no"
  while opt = "no" and unbounded = "no" do
    if cj ≤ 0 for all j then opt := "yes"
    else begin
      choose any j′ such that cj′ > 0 and cj′ ≥ cj for all j;
      if xij′ ≤ 0 for all i then unbounded := "yes"
      else
        find θ0 = min_{i : xij′ > 0} { xi0 / xij′ } = xk0 / xkj′
        and pivot on xkj′
    end
  end
end
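A compact, self-contained sketch of the procedure is given below for problems of the form min c'x, Ax ≤ b, x ≥ 0 with b ≥ 0, so that the slack variables provide the starting basis. It follows the minimization convention (a variable with negative reduced cost enters) and uses Bland's entering rule rather than the largest-coefficient rule of Algorithm 1; it is meant only as an illustration, not production code.

import numpy as np

def simplex_min(c, A, b):
    """Tableau simplex for min c'x s.t. Ax <= b, x >= 0, with b >= 0."""
    m, n = A.shape
    T = np.hstack([A, np.eye(m), b.reshape(-1, 1)]).astype(float)
    cost = np.concatenate([c, np.zeros(m + 1)]).astype(float)   # reduced costs | -objective
    basis = list(range(n, n + m))                               # slacks are basic initially
    while True:
        entering = next((j for j in range(n + m) if cost[j] < -1e-12), None)
        if entering is None:
            break                                               # optimal
        ratios = [T[i, -1] / T[i, entering] if T[i, entering] > 1e-12 else np.inf
                  for i in range(m)]
        leaving = int(np.argmin(ratios))
        if ratios[leaving] == np.inf:
            raise ValueError("problem is unbounded")
        T[leaving] /= T[leaving, entering]                      # pivot
        for i in range(m):
            if i != leaving:
                T[i] -= T[i, entering] * T[leaving]
        cost -= cost[entering] * T[leaving]
        basis[leaving] = entering
    x = np.zeros(n + m)
    x[basis] = T[:, -1]
    return x[:n], -cost[-1]

# Example 14:  min x1 - 3x2  s.t.  -x1 + 2x2 <= 6,  x1 + x2 <= 5,  x >= 0
x, val = simplex_min(np.array([1.0, -3.0]),
                     np.array([[-1.0, 2.0], [1.0, 1.0]]),
                     np.array([6.0, 5.0]))
print(x, val)   # expected (4/3, 11/3) with value -29/3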

3.1.3 Finite Convergence of Simplex Method

At each iteration (one pass through the main step), if b̄ = B⁻¹b > 0, then λ defined by Equation (3.10) is strictly positive, and the objective value at the current extreme point is strictly less than at any previous iteration. This implies that the current point is distinct from those previously generated. Since there are finitely many extreme points, the simplex algorithm must stop in a finite number of iterations. If, on the other hand, b̄r = 0, then λ = 0 and we remain at the same extreme point but with a different basis. In theory this could lead to cycling, but cycling rarely occurs in practice.

Example 14.

Minimize x1 − 3x2
subject to − x1 + 2x2 ≤ 6
x1 + x2 ≤ 5
x1 , x2 ≥ 0

The standard form of this LP problem is:

Minimize f = x1 − 3x2 + 0·x3 + 0·x4,   i.e.,   f − x1 + 3x2 + 0·x3 + 0·x4 = 0
subject to −x1 + 2x2 + x3 = 6
x1 + x2 + x4 = 5
x1, x2, x3, x4 ≥ 0,

where x3, x4 are slack variables. The corresponding tableau of the above standard form is:

Basic ↓ | f | x1 | x2 | x3 | x4 | RHS (b) | Ratio
f  | 1 | −1 | 3 | 0 | 0 | 0 |
x3 | 0 | −1 | 2 | 1 | 0 | 6 | 6/2 = 3 (minimum)
x4 | 0 |  1 | 1 | 0 | 1 | 5 | 5/1 = 5

From the above tableau, x3 = 6 and x4 = 5 form the basic solution, whereas x1 = x2 = 0 are the nonbasic variables. Thus B = [x3, x4], N = [x1, x2] and f = 0. Since the coefficient of x2 in the f-row is positive, x2 enters the basis and, by the minimum-ratio test, x3 leaves. Pivoting (R1 → R1/2, R2 → R2 − R1, R0 → R0 − 3R1, with R1 the new pivot row) gives:

Basic ↓ | f | x1 | x2 | x3 | x4 | RHS (b) | Ratio
f  | 1 |  1/2 | 0 | −3/2 | 0 | −9 |
x2 | 0 | −1/2 | 1 |  1/2 | 0 |  3 | —
x4 | 0 |  3/2 | 0 | −1/2 | 1 |  2 | 2/(3/2) = 4/3 (minimum)

Now x2 = 3 and x4 = 2 form the basic solution, x1 = x3 = 0 are nonbasic, B = [x2, x4], N = [x1, x3] and f = −9. The coefficient of x1 in the f-row is still positive, so x1 enters the basis and x4 leaves. Pivoting (R2 → (2/3)R2, R1 → R1 + (1/2)R2, R0 → R0 − (1/2)R2, with R2 the new pivot row) gives:

Basic ↓ | f | x1 | x2 | x3 | x4 | RHS (b)
f  | 1 | 0 | 0 | −4/3 | −1/3 | −29/3
x2 | 0 | 0 | 1 |  1/3 |  1/3 |  11/3
x1 | 0 | 1 | 0 | −1/3 |  2/3 |   4/3

Now x1 = 4/3 and x2 = 11/3 form the basic solution, x3 = x4 = 0 are nonbasic, B = [x1, x2], N = [x3, x4] and f = −29/3. Since all f-row coefficients of the variables are now nonpositive, the simplex algorithm terminates with (x1, x2, x3, x4) = (4/3, 11/3, 0, 0), the required optimal solution, with optimal value f = −29/3.

Remark 4. Maximize f = −Minimize(−f) and Minimize f = −Maximize(−f).

3.1.4 The Initial Extreme Point

The simplex method starts with an initial extreme point. By Theorem 8, finding an initial extreme point of the set S = {x : Ax = b, x ≥ 0} involves decomposing A into B and N with B⁻¹b ≥ 0. In Example 14 an initial extreme point was immediately available. However, in many cases an initial extreme point may not be conveniently available. This difficulty can be overcome by introducing artificial variables. We briefly discuss two procedures for obtaining the initial extreme point: the two-phase and big-M methods. For both methods the problem should first be reduced to the standard form Ax = b, x ≥ 0, with the additional requirement bi ≥ 0; otherwise the ith constraint is multiplied by −1.

3.1.5 The Two-Phase Method

In this method, the constraints of the problem are altered by the use of artificial variables so that an extreme point of the new system is at hand. In particular, the constraint system is modified to

Ax + xa = b (3.13)
x, xa ≥ 0 (3.14)

where xa is a vector of artificial variables. Clearly, x = 0 and xa = b represents an extreme point of the given system. Since a feasible solution of the original system is obtained only if xa = 0, we can use
the simplex method itself to minimize the sum of artificial variables starting from the above extreme
point. This leads to the following Phase I problem:

Minimize et xa (3.15)
subject to Ax + xa = b (3.16)
x, xa ≥ 0 (3.17)

where e is a vector of 1's. At the end of Phase I, either xa ≠ 0 or xa = 0. In the former case we conclude that the original system is inconsistent, that is, the feasible region is empty. In the latter case, the artificial variables drop from the basis and hence we obtain an extreme point of the
original problem. Starting with this extreme point, Phase II of the simplex method minimizes the
original objective c0 x.

Example 15. Let us consider the following LP:

Minimize(Z) = x1 + x2 + x3 + x4 + x5
subject to 3x1 + 2x2 + x3 = 1
5x1 + x2 + x3 + x4 = 3
2x1 + 5x2 + x3 + x5 = 4
x1 , x2 , x3 , x4 , x5 ≥ 0

To obtain the optimal solution of the above problem we apply the two-phase method. The problem can be written as:

Minimize(Z) = x1 + x2 + x3 + x4 + x5 (3.18)
i. e. Z − x1 − x2 − x3 − x4 − x5 = 0 (original objective function) (3.19)

minimize W = xa1 + xa2 + xa3 (3.20)
i. e. W − xa1 − xa2 − xa3 = 0 (artificial objective function) (3.21)
subject to 3x1 + 2x2 + x3 + xa1 = 1 (3.22)
5x1 + x2 + x3 + x4 + xa2 = 3 (3.23)
2x1 + 5x2 + x3 + x5 + xa3 = 4 (3.24)
x1 , x2 , x3 , x4 , x5 ≥0 (3.25)
xa1 , xa2 , xa3 ≥ 0 (artificial variables) (3.26)

To use the two-phase method, we begin with the following tableau:

Vbs → xa1 xa2 xa3 x1 x2 x3 x4 x5 b Row Operations Ratios Remark


B Vs↓ 0 0 0 -1 -1 -1 -1 -1 0 - - R00
W -1 -1 -1 0 0 0 0 0 0 - - R0
− 1 0 0 3 2 1 0 0 1 - - R1
− 0 1 0 5 1 1 1 0 3 - - R2
− 0 0 1 2 5 1 0 1 4 - - R3

The pivots and tableau in Phase I is shown below;

Vbs → xa1 xa2 xa3 x1 x2 x3 x4 x5 b Row Operations Ratios


B Vs↓ 0 0 0 -1 -1 -1 -1 -1 0
W 0 0 0 10 8 3 1 1 8 R0 → R1 + R2 + R3
1
xa1 1 0 0 3 2 1 0 0 1 R1 → R1 3 Min.
3
xa2 0 1 0 5 1 1 1 0 3 R2 → R2 5
4
xa3 0 0 1 2 5 1 0 1 4 R3 → R3 2

Vbs → xa1 xa2 xa3 x1 x2 x3 x4 x5 b Row Operations Ratios Remark


−1 −2
B Vs↓ 1
3 0 0 0 3 3 -1 -1 1
3 R00 → R00 + R1 -
W − 10
3 0 0 0 4
3 − 13 1 1 14
3 R0 → R0 − 10R1 -
1 2 1 1 R1 1
x1 3 0 0 1 3 3 0 0 3 R1 → 3 2 Min.
−5
xa2 3 1 0 0 - 73 - 23 1 0 4
3 R2 → R2 − 5R1
−2 11 1 10 10
xa3 3 0 1 0 3 3 0 1 3 R3 → R3 − 2R1 11

Vbs → xa1 xa2 xa3 x1 x2 x3 x4 x5 b Row Operations Ratios Remark


−1 R1
B Vs↓ 1
2 0 0 1
2 0 2 -1 -1 1
2 R00 → R00 + 3 -
4R1
W -4 0 0 -2 0 -1 1 1 4 R0 → R0 − 3 -
1 3 1 1 3R1
x2 2 0 0 2 1 2 0 0 2 R1 → 2
−1 7 1 5 7R1 5
xa2 2 1 0 2 0 2 1 0 2 R2 → R2 + 3 2 Min.
−15 −11 −3 3 11R1
xa3 6 0 1 2 0 2 0 1 2 R3 → R3 − 3

Vbs → xa1 xa2 xa3 x1 x2 x3 x4 x5 b Row Operations Ratios Remark
B Vs↓ 0 1 0 4 0 0 0 -1 3 R00 → R00 + R2 -
W − 72 -1 0 − 11
2 0 − 23 0 1 3
2 R0 → R0 − R2 -
1 3 1 1
x2 2 0 0 2 1 2 0 0 2 R1 → R1
−1 7 1 5
xa2 2 1 0 2 0 2 1 0 2 R2 → R2
−15 −11 −3 3 3
xa3 6 0 1 2 0 2 0 1 2 R3 → R3 2 Min.

Vbs → xa1 xa2 xa3 x1 x2 x3 x4 x5 b Row Operations Ratios Remark


−5 −3 −3
B Vs↓ 2 1 1 2 0 2 0 0 9
2 R00 → R00 + R3 -
W −1 -1 -1 0 0 0 0 0 0 R0 → R0 − R3 -
1 3 1 1
x2 2 0 0 2 1 2 0 0 2 R1 → R1
−1 7 1 5
x4 2 1 0 2 0 2 1 0 2 R2 → R2
−15 −11 −3 3
xa3 6 0 1 2 0 2 0 1 2 R3 → R3

In the above tableau W = 0, so we move from Phase I to Phase II by removing the artificial variables and the artificial objective function, as shown in the following tableau:

Vbs → x1 x2 x3 x4 x5 b
−3 −3 9
B Vs↓ 2
0 2
0 0 2
3 1 1
x2 2
1 2
0 0 2
7 1 5
x4 2
0 2
1 0 2
−11 −3 3
x5 2
0 2
0 1 2

Since cj ≤ 0 for all j, the above tableau is optimal. Hence the optimal solution is (x1, x2, x3, x4, x5) = (0, 1/2, 0, 5/2, 3/2) with optimal value Z = 9/2.
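As a cross-check (a sketch), solving the same LP directly with scipy.optimize.linprog should reproduce the two-phase result.

import numpy as np
from scipy.optimize import linprog

c = np.ones(5)
A_eq = np.array([[3.0, 2.0, 1.0, 0.0, 0.0],
                 [5.0, 1.0, 1.0, 1.0, 0.0],
                 [2.0, 5.0, 1.0, 0.0, 1.0]])
b_eq = np.array([1.0, 3.0, 4.0])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 5)
print(res.x, res.fun)   # expected (0, 1/2, 0, 5/2, 3/2) with value 4.5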

3.1.6 The Big-M Method (Charnes Penalty Method)

As in the two-phase method, the constraints are modified by the use of artificial variables so that
an extreme point of the new system is immediately available. A large positive cost coefficient M is
assigned to each artificial variable so that they will drop to zero level. This leads to the following
problem.

Minimize c0 x + M et xa (3.27)
subject to Ax + xa = b (3.28)
x, xa ≥ 0 (3.29)

We can execute the simplex method without actually specifying a numerical value for M by carrying the objective coefficients of M for the nonbasic variables as a separate vector. These coefficients are precisely identified with the reduced objective coefficients of the Phase I problem, which directly relates the two-phase and the big-M methods. If at termination xa = 0, then we have an optimal solution to the original problem. Otherwise, if xa ≠ 0 at termination of the simplex method, and provided that the variable entering the basis is always the one with the most positive coefficient in the objective row (that is, we give priority to the coefficients of M, i.e., to the Phase I component of the big-M objective function), we can conclude that the system Ax = b, x ≥ 0 admits no feasible solution. The simplex algorithm for LPPs with artificial variables is carried out as follows:

1. Introduce artificial variables, and set an initial extreme point by choosing the basic variables.

2. If the optimality criterion is satisfied and the basis B contains no artificial variable, then the current solution is an optimal basic feasible solution.

3. If the optimality criterion is satisfied and one or more artificial variables appear in the basis at zero level (that is, with value zero), then the current solution is a degenerate solution containing artificial variables.

4. If the optimality criterion is satisfied and at least one artificial variable appears in the basis at a positive level, then there exists no feasible solution. Such a solution is known as a pseudo-optimum basic feasible solution.

Example 16.

Maximize Z = 2x1 + x2
1
subject to x1 − x2 ≥ 1,
2
x1 − x2 ≤ 2,
x1 + x2 ≤ 4,
x1 , x2 ≥ 0.

Solution Let us introduce surplus variable x3 and slack variables x4 , x5 such that

Maximize Z = 2x1 + x2
Maximize (Z) = −Minimize Z = −(2x1 + x2 )
i.e, −Z + 2x1 + x2 = 0
1
subject to x1 − x2 − x3 = 1,
2
x1 − x2 + x4 = 2,
x1 + x2 + x5 = 4,
x1 , x2 , x3 , x4 , x5 ≥ 0.

The corresponding tableau of the above LPP is:

Vbs → x1 x2 x3 x4 x5 RHS(b)
B Vs↓ 2 1 0 0 0 0
1 − 12 -1 0 0 1
x4 1 -1 0 1 0 2
x5 1 1 0 0 1 4

Here x4 and x5 can serve as basic variables, but x3 cannot, because the coefficient of x3 in the first constraint is −1. If we multiplied that constraint by −1 we would get x3 = −1, which is not possible because x3 is a surplus variable. We therefore introduce an artificial variable x6 with penalty M, and the above LPP reduces to:

Maximize (Z) = −Minimize Z = Minimize W = −(2x1 + x2 ) + M x6


i.e, W + 2x1 + x2 − M x6 = 0
1
subject to x1 − x2 − x3 + x6 = 1,
2
x1 − x2 + x4 = 2,
x1 + x2 + x5 = 4,
x1 , x2 , x3 , x4 , x5 ≥ 0.

Then the above tableau becomes;

Vbs → x1 x2 x3 x4 x5 x6 RHS(b)
BVs ↓ 2 1 0 0 0 −M 0
1 − 21 -1 0 0 1 1
x4 1 -1 0 1 0 0 2
x5 1 1 0 0 1 0 4

Let us choose x6, x4 and x5 as the basic variables; then the above tableau becomes:

Vbs → x1 x2 x3 x4 x5 x6 b Row Operations Ratios


BVs ↓ 2+M 1 − M2 −M 0 0 0 M R0 → R0 + M R1
x6 1 − 12 -1 0 0 1 1 R1 → R1 1 (min)
x4 1 -1 0 1 0 0 2 R2 → R2 2
x5 1 1 0 0 1 0 4 R3 → R3 4

Applying simplex algorithm in above tableau we get following:

Vbs → x1 x2 x3 x4 x5 x6 b Row Operations
BVs ↓ 0 2 2 0 0 0 −2 R0 → R0 − (2 + M )R1
x1 1 − 21 -1 0 0 1 1 R1 → R1
x4 0 − 12 1 1 0 0 1 R2 → R2 − R1
3
x5 0 2
1 0 1 0 3 R3 → R3 − R1

In the above tableau x1, x4, x5 are basic variables and x2, x3, x6 are nonbasic variables. Since the artificial variable x6 is no longer in the basis, we remove the column corresponding to x6 from the tableau. The tableau then reduces as follows:

Vbs → x1 x2 x3 x4 x5 b Ratios
BVs ↓ 0 2 2 0 0 −2
x1 1 − 12 -1 0 0 1
x4 0 − 12 1 1 0 1
3
x5 0 2
1 0 1 3 2(min)

The operations of the algorithm is shown in the following successive tableau:

Vbs → x1 x2 x3 x4 x5 b Row operations Ratios


2
BVs ↓ 0 0 3
0 − 43 −6 R0 → R0 − 2R3
R3
x1 1 0 − 32 0 1
3
2 R1 → R1 + 2
4 1 R3 3
x4 0 0 3
1 3
2 R2 → R2 + 2 2
(min)
2 2 3R3
x2 0 1 3
0 3
2 R3 → 2
3

Vbs → x1 x2 x3 x4 x5 b Row operations


2R2
BVs ↓ 0 0 0 − 12 − 32 -7 R0 → R0 − 3
1 1 2R2
x1 1 0 0 2 2
3 R1 → R1 + 3
3 1 3 3R2
x3 0 0 1 4 4 2
R2 → 4
2R2
x2 0 1 0 - 12 1
2
1 R3 → R3 − 3

Here the optimality criterion is satisfied and the basis B contains no artificial variable, so the current solution is an optimal basic feasible solution. The optimal solution is (x1, x2) = (3, 1) with optimal value W = −Z = −7, that is, Z = 7.

Chapter 4

Duality in LP

The simplex method affords a simple derivation of a duality theorem for linear programming. Consider
the linear program in standard form to minimize c0 x subject to Ax = b and x ≥ 0. Let us consider this
as the primal problem P . The following linear problem is called the dual problem (D) of the foregoing
primal problem (P ).

Maximize b0 y
subject to At y ≤ c
y − unrestricted
Definition 19. Let an LP be given in general form

minimize c′x
subject to ai′x = bi, i ∈ M
ai′x ≥ bi, i ∈ M̄
xj ≥ 0, j ∈ N
xj free, j ∈ N̄

called the primal. Then the dual is defined as follows:

maximize y′b
subject to y′Aj ≤ cj, j ∈ N
y′Aj = cj, j ∈ N̄
yi free, i ∈ M
yi ≥ 0, i ∈ M̄

S. N. Primal Dual
1. Objective function: maximizing Objective function: minimizing
2. Requirement vector b Cost vector bt
3. Coefficient matrix A Coefficient matrix At
4. Constraints ≤ sign Constraints ≥ sign
5. Number of constraints Number of variables
6. Number of variables Number of constraints
7. ith constraint inequality ith variable positive
8. ith constraint equality ith variable free
9. ith variable restricted ith constraint restricted
10. ith slack variable positive ith dual variable zero

Example 17. Consider a linear programming problem (as primal problem):

MaxZ = 2x1 + 3x2 + x3

subject to
4x1 + 3x2 + x3 = 6,

x1 + 2x2 + 5x3 = 4,

x1 , x2 , x3 ≥ 0.

This problem can be written as;

MaxZ = 2x1 + 3x2 + x3

subject to
4x1 + 3x2 + x3 ≤ 6,

−4x1 − 3x2 − x3 ≤ −6,

x1 + 2x2 + 5x3 ≤ 4,

−x1 − 2x2 − 5x3 ≤ −4,

x1 , x2 , x3 ≥ 0.

Take dual variables y1′, y1′′, y2′, y2′′ ≥ 0 and dual objective W; then the dual of the above problem is written as

Min W = 6y1′ − 6y1′′ + 4y2′ − 4y2′′

subject to
4y1′ − 4y1′′ + y2′ − y2′′ ≥ 2
3y1′ − 3y1′′ + 2y2′ − 2y2′′ ≥ 3
y1′ − y1′′ + 5y2′ − 5y2′′ ≥ 1
y1′, y1′′, y2′, y2′′ ≥ 0.

Replacing y1′ − y1′′ by y1 and y2′ − y2′′ by y2, the dual becomes

Min W = 6y1 + 4y2

subject to
4y1 + y2 ≥ 2,
3y1 + 2y2 ≥ 3,
y1 + 5y2 ≥ 1,
y1, y2 free.
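A quick numerical check of this example (a sketch): solving the primal and the reduced dual with linprog, the two optimal values should coincide by strong duality (Theorem 14 below).

import numpy as np
from scipy.optimize import linprog

# Primal: max 2x1 + 3x2 + x3  s.t.  4x1 + 3x2 + x3 = 6,  x1 + 2x2 + 5x3 = 4,  x >= 0
primal = linprog(c=[-2.0, -3.0, -1.0],
                 A_eq=[[4.0, 3.0, 1.0], [1.0, 2.0, 5.0]],
                 b_eq=[6.0, 4.0],
                 bounds=[(0, None)] * 3)

# Dual: min 6y1 + 4y2  s.t.  4y1 + y2 >= 2,  3y1 + 2y2 >= 3,  y1 + 5y2 >= 1,  y free
dual = linprog(c=[6.0, 4.0],
               A_ub=[[-4.0, -1.0], [-3.0, -2.0], [-1.0, -5.0]],
               b_ub=[-2.0, -3.0, -1.0],
               bounds=[(None, None)] * 2)

print(-primal.fun, dual.fun)   # the two optimal values should agree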

Theorem 13. The dual of the dual is the primal.

Proof. Consider an LP in general form as the primal:

minimize c′x
subject to ai′x = bi, i ∈ M
ai′x ≥ bi, i ∈ M̄
xj ≥ 0, j ∈ N
xj free, j ∈ N̄

Its dual is

maximize y′b
subject to y′Aj ≤ cj, j ∈ N
y′Aj = cj, j ∈ N̄
yi free, i ∈ M
yi ≥ 0, i ∈ M̄

which can be rewritten as the minimization problem

minimize (−b)′y
subject to (−Aj)′y ≥ −cj, j ∈ N
(−Aj)′y = −cj, j ∈ N̄
yi free, i ∈ M
yi ≥ 0, i ∈ M̄

Taking the dual of this problem by the same rule gives

maximize (−c)′x
subject to −ai′x = −bi, i ∈ M
−ai′x ≤ −bi, i ∈ M̄
xj ≥ 0, j ∈ N
xj free, j ∈ N̄

which is the same as

minimize c′x
subject to ai′x = bi, i ∈ M
ai′x ≥ bi, i ∈ M̄
xj ≥ 0, j ∈ N
xj free, j ∈ N̄

This is the primal problem. Hence the dual of the dual is the primal.

Theorem 14. (Strong Duality) If an LP has an optimal solution, then the dual is optimal with equal
objective value.

Proof. Let x be a feasible solution of the primal LP and y a feasible solution of the corresponding dual LP. Then

c′x ≥ y′Ax = y′b,

that is, the primal cost dominates the dual cost (weak duality). Now suppose the primal has an optimal basic feasible solution x̂ with basis B, so that c′x̂ = cB′B⁻¹b. At optimality the reduced costs are nonnegative, i.e. c′ − cB′B⁻¹A ≥ 0, hence ŷ′ = cB′B⁻¹ is feasible for the dual. Its objective value is

ŷ′b = cB′B⁻¹b = c′x̂,

which equals the optimal primal cost. By weak duality no dual feasible solution can do better, so ŷ is optimal for the dual with equal objective value.

Corollary 6. If the primal is unbounded, then the dual is infeasible. If the dual is unbounded, then the
primal is infeasible.

Proof. By the weak duality relation, c′x ≥ y′b for any primal feasible x and dual feasible y. Thus it is clear that if either the primal or the dual has unbounded cost, then the other cannot have a feasible solution.

Corollary 7. If the primal is infeasible, then the dual is either unbounded or infeasible. If dual is
infeasible, then the primal is either unbounded or infeasible.

Proof. If one of the two problems is infeasible, then the other cannot have an optimal solution; otherwise, by the previous theorem, we would be able to obtain an optimal solution to the former.

Theorem 15. (Complementary Slackness Conditions) A pair (x, y), feasible in a primal-dual pair respectively, is optimal if and only if

ui = yi(ai′x − bi) = 0 for all i,
vj = (cj − y′Aj)xj = 0 for all j.

Proof. Define u = ∑_i ui and v = ∑_j vj. Since ui ≥ 0 for all i and vj ≥ 0 for all j, we have u ≥ 0 and v ≥ 0. Moreover, u = 0 if and only if yi(ai′x − bi) = 0 for all i, and v = 0 if and only if (cj − y′Aj)xj = 0 for all j. More precisely,

u + v = ∑_i ui + ∑_j vj = ∑_i yi(ai′x − bi) + ∑_j (cj − y′Aj)xj = c′x − y′b,

because the terms involving both x and y cancel. Therefore, the equations in the theorem hold if and only if u + v = 0, or y′b = c′x, which is the necessary and sufficient condition for x and y both to be optimal in the primal-dual pair.

Example 18. Let us consider the following LP;

Minimize(Z) = x1 + x2 + x3 + x4 + x5
subject to 3x1 + 2x2 + x3 = 1
5x1 + x2 + x3 + x4 = 3
2x1 + 5x2 + x3 + x5 = 4
x1 , x2 , x3 , x4 , x5 ≥ 0

with optimal solution (x1, x2, x3, x4, x5) = (0, 1/2, 0, 5/2, 3/2) and optimal value Z = 9/2. This problem has the dual LP formulation

maximize y 0 b = y1 + 3y2 + 4y3


subject to 3y1 + 5y2 + 2y3 ≤ 1
2y1 + y2 + 5y3 ≤ 1
y1 + y2 + y3 ≤1
y2 ≤1
y3 ≤1
yi free for i = 1, 2, 3.

From the complementary slackness condition it holds that

c2 − y 0 A 2 = 0
c4 − y 0 A4 = 0
c5 − y 0 A5 = 0

Therefore, we have the binding constraints

2y1 + y2 + 5y3 = 1
y2 =1
y3 =1

These give y1 = −5/2, y2 = 1 and y3 = 1 at optimality, with optimal cost y′b = 9/2.
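The computation can be verified numerically; the sketch below checks feasibility of the reported primal-dual pair, the equality of their objective values, and the complementary slackness products vj = (cj − y′Aj)xj.

import numpy as np

A = np.array([[3.0, 2.0, 1.0, 0.0, 0.0],
              [5.0, 1.0, 1.0, 1.0, 0.0],
              [2.0, 5.0, 1.0, 0.0, 1.0]])
b = np.array([1.0, 3.0, 4.0])
c = np.ones(5)

x = np.array([0.0, 0.5, 0.0, 2.5, 1.5])      # primal solution from the notes
y = np.array([-2.5, 1.0, 1.0])               # dual solution derived above

print(np.allclose(A @ x, b), np.all(y @ A <= c + 1e-9))   # primal and dual feasibility
print(c @ x, y @ b)                                       # both objective values: 4.5
print((c - y @ A) * x)                                    # complementary slackness: all zeros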

Chapter 5

Optimization with Convex Functions

5.1 General concepts

Definition 20. Let f : S → R be a real valued function, where S is a nonempty convex set in Rn . The
function f is said to be convex on S if

f (λx1 + (1 − λ)x2 ) ≤ λf (x1 ) + (1 − λ)f (x2 )

for all x1 , x2 ∈ S and for all λ ∈ (0, 1). The function f is called strictly convex on S if

f (λx1 + (1 − λ)x2 ) < λf (x1 ) + (1 − λ)f (x2 )

for all distinct x1, x2 ∈ S and for all λ ∈ (0, 1).


The function f : S → R is called (strictly) concave on S if −f is (strictly) convex on S.

Lemma 2. Let S be a nonempty convex set in Rn , and let f : S → R be a convex function. Then the
level set Sα = {x ∈ S | f (x) ≤ α}, where α is a real number, is a convex set.

Proof. Let x1 , x2 ∈ Sα and λ ∈ (0, 1). Then x1 , x2 ∈ S with f (x1 ) ≤ α and f (x2 ) ≤ α, and x = λx1 + (1 − λ)x2 ∈ S since S is convex. Furthermore, by the convexity of f ,

f (x) = f (λx1 + (1 − λ)x2 ) ≤ λf (x1 ) + (1 − λ)f (x2 ) ≤ λα + (1 − λ)α = α.

Therefore, x ∈ Sα , and hence Sα is a convex set.

Theorem 16. Let S be a nonempty convex set in Rn , and let f : S → R be convex. Then f is continuous
on the interior of S.

Figure 5.1: Convex and nonconvex functions

Definition 21. (Directional Derivative) Let S be a nonempty set in Rn , and let f : S → R. Let x̄ ∈ S
and d be a nonzero vector such that x̄ + λd ∈ S for sufficiently small λ > 0. The directional derivative
of f at x̄ along the vector d, denoted by f 0 (x̄; d), is given by the following limit if it exists:

f ′(x̄; d) = lim_{λ→0+} [f (x̄ + λd) − f (x̄)] / λ
Note that the limit in Definition 21 exists for convex and concave functions defined on all of Rn . However, this may fail when f is restricted to a set S, even if f is continuous.
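For a concrete illustration (a numerical aside added to these notes), the limit in Definition 21 can be approximated by evaluating the difference quotient for decreasing λ; for the assumed example f(x) = ||x||² at x̄ = (1, 2) along d = (1, 0), the exact value is ∇f(x̄)′d = 2.

import numpy as np

f = lambda x: float(x @ x)
x_bar = np.array([1.0, 2.0])
d = np.array([1.0, 0.0])

for lam in (1.0, 0.1, 0.01, 0.001):
    quotient = (f(x_bar + lam * d) - f(x_bar)) / lam
    print(f"lambda = {lam:<6} difference quotient = {quotient:.6f}")
# the quotients approach f'(x_bar; d) = 2 as lambda -> 0+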

Definition 22. (Epigraph and Hypograph of a Function) Let S be a nonempty set in Rn , and let f : S → R. The epigraph of f , denoted by epi f , is a subset of Rn+1 defined by

{(x, y) | x ∈ S, y ∈ R, y ≥ f (x)}

The hypograph of f , denoted by hypf ; is a subset of Rn+1 defined by

{(x, y) | x ∈ S, y ∈ R, y ≤ f (x)}

Definition 23. (Subgradient and Subdifferential) Let S be a nonempty convex set in Rn , and let f :
S → R be convex. Then σ is called a subgradient of f at x̄ ∈ S if

f (x) ≥ f (x̄) + σ 0 (x − x̄) for all x ∈ S

Similarly, let f : S → R be concave. Then σ is called a subgradient of f at x̄ ∈ S if

f (x) ≤ f (x̄) + σ 0 (x − x̄) for all x ∈ S
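As a simple illustration of Definition 23 (a numerical aside, not part of the original notes), consider the convex function f(x) = |x| on R. At x̄ = 0 every σ ∈ [−1, 1] is a subgradient, while values outside this interval are not; the check below tests the subgradient inequality on a grid of points.

import numpy as np

xs = np.linspace(-5, 5, 1001)
x_bar = 0.0

for sigma in (-1.0, -0.3, 0.0, 0.7, 1.0, 1.5):
    # check |x| >= |x_bar| + sigma*(x - x_bar) on the grid
    ok = bool(np.all(np.abs(xs) >= np.abs(x_bar) + sigma * (xs - x_bar) - 1e-12))
    print(f"sigma = {sigma:+.1f}  subgradient at 0: {ok}")
# sigma in [-1, 1] prints True; sigma = 1.5 prints False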

Figure 5.2: Subgradient of a convex function

Note that the collection of subgradients of f at x̄ (i.e., the subdifferential of f at x̄) is a convex set.
Figure 5.2 shows examples of subgradients of convex and concave functions (on the left and right,
respectively). From the figure, we see that the function f (x̄) + σ 0 (x − x̄) corresponds to a supporting
hyperplane of the epigraph or the hypograph of the function f . The subgradient vector σ corresponds
to the slope of the supporting hyperplane.

Theorem 17. Let S be a nonempty convex set in Rn , and let f : S → R be convex. Then for x̄ ∈ int S,
there exists a vector q such that the hyperplane

H = {(x, y) | y = f (x̄) + q 0 (x − x̄)}

supports epi f at [x̄, f (x̄)]. In particular,

f (x) ≥ f (x̄) + q 0 (x − x̄) for each x ∈ S

that is, q is a subgradient of f at x̄.

Corollary 8. Let S be a nonempty convex set in Rn , and let f : S → R be strictly convex. Then for
x̄ ∈ int S, there exists a vector q such that

f (x) > f (x̄) + q′(x − x̄) for each x ∈ S, x ≠ x̄

Theorem 18. Let S be a nonempty convex set in Rn , and let f : S → R. Suppose that for each
x̄ ∈ int S, there exists a subgradient vector q such that

f (x) ≥ f (x̄) + q 0 (x − x̄) for each x ∈ S

Then, f is convex on intS.

Proof. Let x1 , x2 ∈ int S and λ ∈ (0, 1). Since int S is convex, λx1 + (1 − λ)x2 ∈ int S. By assumption, there exists a subgradient vector q of f at λx1 + (1 − λ)x2 such that

f (x1 ) ≥ f (λx1 + (1 − λ)x2 ) + (1 − λ)q′(x1 − x2 )

f (x2 ) ≥ f (λx1 + (1 − λ)x2 ) + λq′(x2 − x1 )

Multiplying the first inequality by λ and the second by 1 − λ, and adding, we get

λf (x1 ) + (1 − λ)f (x2 ) ≥ f (λx1 + (1 − λ)x2 )

This proves that the function f is convex on int S.

Definition 24. (Differentiable) Let S be a nonempty set in Rn , and let f : S → R. Then f is said to
be differentiable at x̄ ∈ intS if there exist a vector ∇f (x̄), called the gradient vector, and a function
α : Rn → R such that

f (x) = f (x̄) + ∇f (x̄)t (x − x̄) + ||x − x̄|| α(x̄; x − x̄)

where limx→x̄ α(x̄; x − x̄) = 0. The function f is said to be differentiable on the open set S ′ ⊆ S if it is differentiable at each point in S ′ .

The representation of f above is called a first-order (Taylor series) expansion of f at (or about) the point x̄; without the implicitly defined remainder term involving the function α, the resulting representation is called a first-order (Taylor series) approximation of f at (or about) the point x̄. Note that if f is differentiable at x̄, there can be only one gradient vector, which is given by

∇f (x̄) = (∂f (x̄)/∂x1 , ∂f (x̄)/∂x2 , . . . , ∂f (x̄)/∂xn )t = (f1 (x̄), f2 (x̄), . . . , fn (x̄))t

where fi (x̄) = ∂f (x̄)/∂xi is the partial derivative of f at x̄ with respect to the ith component. Remark that a differentiable convex function has exactly one subgradient at each interior point, namely the gradient vector.

Lemma 3. Let S be a nonempty convex set in Rn , and let f : S → R be convex. Suppose that f is
differentiable at x̄ ∈ int S. Then the collection of subgradients of f at x̄ is the singleton set {∇f (x̄)}.

Theorem 19. Let S be a nonempty open convex set in Rn , and let f : S → R be differentiable on S.
Then f is convex if and only if for any x̄ ∈ S, we have

f (x) ≥ f (x̄) + ∇f (x̄)′(x − x̄) for each x ∈ S

Similarly, f is strictly convex if and only if for each x̄ ∈ S, we have

f (x) > f (x̄) + ∇f (x̄)′(x − x̄) for each x ≠ x̄ ∈ S

Theorem 20. Let S be a nonempty open convex set in Rn and let f : S → R be differentiable on S.
Then f is convex if and only if for each x1 , x2 ∈ S, we have

[∇f (x2 ) − ∇f (x1 )]′(x2 − x1 ) ≥ 0

Similarly, f is strictly convex if and only if, for each x1 ≠ x2 ∈ S, the above inequality is strict.

Definition 25. (Twice Differentiable) Let S be a nonempty set in Rn , and let f : S → R. Then f is said to be twice differentiable at x̄ ∈ int S if there exist a vector ∇f (x̄), an n × n symmetric matrix H(x̄), called the Hessian matrix, and a function α : Rn → R such that

f (x) = f (x̄) + ∇f (x̄)t (x − x̄) + (1/2)(x − x̄)t H(x̄)(x − x̄) + ||x − x̄||2 α(x̄; x − x̄)

where limx→x̄ α(x̄; x − x̄) = 0. The function f is said to be twice differentiable on the open set S ′ ⊆ S if it is twice differentiable at each point in S ′ .

The second-order (Taylor series) approximation of f at (or about) the point x̄ is given by the expression in Definition 25 without the error term involving α.

Definition 26. A Hessian matrix H is positive semidefinite (PSD) everywhere in S ⊂ Rn if for any x̄ ∈ S we have x′H(x̄)x ≥ 0 for all x ∈ Rn . It is negative semidefinite (NSD) if x′H(x̄)x ≤ 0 for all x ∈ Rn . For positive (negative) definiteness, strict inequality is required for all x ≠ 0.

Theorem 21. Let S be a nonempty open convex set in Rn , and let f : S → R be twice differentiable
on S. Then f is convex if and only if the Hessian matrix is positive semidefinite at each point in S.
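As a small computational illustration of Theorem 21 (an aside with assumed example matrices; the check uses eigenvalues, since a symmetric matrix is positive semidefinite exactly when its eigenvalues are nonnegative), consider quadratic functions f(x) = (1/2)x′Qx, whose Hessian is the constant matrix Q:

import numpy as np

Q_convex = np.array([[2.0, 1.0], [1.0, 2.0]])      # eigenvalues 1, 3 -> PSD
Q_nonconvex = np.array([[1.0, 0.0], [0.0, -1.0]])  # indefinite

for name, Q in [("Q_convex", Q_convex), ("Q_nonconvex", Q_nonconvex)]:
    eig = np.linalg.eigvalsh(Q)                    # eigenvalues of a symmetric matrix
    print(name, "eigenvalues:", eig, "-> f(x) = x'Qx/2 convex:",
          bool(np.all(eig >= -1e-12)))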

Theorem 22. Let S be a nonempty open convex set in Rn , and let f : S → R be twice differentiable
on S. If the Hessian matrix is positive definite at each point in S, then f is strictly convex. Conversely,
if f is strictly convex, the Hessian matrix is positive semidefinite at each point in S. However, if f is
strictly convex and quadratic, its Hessian is positive definite.

Theorem 23. Let S be a nonempty open convex set in R, and let f : S → R be infinitely differentiable. Then f is strictly convex on S if and only if for each x̄ ∈ S there exists an even n such that f (n) (x̄) > 0, while f (j) (x̄) = 0 for every 1 < j < n, where f (j) denotes the jth-order derivative of f .
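For instance (a symbolic aside using sympy, with f(x) = x⁴ chosen here as an example), x⁴ is strictly convex on R even though its second derivative vanishes at 0: at x̄ = 0 the first nonvanishing higher derivative is the fourth, which is of even order and positive, so the criterion of Theorem 23 is satisfied.

import sympy as sp

x = sp.symbols("x")
f = x**4
for order in (2, 3, 4):
    print(f"f^({order})(0) =", sp.diff(f, x, order).subs(x, 0))
# f^(2)(0) = 0, f^(3)(0) = 0, f^(4)(0) = 24 > 0 with n = 4 even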

5.2 Minima and maxima of convex functions

Definition 27. Let f : Rn → R and consider the problem

min{f (x) | x ∈ S}

A point x ∈ S is called a feasible point. If x̄ ∈ S and f (x) ≥ f (x̄) for all x ∈ S, then x̄ is called an optimal solution. The collection of all optimal solutions is called the set of alternative optima.

Definition 28. If x̄ ∈ S and there exists an ε-neighbourhood Nε (x̄) around x̄ such that f (x) ≥ f (x̄) for all x ∈ S ∩ Nε (x̄), then x̄ is called a local optimal solution. If x̄ ∈ S and f (x) > f (x̄) for all x ∈ S ∩ Nε (x̄) with x ≠ x̄, for some ε > 0, then x̄ is called a strict local optimal solution. If x̄ is the only local minimum in S ∩ Nε (x̄) for some ε > 0, then x̄ is called a strong or isolated local optimal solution. These are also called relative minima.

Note that if x̄ is a strong or isolated local minimum, then it is also a strict local minimum. On the other hand, a strict local minimum need not be an isolated local minimum. In convex optimization, however, the two notions are equivalent, as shown below.

Figure 5.3: A, B, C are both strict and strong local minima, but the local minima D and E are neither. The point A is, moreover, a global minimum.

Definition 29. Let S be a convex set and f : Rn → R be a convex function. Then

min{f (x) | x ∈ S}

is called a convex programming problem.

Theorem 24. Let S be a nonempty convex set in Rn , and let f : S → R be a convex function on
S. Consider the convex optimization problem min{f (x) | x ∈ S}. Suppose that x̄ is a local optimal
solution to this problem.

a. Then, x̄ is a global optimal solution.

b. If either x̄ is a strict local minimum or f is strictly convex, then x̄ is the unique global optimal solution, and it is also a strong local minimum.

Proof. Suppose that x̄ is a local optimal solution to the convex optimization problem

min{f (x) | x ∈ S}

where f is a convex function on the convex set S.

a. As x̄ is a local optimal solution, by definition there exists an ε-neighbourhood Nε (x̄) around x̄ such that

f (x̄) ≤ f (x) for all x ∈ S ∩ Nε (x̄).

We have to prove that x̄ is a global optimal solution. On the contrary, suppose that x̄ is not a global optimal solution. Then there exists a vector x̂ ∈ S such that f (x̂) < f (x̄). Consider the convex combination z = λx̂ + (1 − λ)x̄ with λ ∈ (0, 1), which belongs to S by the convexity of S. Since f is convex and f (x̂) < f (x̄),

f (z) = f (λx̂ + (1 − λ)x̄) ≤ λf (x̂) + (1 − λ)f (x̄) < λf (x̄) + (1 − λ)f (x̄) = f (x̄).


For sufficiently small λ > 0, however, the point z = λx̂ + (1 − λ)x̄ lies in S ∩ Nε (x̄) by the definition of the neighbourhood. Having z ∈ S ∩ Nε (x̄) with f (z) < f (x̄) contradicts the local optimality of x̄. Therefore, there does not exist a vector x̂ ∈ S with f (x̂) < f (x̄), and hence x̄ is a global minimum solution.

b. Suppose that x̄ is a strict local minimum. Then by part (a), x̄ is a global minimum solution. Suppose on the contrary that x̄ is not the unique global minimum, so that there exists another global minimum point x̂ ∈ S, x̂ ≠ x̄, with f (x̂) = f (x̄). Define xλ = λx̂ + (1 − λ)x̄ ∈ S for 0 ≤ λ ≤ 1. By convexity, f (xλ ) = f (λx̂ + (1 − λ)x̄) ≤ λf (x̂) + (1 − λ)f (x̄) = f (x̄) for all 0 ≤ λ ≤ 1.
For λ > 0 sufficiently small, xλ ∈ S ∩ Nε (x̄) with xλ ≠ x̄ and f (xλ ) ≤ f (x̄), which contradicts the strict local optimality of x̄. Hence there is no such x̂, and x̄ is the unique global minimum of f . This also implies that x̄ is a strong (isolated) local minimum, since any other local minimum in S ∩ Nε (x̄), for any ε > 0, would also be a global minimum, giving a contradiction.
Finally, suppose that x̄ is a local optimal solution and f is strictly convex. Then by part (a), x̄ is a global optimal solution. Suppose on the contrary that x̄ is not the unique global optimal solution, so that there exists x ∈ S, x ≠ x̄, with f (x) = f (x̄). By strict convexity of f , we have

f ((1/2)x + (1/2)x̄) < (1/2)f (x) + (1/2)f (x̄) = f (x̄).

But (1/2)x + (1/2)x̄ ∈ S, which violates the global optimality of x̄.

This implies that x̄ is the unique global minimum. It is also a strong (isolated) local minimum, as above.

Hence, the stated theorem is proved.
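Theorem 24(a) can be observed numerically (an illustration only, with a hypothetical convex quadratic and scipy's BFGS routine): a local search started from very different points ends at the same global minimizer.

import numpy as np
from scipy.optimize import minimize

# convex objective f(x1, x2) = (x1 - 1)^2 + 2(x2 + 2)^2 with unique minimizer (1, -2)
f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2

for x0 in ([-10.0, 50.0], [0.0, 0.0], [7.0, -4.0]):
    res = minimize(f, x0, method="BFGS")
    print("start", x0, "-> minimizer", np.round(res.x, 6), " value", round(res.fun, 8))
# every starting point leads to (1, -2): the local minimum found is the global one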

In the following we look for necessary and sufficient conditions for the existence of a global optimal solution. If such an optimal solution does not exist, then inf{f (x) | x ∈ S} is either finite but not attained at any point of S, or equal to −∞. In the next theorem, q denotes a subgradient of f at x̄.

Theorem 25. Let f : Rn → R be a convex function and let S be a nonempty convex set in Rn .
Consider the problem min{f (x) | x ∈ S}. Then the point x̄ ∈ S is an optimal solution to the problem
if and only if f has a subgradient q at x̄ such that q t (x − x̄) ≥ 0 for all x ∈ S.

Remark that if a global optimal solution to the problem min{f (x) | x ∈ S} does not exist, then
either inf{f (x) | x ∈ S} is finite but not achieved at any point in S, or it is equal to -∞.

Corollary 9. Let f : Rn → R be a convex function, and let S be a nonempty convex set in Rn . Consider the problem min{f (x) | x ∈ S}. If S is open, then x̄ is an optimal solution to the problem if and only if the zero vector is a subgradient of f at x̄.

Proof. By Theorem 25, x̄ is an optimal solution if and only if f has a subgradient q at x̄ with q t (x − x̄) ≥ 0 for each x ∈ S. As S is open, x = x̄ − λq ∈ S for some λ > 0. Therefore,

−λ||q||2 = q t (x − x̄) ≥ 0.

Since λ > 0, this forces ||q||2 = 0, that is, q = 0. Therefore, x̄ is optimal if and only if the zero vector is a subgradient of f at x̄.

Corollary 10. Let f : Rn → R be a convex function and let S be a nonempty convex set in Rn . Suppose that f is differentiable. Then x̄ is an optimal solution if and only if

∇f (x̄)t (x − x̄) ≥ 0 for all x ∈ S.

Furthermore, if S is open, then x̄ is an optimal solution if and only if ∇f (x̄) = 0.
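For an open set, the condition ∇f(x̄) = 0 is a system of equations. As a sketch (with assumed illustrative data): for the convex quadratic f(x) = (1/2)x′Qx + c′x with Q positive definite, the gradient is Qx + c, and the optimal point is obtained by solving Qx = −c.

import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # positive definite, so f is (strictly) convex
c = np.array([-1.0, 2.0])

x_bar = np.linalg.solve(Q, -c)           # stationarity: Q x_bar + c = 0
grad = Q @ x_bar + c
print("x_bar =", x_bar, " ||grad f(x_bar)|| =", np.linalg.norm(grad))   # gradient ~ 0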

Corollary 11. Let f : Rn → R be a convex function, and let S be a nonempty convex set in Rn .
Consider the problem max{f (x) | x ∈ S}. Suppose that f is differentiable. If x̄ ∈ S is a local
optimal solution, then
∇f (x̄)t (x − x̄) ≤ 0 for all x ∈ S.

Note that ∇f (x̄)t (x − x̄) ≤ 0 for all x ∈ S is not a sufficient condition for optimality, as the following example shows.

Example 19. Consider the function f (x) = x2 on S = [−1, 2]. The maximum of f over S is 4, attained at x = 2. But at x̄ = 0, we have ∇f (x̄) = 0. Therefore,

∇f (x̄)t (x − x̄) = 0 for all x ∈ S.

But x̄ = 0 is not even a local maximum.

Theorem 26. Let f : Rn → R be a convex function and let S be a nonempty compact polyhedral set in Rn . Consider the problem of maximizing f (x) over the set S. Then an optimal solution x̄ to the problem exists that is an extreme point of S.

Proof. As f : Rn → R is a convex function on Rn , f is continuous. Since S is compact, f attains a maximum at some point x0 ∈ S. The result holds if x0 is an extreme point of S. Otherwise, since S is a compact polyhedral set, x0 can be written as a convex combination of extreme points,

x0 = Σ_{j=1}^{k} λj xj , where Σ_{j=1}^{k} λj = 1, λj > 0, and xj is an extreme point of S for j = 1, 2, . . . , k.

As f is convex, we have

f (x0 ) = f (Σ_{j=1}^{k} λj xj ) ≤ Σ_{j=1}^{k} λj f (xj ).

On the other hand, f (x0 ) ≥ f (xj ) for all j = 1, 2, . . . , k, since x0 is a maximum of f over S. Hence

f (x0 ) ≤ Σj λj f (xj ) ≤ Σj λj f (x0 ) = f (x0 ),

so equality holds throughout, and since each λj > 0 this forces f (xj ) = f (x0 ) for all j. Therefore, the maximum of f over S is attained at an extreme point of S.
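The theorem suggests a simple (if crude) procedure when the extreme points are known: evaluate f at each of them. The sketch below (with assumed illustrative data) maximizes the convex function f(x) = ||x||² over the box S = [−1, 2] × [0, 1], whose extreme points are its four corners, and compares the result against a dense grid of points of S.

import itertools
import numpy as np

f = lambda x: float(np.dot(x, x))
corners = [np.array(v, dtype=float) for v in itertools.product([-1.0, 2.0], [0.0, 1.0])]

best_corner = max(corners, key=f)
grid = [np.array([a, b]) for a in np.linspace(-1, 2, 61) for b in np.linspace(0, 1, 21)]
best_grid = max(grid, key=f)

print("best corner:", best_corner, "value", f(best_corner))   # (2, 1) with value 5
print("best grid point value:", f(best_grid))                 # never exceeds the corner value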

Chapter 6

Integer Programming

Consider the integer programming problem

min{c0 x | Ax = b, x ≥ 0, x-integer}
Example 20. (ILP formulation of TSP)
Consider the TSP with n + 1 cities, nodes 0, 1, . . . , n and intercity distances [dij ] (not necessarily
a symmetric matrix). Let xij be a decision variable on arc (i, j) with xij = 1 if (i,j) is in a tour and
zero otherwise.
Then the problem can be formulated as follows

min z = Σ_{i≠j} dij xij
subject to 0 ≤ xij ≤ 1 and xij integer for all i, j
Σ_{i=0}^{n} xij = 1 for all j = 0, 1, . . . , n
Σ_{j=0}^{n} xij = 1 for all i = 0, 1, . . . , n
Σ_{i∈S, j∈S̄} xij ≥ 1

where (S, S̄) ranges over the non-trivial partitions of {0, 1, . . . , n}. The two assignment constraints ensure that each city is entered and left exactly once, and the partition (subtour-elimination) constraints ensure that the selected arcs form a single tour.
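To see what this ILP computes, here is a brute-force sketch for a tiny instance (four cities with hypothetical, asymmetric distances): it enumerates all tours starting at city 0 and reports the cheapest, which is exactly the quantity the formulation above minimizes. Enumeration is, of course, only workable for very small n.

import itertools
import numpy as np

d = np.array([[0, 2, 9, 10],
              [1, 0, 6, 4],
              [15, 7, 0, 8],
              [6, 3, 12, 0]], dtype=float)   # d[i, j] need not be symmetric

n = d.shape[0]
best_cost, best_tour = float("inf"), None
for perm in itertools.permutations(range(1, n)):   # fix city 0 as the starting node
    tour = (0,) + perm + (0,)
    cost = sum(d[tour[k], tour[k + 1]] for k in range(n))
    if cost < best_cost:
        best_cost, best_tour = cost, tour
print("optimal tour:", best_tour, "with cost", best_cost)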


Definition 30. (unimodularity) A square nonsingular integer matrix B is called unimodular if its
determinant det(B) = ±1. An integer matrix A is called totally unimodular if every square nonsingular
submatrix of A is unimodular.
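Total unimodularity can be checked directly from the definition, although the check enumerates all square submatrices and is therefore only practical for very small matrices. The sketch below (with a hypothetical example matrix, the node–edge incidence pattern of a small bipartite graph) does exactly that:

import itertools
import numpy as np

def is_totally_unimodular(A, tol=1e-9):
    # every square submatrix must have determinant 0, +1 or -1
    m, n = A.shape
    for k in range(1, min(m, n) + 1):
        for rows in itertools.combinations(range(m), k):
            for cols in itertools.combinations(range(n), k):
                det = np.linalg.det(A[np.ix_(rows, cols)])
                d_int = int(round(det))
                if abs(det - d_int) > tol or d_int not in (-1, 0, 1):
                    return False
    return True

A = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)
print("totally unimodular:", is_totally_unimodular(A))   # True for this matrix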

Let R1 (A) = {x | Ax = b, x ≥ 0} be the usual feasible set for the standard LP

min{c0 x | Ax = b, x ≥ 0}.

Theorem 27. Let A be a totally unimodular matrix in the integer linear program

min{c′x | Ax = b, x ≥ 0, x integer}

and let the vector b be integer. Then all the vertices of R1 (A) are integer for any integer vector b.

Proof. Consider the integer linear programming problem

min{c0 x | Ax = b, x ≥ 0, x-integer}

Consider the polytope


R1 (A) = {x | Ax = b, x ≥ 0},
the usual feasible set for the corresponding LP

min{c0 x | Ax = b, x ≥ 0}

Let B be a basis matrix formed by m linearly independent columns of A. Then the basic solution determined by B satisfies

xB = B −1 b = (adj B) b / det(B).

Since A is totally unimodular and B is nonsingular, det(B) = ±1; moreover, adj B has integer entries because B is an integer matrix. As b is an integer vector, xB is therefore integer. Hence every vertex of R1 (A) is integer, and applying the simplex algorithm to the LP yields an integer optimal solution, which is in turn a solution to the corresponding ILP.

Theorem 28. If A is TUM, then all the vertices of

R2 (A) = {x | Ax ≤ b, x ≥ 0},

are integer for any integer vector b. That is, an ILP in canonical form can be solved by solving the corresponding LP.

Proof. Consider the corresponding LP in standard form and the polytope

{(x, y) | Ax + Iy = b, x ≥ 0, y ≥ 0}

with constraint matrix (A | I). Let C be a square nonsingular submatrix of (A | I). Then

det(C) = det(B) det(Ik ) = ±1,

where Ik is an identity matrix of size k and B is a square nonsingular submatrix of A, possibly with some of its rows permuted. Hence (A | I) is totally unimodular, and the result follows from Theorem 27.

However, note that, in general, a solution to the LP relaxation of an ILP need not be integer.

Theorem 29. Any integer matrix A with aij = 0, ±1 is TUM if no more than two nonzero entries
appear in any column, and if the rows of A can be partitioned into two sets I1 and I2 such that

1. If a column has two entries of the same sign, their rows are in different sets.

2. If a column has two entries of different signs, their rows are in the same set.

Proof. We prove the theorem by mathematical induction on the size of submatrices.


It is true for a one-element submatrix, since every entry of A is 0 or ±1.
Now let C be any square submatrix of size k, and assume the claim holds for all smaller square submatrices. If C has a column of zeros, then det(C) = 0 and there is nothing to prove.
If C has a column with exactly one nonzero entry, we can expand the determinant along that column; the nonzero entry is ±1 and, by the induction hypothesis, the corresponding cofactor is 0 or ±1, so det(C) ∈ {0, ±1}.
Finally, suppose that every column of C has exactly two nonzero entries. Then, by the hypotheses of the theorem,

Σ_{i∈I1} aij = Σ_{i∈I2} aij for every column j of C.

That is, the sum of the rows of C in I1 minus the sum of the rows in I2 is the zero vector, a nontrivial vanishing linear combination of the rows. Therefore, det(C) = 0.
