Abstract
We study non-convex optimization problems over simplices. We show that for a large class of objective functions, the convex approximation obtained from the Reformulation-Linearization Technique (RLT) admits optimal solutions that exhibit a sparsity pattern. This characteristic of the optimal solutions allows us to conclude that (i) a linear matrix inequality constraint, which is often added to tighten the relaxation, is vacuously satisfied and can thus be omitted, and (ii) the number of decision variables in the RLT relaxation can be reduced from \({\mathcal {O}} (n^2)\) to \({\mathcal {O}} (n)\). Taken together, both observations allow us to reduce computation times by up to several orders of magnitude. Our results can be specialized to indefinite quadratic optimization problems over simplices and extended to non-convex optimization problems over the Cartesian product of two simplices as well as specific classes of polyhedral and non-convex feasible regions. Our numerical experiments illustrate the promising performance of the proposed framework.
1 Introduction
In this paper, we study non-convex optimization problems of the form
$$\begin{aligned} \underset{\varvec{x}}{\text {maximize}} \quad f (\varvec{x}) + g (\varvec{x}) \quad \text {subject to} \quad \varvec{A} \varvec{x} \le \varvec{b}, \end{aligned}$$
(1)
where \(f : {\mathbb {R}}^n \mapsto {\mathbb {R}}\) is a generic function, \(g: {\mathbb {R}}^n \mapsto {\mathbb {R}}\) is concave, \(\varvec{A} \in {\mathbb {R}}^{m \times n}\) and \(\varvec{b} \in {\mathbb {R}}^m\). Since f is not necessarily concave, problem (1) is a hard optimization problem even if P = NP [22, Theorem 1]. In the special case where f is convex, problem (1) recovers the class of DC (difference-of-convex-functions) optimization problems over a polyhedron [13]. Significant efforts have been devoted to solving problem (1) exactly (most commonly via branch-and-bound techniques) or approximately (often via convex approximations). For both tasks, the Reformulation-Linearization Technique (RLT) can be used to obtain tight yet readily solvable convex relaxations of (1).
Originally, RLT has been introduced to equivalently reformulate binary quadratic optimization problems as mixed-binary linear optimization problems [1]. To this end, each linear constraint in the original problem is multiplied with each binary decision variable to generate implied quadratic inequalities. These inequalities are subsequently linearized through the introduction of auxiliary decision variables whose values coincide with the generated quadratic terms. This idea is reminiscent of the McCormick envelopes [17], which relax bilinear expressions by introducing implied inequalities that are subsequently linearized. RLT has been extended to (continuous) polynomial optimization problems [26], where implied inequalities are generated from multiplying and subsequently linearizing existing bound constraints.
In this work, we consider a variant of RLT—the Reformulation-Convexification Technique [27]—which applies to linearly constrained optimization problems that maximize a non-concave objective function. This RLT variant (which we henceforth simply call ‘RLT’ for ease of exposition) replaces the non-concave function f in problem (1) with an auxiliary function \(f' : {\mathbb {R}}^{n \times n} \times {\mathbb {R}}^n \mapsto {\mathbb {R}}\) that is concave over the lifted domain \((\varvec{X}, \varvec{x}) \in {\mathbb {S}}^n \times {\mathbb {R}}^n\) and that satisfies \(f' (\varvec{X}, \varvec{x}) = f (\varvec{x})\) whenever \(\varvec{X} = \varvec{x} \varvec{x}^\top \). For the special case where \(f (\varvec{x}) = \varvec{x}^\top \varvec{P} \varvec{x}\) for an indefinite symmetric matrix \(\varvec{P} \in {\mathbb {S}}^n\), for example, we can choose \(f' (\varvec{X}, \varvec{x}) = \langle \varvec{P}, \varvec{X} \rangle \). RLT then augments problem (1) with the decision matrix \(\varvec{X} \in {\mathbb {S}}^n\) and the constraints
$$\begin{aligned} b_i b_j - b_i \varvec{a}_j^\top \varvec{x} - b_j \varvec{a}_i^\top \varvec{x} + \varvec{a}_i^\top \varvec{X} \varvec{a}_j \ge 0 \quad \text {for all } i, j = 1, \ldots , m, \end{aligned}$$
(2)
where \(\varvec{a}_i^\top \) denotes the i-th row of the matrix \(\varvec{A}\). The constraints (2) are justified by the fact that the pairwise multiplications \((\varvec{a}_i^\top \varvec{x} - b_i) (\varvec{a}_j^\top \varvec{x} - b_j)\) of the constraints in problem (1) have to be non-negative, and those multiplications coincide with the constraints (2) whenever \(\varvec{X} = \varvec{x} \varvec{x}^\top \). To obtain a convex relaxation of problem (1), the non-convex constraint \(\varvec{X} = \varvec{x} \varvec{x}^\top \) is either removed (which we henceforth refer to as ‘classical RLT’, see [24]) or relaxed to the linear matrix inequality (LMI) constraint \(\varvec{X} \succeq \varvec{x} \varvec{x}^\top \) (henceforth referred to as RLT/SDP, see [2, 3, 25]). Even though the matrix \(\varvec{X}\) linearizes quadratic terms, we emphasize that the problems we are considering are not restricted to quadratic programs since f may be a generic nonlinear function.
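To make the linearization step concrete, the following sketch (our own illustration, not code from the paper) evaluates the left-hand sides of the constraints (2) with numpy and verifies that they coincide with the pairwise products \((\varvec{a}_i^\top \varvec{x} - b_i) (\varvec{a}_j^\top \varvec{x} - b_j)\) whenever \(\varvec{X} = \varvec{x} \varvec{x}^\top \); the data A, b, x are arbitrary example values.

```python
import numpy as np

def rlt_constraint_values(A, b, x, X):
    """Left-hand sides of the linearized constraints (2):
    b_i b_j - b_j a_i'x - b_i a_j'x + a_i' X a_j for every pair (i, j)."""
    m = A.shape[0]
    vals = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            vals[i, j] = (b[i] * b[j] - b[j] * A[i] @ x
                          - b[i] * A[j] @ x + A[i] @ X @ A[j])
    return vals

A = np.array([[1.0, 2.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([3.0, 0.0, 0.0])
x = np.array([0.5, 1.0])                 # satisfies A x <= b
X = np.outer(x, x)                       # the (non-convex) choice X = x x^T

# With X = x x^T, (2) coincides with the products (a_i'x - b_i)(a_j'x - b_j) >= 0.
products = np.outer(A @ x - b, A @ x - b)
assert np.allclose(rlt_constraint_values(A, b, x, X), products)
assert np.all(products >= 0.0)
```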
RLT and its extensions have been exceptionally successful in providing tight approximations to indefinite quadratic [25], polynomial [26] and generic non-convex optimization problems [15, 29], and RLT is routinely implemented in state-of-the-art optimization software, including ANTIGONE [20], CPLEX [14], GloMIQO [19] and Gurobi [10].
In this paper, we assume that the constraints of problem (1) describe an n-dimensional simplex. Under this assumption, we show that for a large class of functions f that admit a monotone lifting (which includes, among others, various transformations of quadratic functions as well as the negative entropy), the RLT relaxation of problem (1) admits an optimal solution \((\varvec{X}^\star , \varvec{x}^\star )\) that satisfies \(\varvec{X}^\star = {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\). This has two important consequences. Firstly, we show that when the feasible region of problem (1) is a simplex, \(\varvec{X}^\star = {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\) satisfies \(\varvec{X}^\star \succeq \varvec{x}^\star \varvec{x}^{\star \top }\), that is, the RLT and RLT/SDP relaxations are equivalent, and the computationally expensive LMI constraint \(\varvec{X} \succeq \varvec{x} \varvec{x}^\top \) can be omitted in RLT/SDP. Secondly, we do not need to introduce the decision matrix \(\varvec{X} \in {\mathbb {S}}^n\) in the RLT relaxation, which amounts to a dramatic reduction in the size of the resulting relaxation. We also discuss how our result can be extended to instances of problem (1) over the Cartesian product of two simplices, a generic polyhedron, or a non-convex feasible region as well as an indefinite quadratic objective function.
Indefinite quadratic optimization over simplices (also known as standard quadratic optimization) has a long history, and it has found applications, among others, in mean/variance portfolio selection and the determination of the maximal cliques on a node-weighted graph [6]. More generally, non-convex polynomial optimization problems over simplices have been proposed for the global optimization of neural networks [4], portfolio optimization using the expected shortfall risk measure [5] and the computation of the Lebesgue constant for polynomial interpolation over a simplex [11]; see [8] for a general discussion. Simplicial decompositions of non-convex optimization problems are also studied extensively in the global optimization literature [12].
The remainder of this paper proceeds as follows. We analyze the RLT relaxations of simplex instances of problem (1) in Sect. 2 and report on numerical experiments in Sect. 3. “Appendix A” extends our findings to well-structured optimization problems over the Cartesian product of two simplices, specific classes of polyhedral and non-convex feasible regions, as well as indefinite quadratic objective functions. “Appendix B”, finally, contains additional numerical experiments.
Notation. We denote by \({\mathbb {R}}^n\) (\({\mathbb {R}}^n_{+}\)) the (non-negative orthant of the) n-dimensional Euclidean space and by \({\mathbb {Q}}\) the set of rational numbers. The cone of (positive semidefinite) symmetric matrices in \({\mathbb {R}}^{n \times n}\) is denoted by \({\mathbb {S}}^n\) (\({\mathbb {S}}^n_+\)). Bold lower and upper case letters denote vectors and matrices, respectively, while standard lower case letters are reserved for scalars. We denote the i-th component of a vector \(\varvec{x}\) by \(x_i\), the (i, j)-th element of a matrix \(\varvec{A}\) by \(A_{ij}\) and the i-th row of a matrix \(\varvec{A}\) by \(\varvec{a}_i^\top \). We write \(\varvec{X} \succeq \varvec{Y}\) to indicate that \(\varvec{X} - \varvec{Y}\) is positive semidefinite. The trace operator is denoted by \({{\,\mathrm{tr}\,}}(\cdot )\), and the trace inner product between two symmetric matrices is given by \(\langle \cdot , \cdot \rangle \). Finally, \({{\,\mathrm{diag}\,}}(\varvec{x})\) is a diagonal matrix whose diagonal elements coincide with the components of the vector \(\varvec{x}\).
2 RLT and RLT/SDP over simplices
This section studies instances of problem (1) where the constraints \(\varvec{A} \varvec{x} \le \varvec{b}\) describe the n-dimensional probability simplex:
$$\begin{aligned} \underset{\varvec{x}}{\text {maximize}} \quad f (\varvec{x}) + g (\varvec{x}) \quad \text {subject to} \quad \varvec{x} \in \Delta := \Big \{ \varvec{x} \in {\mathbb {R}}^n_+ \, : \, \sum _{i = 1}^n x_i = 1 \Big \}. \end{aligned}$$
(3)
Assuming that the feasible region describes a probability simplex, as opposed to any other full-dimensional simplex in \({\mathbb {R}}^n\), does not restrict generality. Indeed, we can always redefine the objective function as \( f(\varvec{x}) \leftarrow f(\varvec{T} \varvec{x})\) and \(g(\varvec{x}) \leftarrow g(\varvec{T} \varvec{x})\) for the invertible matrix \(\varvec{T} \in {\mathbb {R}}^{n \times n}\) that has as columns the extreme points of the simplex to be considered. The pairwise products between the constraints \(x_i \ge 0\), \(i = 1, \ldots , n\), and \(\sum _{i=1}^n x_i = 1\) result in the RLT constraints
$$\begin{aligned} X_{ij} \ge 0 \;\; \text {for all } i, j = 1, \ldots , n \qquad \text {and} \qquad \sum _{j = 1}^n X_{ij} = x_i \;\; \text {for all } i = 1, \ldots , n; \end{aligned}$$
here we omit the constraint \(\sum _{i = 1}^n \sum _{j = 1}^n X_{ij} = 1\) as it is implied by the above constraints and the fact that \(\sum _{i = 1}^n x_i = 1\). Thus, the RLT relaxation of problem (3) can be written as
$$\begin{aligned} \underset{\varvec{X}, \varvec{x}}{\text {maximize}} \quad f' (\varvec{X}, \varvec{x}) + g (\varvec{x}) \quad \text {subject to} \quad \varvec{x} \in \Delta , \;\; \varvec{X} \in {\mathbb {S}}^n, \;\; \varvec{X} \ge \varvec{0}, \;\; \varvec{X} \mathbf {1} = \varvec{x}, \end{aligned}$$
(4)
where the auxiliary function \(f'\) has to be suitably chosen, while the RLT/SDP relaxation contains the additional LMI constraint \(\varvec{X} \succeq \varvec{x} \varvec{x}^\top \).
We now define a condition that ensures that the RLT relaxation (4) of problem (3) admits an optimal solution \((\varvec{X}^\star , \varvec{x}^\star )\) with \(\varvec{X}^\star = {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\).
Definition 1
We say that \(f : {\mathbb {R}}^n_+ \mapsto {\mathbb {R}}\) has a monotone lifting if there is a concave function \(f' : {\mathbb {S}}^n \times {\mathbb {R}}^n_+ \mapsto {\mathbb {R}}\) such that \(f' (\varvec{X}, \varvec{x}) = f ( \varvec{x})\) whenever \(\varvec{X} = \varvec{x} \varvec{x}^\top \), as well as \(f' (\varvec{X}', \varvec{x}) \ge f' (\varvec{X}, \varvec{x})\) for all \((\varvec{X}, \varvec{x}) \in {\mathbb {S}}^n \times {\mathbb {R}}^n_+\) and all \(\varvec{X}' \in {\mathbb {S}}^n\) satisfying \(\varvec{X}' \succeq \varvec{X}\).
The requirement in Definition 1 that \(f' (\varvec{X}, \varvec{x}) = f ( \varvec{x})\) whenever \(\varvec{X} = \varvec{x} \varvec{x}^\top \) is needed for the correctness of the RLT relaxation. The concavity of \(f'\) is required for the RLT relaxation to be a convex optimization problem. The assumption that \(f' (\varvec{X}', \varvec{x}) \ge f' (\varvec{X}, \varvec{x})\) for all \((\varvec{X}, \varvec{x}) \in {\mathbb {S}}^n \times {\mathbb {R}}^n_+\) and all \(\varvec{X}' \in {\mathbb {S}}^n\) satisfying \(\varvec{X}' \succeq \varvec{X}\), finally, will allow us to deduce an optimal solution for \(\varvec{X}\) based on the value of \(\varvec{x}\). Indeed, we will see below in Theorem 1 that the RLT relaxation (4) of an instance of problem (3) admits optimal solutions \((\varvec{X}^\star , \varvec{x}^\star )\) satisfying \(\varvec{X}^\star = {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\) whenever the auxiliary function \(f'\) in (4) is a monotone lifting of the function f in (3). Intuitively speaking, Definition 1 enables us to weakly improve any solution \((\varvec{X}, \varvec{x})\) satisfying \(\varvec{X} \ne \mathrm {diag}(\varvec{x})\) by iteratively moving off-diagonal elements of \(\varvec{X}\) to the diagonal. Before presenting the formal result, we provide some examples of functions f that admit monotone liftings.
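As a small numerical sanity check (our own example, not from the paper), consider the convex quadratic \(f (\varvec{x}) = \varvec{x}^\top \varvec{Q} \varvec{x}\) with \(\varvec{Q} \succeq \varvec{0}\) and the lifting \(f' (\varvec{X}, \varvec{x}) = \langle \varvec{Q}, \varvec{X} \rangle \); monotonicity holds because the trace inner product of two positive semidefinite matrices is non-negative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
G = rng.standard_normal((n, n))
Q = G.T @ G                       # positive semidefinite, so <Q, .> is monotone

B = rng.standard_normal((n, n))
X = B + B.T                       # an arbitrary symmetric matrix
C = rng.standard_normal((n, n))
Xp = X + C @ C.T                  # Xp >= X in the semidefinite order by construction

# Lifting agreement: <Q, x x^T> = x^T Q x for any x.
x = rng.random(n)
assert np.isclose(np.trace(Q @ np.outer(x, x)), x @ Q @ x)

# Monotonicity: tr(Q (Xp - X)) >= 0 since both Q and Xp - X are PSD.
assert np.trace(Q @ Xp) >= np.trace(Q @ X) - 1e-9
```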
Proposition 1
The following function classes have monotone liftings:
-
1.
Generalized linearithmic functions: \(f (\varvec{x}) = \sum _{\ell = 1}^L (\varvec{t}_\ell ^\top \varvec{x} + t_\ell ) \cdot h_\ell (\varvec{t}_\ell ^\top \varvec{x} + t_\ell )\) with (i) \(\varvec{t}_\ell \in {\mathbb {R}}^n_+\), \(t_\ell \in {\mathbb {R}}_+\) and \(h_\ell : {\mathbb {R}} \mapsto {\mathbb {R}}\) concave and non-decreasing, or (ii) \(\varvec{t}_\ell \in {\mathbb {R}}^n\), \(t_\ell \in {\mathbb {R}}\) and \(h_\ell : {\mathbb {R}} \mapsto {\mathbb {R}}\) affine and non-decreasing.
-
2.
Linear combinations: \(f(\varvec{x}) = \sum _{\ell = 1}^L t_\ell \cdot f_\ell (\varvec{x})\) with \(t_\ell \in {\mathbb {R}}_+\), where each \(f_\ell : {\mathbb {R}}^n \mapsto {\mathbb {R}}\) has a monotone lifting.
-
3.
Concave compositions: \(h (\varvec{x}) = g (f (\varvec{x}))\) for \(f : {\mathbb {R}}^n_+ \mapsto {\mathbb {R}}\) with a monotone lifting as well as a concave and non-decreasing \(g : {\mathbb {R}} \mapsto {\mathbb {R}}\).
-
4.
Linear pre-compositions: \(h(\varvec{x}) = f(\varvec{T} \varvec{x})\) for \(f:{\mathbb {R}}^{{p}}_+ \mapsto {\mathbb {R}}\) with a monotone lifting as well as \(\varvec{T} \in {\mathbb {R}}^{{p} \times n}\).
-
5.
Pointwise minima: \(h(\varvec{x}) = \min \{f_1(\varvec{x}), \ldots , f_L (\varvec{x})\}\) where each \(f_\ell : {\mathbb {R}}^n_+ \mapsto {\mathbb {R}}\) has a monotone lifting.
Proof
In view of case (i) of the first statement, we choose
$$\begin{aligned} f' (\varvec{X}, \varvec{x}) = \sum _{\ell = 1}^L (\varvec{t}_\ell ^\top \varvec{x} + t_\ell ) \cdot h_\ell \left( \frac{\varvec{t}_\ell ^\top \varvec{X} \varvec{t}_\ell + 2 t_\ell \varvec{t}_\ell ^\top \varvec{x} + t_\ell ^2}{\varvec{t}_\ell ^\top \varvec{x} + t_\ell } \right) , \end{aligned}$$
which is concave in \((\varvec{X}, \varvec{x})\) since it constitutes the sum of perspectives of concave functions [7, §3.2.2 and §3.2.6]. Whenever \(\varvec{X} = \varvec{x} \varvec{x}^\top \), we have
$$\begin{aligned} \varvec{t}_\ell ^\top \varvec{X} \varvec{t}_\ell + 2 t_\ell \varvec{t}_\ell ^\top \varvec{x} + t_\ell ^2 = (\varvec{t}_\ell ^\top \varvec{x})^2 + 2 t_\ell \varvec{t}_\ell ^\top \varvec{x} + t_\ell ^2 = (\varvec{t}_\ell ^\top \varvec{x} + t_\ell )^2, \end{aligned}$$
and thus the standard limit convention for perspective functions implies that \(f'(\varvec{X}, \varvec{x}) = f (\varvec{x})\) for all \(\varvec{x} \in {\mathbb {R}}^n_+\). Moreover, for any \(\varvec{x} \in {\mathbb {R}}^n_+\), we have
$$\begin{aligned} \varvec{t}_\ell ^\top \varvec{X}' \varvec{t}_\ell + 2 t_\ell \varvec{t}_\ell ^\top \varvec{x} + t_\ell ^2 \ge \varvec{t}_\ell ^\top \varvec{X} \varvec{t}_\ell + 2 t_\ell \varvec{t}_\ell ^\top \varvec{x} + t_\ell ^2 \quad \text {for all } \varvec{X}, \varvec{X}' \in {\mathbb {S}}^n \text { with } \varvec{X}' \succeq \varvec{X}, \end{aligned}$$
where the inequality holds since \(\varvec{X}' - \varvec{X} \succeq 0\). We conclude that
$$\begin{aligned} h_\ell \left( \frac{\varvec{t}_\ell ^\top \varvec{X}' \varvec{t}_\ell + 2 t_\ell \varvec{t}_\ell ^\top \varvec{x} + t_\ell ^2}{\varvec{t}_\ell ^\top \varvec{x} + t_\ell } \right) \ge h_\ell \left( \frac{\varvec{t}_\ell ^\top \varvec{X} \varvec{t}_\ell + 2 t_\ell \varvec{t}_\ell ^\top \varvec{x} + t_\ell ^2}{\varvec{t}_\ell ^\top \varvec{x} + t_\ell } \right) \end{aligned}$$
as \(2 t_\ell \varvec{t}_\ell ^\top \varvec{x} + t_\ell ^2 \ge 0\) and \(\varvec{t}_\ell ^\top \varvec{x} + t_\ell \ge 0\) due to the non-negativity of \(\varvec{t}_\ell \), \(t_\ell \) and \(\varvec{x}\), which implies that \(f' (\varvec{X}', \varvec{x}) \ge f' (\varvec{X}, \varvec{x})\) as desired.
One readily verifies that in the special case where each \(h_\ell \) is affine, the concavity of \(f'\), the agreement of \(f'\) with f when \(\varvec{X} = \varvec{x} \varvec{x}^\top \) and the monotonicity of \(f'\) with respect to \(\varvec{X}' \succeq \varvec{X}\) continue to hold even when \(\varvec{t}_\ell \) and/or \(t_\ell \) fail to be non-negative. This establishes case (ii) of the first statement.
As for the second statement, let \(f_\ell ' : {\mathbb {S}}^n \times {\mathbb {R}}^n_+ \mapsto {\mathbb {R}}\) be monotone liftings of \(f_\ell \), \(\ell = 1, \ldots , L\). We claim that \(f' (\varvec{X}, \varvec{x}) = \sum _{\ell = 1}^L t_\ell \cdot f_\ell ' (\varvec{X}, \varvec{x})\) is a monotone lifting of f. Indeed, one readily verifies that \(f'\) inherits concavity in \((\varvec{X}, \varvec{x})\) and agreement with f when \(\varvec{X} = \varvec{x} \varvec{x}^\top \) from its constituent functions \(f'_\ell \). Moreover, since \(f'_\ell (\varvec{X}', \varvec{x}) \ge f'_\ell (\varvec{X}, \varvec{x})\) for all \(\varvec{X}, \varvec{X}' \in {\mathbb {S}}^n\) with \(\varvec{X}' \succeq \varvec{X}\), \(\ell = 1, \ldots , L\), we have \(f' (\varvec{X}', \varvec{x}) \ge f' (\varvec{X}, \varvec{x})\) as well.
In view of the third statement, let \(f'\) be a monotone lifting of f. We claim that in this case, \(h' (\varvec{X}, \varvec{x}) = g (f' (\varvec{X}, \varvec{x}))\) is a monotone lifting of h. Indeed, \(h'\) is a non-decreasing concave transformation of a concave function and is thus concave [7, §3.2.5]. Moreover, since \(f' (\varvec{X}, \varvec{x}) = f (\varvec{x})\) for \(\varvec{X} = \varvec{x} \varvec{x}^\top \), we have
$$\begin{aligned} h' (\varvec{X}, \varvec{x}) = g (f' (\varvec{X}, \varvec{x})) = g (f (\varvec{x})) = h (\varvec{x}) \end{aligned}$$
whenever \(\varvec{X} = \varvec{x} \varvec{x}^\top \). Finally, the monotonicity of g implies that
$$\begin{aligned} h' (\varvec{X}', \varvec{x}) = g (f' (\varvec{X}', \varvec{x})) \ge g (f' (\varvec{X}, \varvec{x})) = h' (\varvec{X}, \varvec{x}) \end{aligned}$$
for all \(\varvec{X}, \varvec{X}' \in {\mathbb {S}}^n\) with \(\varvec{X}' \succeq \varvec{X}\).
For the fourth statement, we set \(h'(\varvec{X}, \varvec{x}) = f'(\varvec{T} \varvec{X} \varvec{T}^\top , \varvec{T} \varvec{x})\), where \(f'\) is a monotone lifting of f. The function \(h'\) is concave since it constitutes a composition of a concave function with a linear function [7, §3.2.2]. Moreover, for any \(\varvec{x} \in {\mathbb {R}}^n_+\) and \(\varvec{X} = \varvec{x} \varvec{x}^\top \), we have
$$\begin{aligned} h' (\varvec{X}, \varvec{x}) = f' (\varvec{T} \varvec{X} \varvec{T}^\top , \varvec{T} \varvec{x}) = f (\varvec{T} \varvec{x}) = h (\varvec{x}), \end{aligned}$$
where the second identity holds since \(\varvec{T} \varvec{X} \varvec{T}^\top = (\varvec{T} \varvec{x}) (\varvec{T} \varvec{x})^\top \) whenever \(\varvec{X} = \varvec{x} \varvec{x}^\top \) as well as \(f' (\varvec{X}, \varvec{x}) = f (\varvec{x})\) for \(\varvec{X} = \varvec{x} \varvec{x}^\top \). To see that \(h' (\varvec{X}', \varvec{x}) \ge h' (\varvec{X}, \varvec{x})\) for all \(\varvec{x} \in {\mathbb {R}}^n_+\) and all \(\varvec{X}, \varvec{X}' \in {\mathbb {S}}^n\) satisfying \(\varvec{X}' \succeq \varvec{X}\), we note that
$$\begin{aligned} h' (\varvec{X}', \varvec{x}) = f' (\varvec{T} \varvec{X}' \varvec{T}^\top , \varvec{T} \varvec{x}) \ge f' (\varvec{T} \varvec{X} \varvec{T}^\top , \varvec{T} \varvec{x}) = h' (\varvec{X}, \varvec{x}), \end{aligned}$$
where the inequality follows from the fact that
$$\begin{aligned} \varvec{T} \varvec{X}' \varvec{T}^\top - \varvec{T} \varvec{X} \varvec{T}^\top = \varvec{T} (\varvec{X}' - \varvec{X}) \varvec{T}^\top \succeq \varvec{0} \end{aligned}$$
and the assumption that \(f'\) is a monotone lifting.
For the last statement, we set \(h' (\varvec{X}, \varvec{x}) = \min \{f_1'(\varvec{X}, \varvec{x}), \ldots , f_L'(\varvec{X}, \varvec{x})\}\), where \(f'_\ell :{\mathbb {S}}^n \times {\mathbb {R}}^n_+ \mapsto {\mathbb {R}}\) is a monotone lifting of \(f_\ell \) for all \(\ell = 1,\ldots ,L\). The function \(h'\) is concave as it is a minimum of concave functions [7, §3.2.3]. Moreover, for any \(\varvec{x} \in {\mathbb {R}}^n_+\) and \(\varvec{X} = \varvec{x} \varvec{x}^\top \), we have
$$\begin{aligned} h' (\varvec{X}, \varvec{x}) = \min \{ f_1' (\varvec{X}, \varvec{x}), \ldots , f_L' (\varvec{X}, \varvec{x}) \} = \min \{ f_1 (\varvec{x}), \ldots , f_L (\varvec{x}) \} = h (\varvec{x}) \end{aligned}$$
since each \(f'_\ell \) is a monotone lifting of \(f_\ell \). Similarly, for any \(\varvec{x} \in {\mathbb {R}}^n_+\) and any \(\varvec{X}, \varvec{X}' \in {\mathbb {S}}^n\) satisfying \(\varvec{X}' \succeq \varvec{X}\), we have
$$\begin{aligned} h' (\varvec{X}', \varvec{x}) = \min _{\ell = 1, \ldots , L} f_\ell ' (\varvec{X}', \varvec{x}) \ge \min _{\ell = 1, \ldots , L} f_\ell ' (\varvec{X}, \varvec{x}) = h' (\varvec{X}, \varvec{x}), \end{aligned}$$
where the inequality again follows from the fact that each \(f'_\ell \) is a monotone lifting of \(f_\ell \). This concludes the proof. \(\square \)
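The last statement can be illustrated numerically; the sketch below (our own example, not from the paper) uses the quadratic liftings \(f'_\ell (\varvec{X}, \varvec{x}) = \langle \varvec{Q}_\ell , \varvec{X} \rangle \) for two convex quadratics and checks the agreement condition of the pointwise-minimum lifting on the set \(\varvec{X} = \varvec{x} \varvec{x}^\top \).

```python
import numpy as np

rng = np.random.default_rng(3)
G1 = rng.standard_normal((3, 3)); Q1 = G1.T @ G1    # PSD
G2 = rng.standard_normal((3, 3)); Q2 = G2.T @ G2    # PSD

x = np.array([0.1, 0.4, 0.5])
X = np.outer(x, x)

# h'(X, x) = min(<Q1, X>, <Q2, X>) agrees with h(x) = min(x'Q1 x, x'Q2 x)
# whenever X = x x^T, as required of a lifting.
h_lift = min(np.trace(Q1 @ X), np.trace(Q2 @ X))
h = min(x @ Q1 @ x, x @ Q2 @ x)
assert abs(h_lift - h) < 1e-9
```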
Through an iterative application of its rules, Proposition 1 allows us to construct a rich family of functions that admit monotone liftings. We next list several examples that are of particular interest.
Corollary 1
The functions listed below have monotone liftings.
-
1.
Convex quadratic functions: \(f (\varvec{x}) = \varvec{x}^\top \varvec{Q} \varvec{x} + \varvec{q}^\top \varvec{x} + q\) with \(\varvec{Q} \in {\mathbb {S}}^n_+\).
-
2.
Conic quadratic functions: \(f (\varvec{x}) = \Vert \varvec{F} \varvec{x} \Vert _2 + \varvec{f}^\top \varvec{x} + f\), where \(\varvec{F} \in {\mathbb {R}}^{k \times n}\), \(\varvec{f} \in {\mathbb {R}}^n\) and \(f \in {\mathbb {R}}\).
-
3.
Negative entropy: \(f (\varvec{x}) = \sum _{i = 1}^n c_i \cdot x_i \ln x_i\) with \(c_i \in {\mathbb {R}}_+\).
-
4.
Power functions: \(f(x) = x^a\) with \(a \in [1, 2]\) and \(a \in {\mathbb {Q}}\).
Proof
In view of the first statement, let \(\varvec{Q} = \varvec{L}^\top \varvec{L}\) for \(\varvec{L} \in {\mathbb {R}}^{n \times n}\), where \(\varvec{L}\) can be computed from a Cholesky decomposition. Identifying \(\varvec{t}_\ell ^\top \) with the \(\ell \)-th row of \(\varvec{L}\) and setting \(t_\ell = 0\), \(\ell = 1, \ldots , n\), we then obtain
$$\begin{aligned} f (\varvec{x}) = \varvec{x}^\top \varvec{Q} \varvec{x} + \varvec{q}^\top \varvec{x} + q = \sum _{\ell = 1}^n (\varvec{t}_\ell ^\top \varvec{x}) \cdot h_\ell (\varvec{t}_\ell ^\top \varvec{x}) + \varvec{q}^\top \varvec{x} + q, \end{aligned}$$
where \(h_\ell : {\mathbb {R}} \mapsto {\mathbb {R}}\) is the identity function, \(\ell = 1, \ldots , n\). The first expression on the right-hand side satisfies the conditions of the first statement of Proposition 1 and thus admits a monotone lifting. The remaining term \(g (\varvec{x}) = \varvec{q}^\top \varvec{x} + q\) admits the trivial lifting \(g' (\varvec{X}, \varvec{x}) = \varvec{q}^\top \varvec{x} + q\), and the second statement of Proposition 1 thus implies that the function f has a monotone lifting as well.
As for the second statement, we note that
$$\begin{aligned} f (\varvec{x}) = \Vert \varvec{F} \varvec{x} \Vert _2 + \varvec{f}^\top \varvec{x} + f = \sqrt{\varvec{x}^\top \varvec{F}^\top \varvec{F} \varvec{x}} + \varvec{f}^\top \varvec{x} + f. \end{aligned}$$
Since \(\varvec{F}^\top \varvec{F} \succeq \varvec{0}\) by construction, the term \(\varvec{x}^\top \varvec{F}^\top \varvec{F} \varvec{x}\) has a monotone lifting due to the first statement of this corollary. Moreover, since \(x \mapsto \sqrt{x}\) is non-decreasing and concave, the third statement of Proposition 1 implies that the expression \(\sqrt{\varvec{x}^\top \varvec{F}^\top \varvec{F} \varvec{x}}\) admits a monotone lifting. The remaining term \(g (\varvec{x}) = \varvec{f}^\top \varvec{x} + f\) again admits the trivial lifting \(g' (\varvec{X}, \varvec{x}) = \varvec{f}^\top \varvec{x} + f\), and the second statement of Proposition 1 thus implies that the function f has a monotone lifting as well.
In view of the third statement, we first note that each term \(x_i \ln x_i\) has a monotone lifting if we choose \(\varvec{t}_i = \mathbf {e}_i\), where \(\mathbf {e}_i\) denotes the i-th canonical basis vector in \({\mathbb {R}}^n\), and \(t_i = 0\) in the first statement of Proposition 1. Since f constitutes a weighted sum of these terms, the existence of its monotone lifting then follows from the second statement of Proposition 1.
As for the last statement, we note that \(f(x) = x \cdot h(x)\) with \(h(x) = x^{a-1}\). Since h is concave and non-decreasing, the first statement of Proposition 1 implies that f has a monotone lifting. \(\square \)
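Under our reading of the entropy construction above, statement 1 of Proposition 1 yields the explicit lifting \(f' (\varvec{X}, \varvec{x}) = \sum _{i} c_i \, x_i \ln (X_{ii} / x_i)\); the following sketch (our own illustration) checks the agreement condition numerically.

```python
import numpy as np

def entropy_lift(X, x, c):
    """Our reading of the lifting from Proposition 1 (statement 1) with
    t_i = e_i and t_i = 0:  f'(X, x) = sum_i c_i * x_i * log(X_ii / x_i)."""
    return float(np.sum(c * x * np.log(np.diag(X) / x)))

c = np.array([1.0, 2.0, 0.5])
x = np.array([0.2, 0.3, 0.5])
X = np.outer(x, x)               # then X_ii = x_i^2 and X_ii / x_i = x_i

# Agreement with f(x) = sum_i c_i * x_i * log(x_i) when X = x x^T.
assert abs(entropy_lift(X, x, c) - float(np.sum(c * x * np.log(x)))) < 1e-9
```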
Any indefinite quadratic function can be represented as the sum of a convex quadratic and a concave quadratic function [9, 23]. Thus, if problem (3) optimizes an indefinite quadratic function over a simplex (i.e., if it is a standard quadratic optimization problem), then we can redefine its objective function as a sum of a convex quadratic and a concave quadratic function and subsequently apply the first statement in Proposition 1 to the convex part of the objective function.
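Such a decomposition can be computed via an eigendecomposition, which is one of several possible splittings discussed in [9, 23]; a minimal sketch (our own illustration):

```python
import numpy as np

def dc_split(P):
    """Write a symmetric indefinite P as P_plus - P_minus with both summands
    positive semidefinite, by separating positive and negative eigenvalues."""
    w, V = np.linalg.eigh(P)
    P_plus = (V * np.maximum(w, 0.0)) @ V.T     # V diag(max(w, 0)) V^T
    P_minus = (V * np.maximum(-w, 0.0)) @ V.T   # V diag(max(-w, 0)) V^T
    return P_plus, P_minus

P = np.array([[1.0, 2.0],
              [2.0, -3.0]])                     # indefinite
P_plus, P_minus = dc_split(P)

assert np.allclose(P_plus - P_minus, P)
assert np.linalg.eigvalsh(P_plus).min() >= -1e-9
assert np.linalg.eigvalsh(P_minus).min() >= -1e-9
```

The convex part \(\varvec{x}^\top \varvec{P}_+ \varvec{x}\) can then be handled by the first statement of Proposition 1, while \(-\varvec{x}^\top \varvec{P}_- \varvec{x}\) is concave and can be absorbed into g.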
We are now ready to prove the main result of this section.
Theorem 1
If the function f in problem (3) has a monotone lifting \(f'\), then the corresponding RLT relaxation (4) has an optimal solution \((\varvec{X}^\star , \varvec{x}^\star )\) satisfying \(\varvec{X}^\star = {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\).
Proof
The RLT relaxation (4) maximizes the concave and, a fortiori, continuous function \(f' (\varvec{X}, \varvec{x}) + g (\varvec{x})\) over a compact feasible region. The Weierstrass theorem thus guarantees that the optimal value of problem (4) is attained.
Let \((\varvec{X}^\star , \varvec{x}^\star )\) be an optimal solution to the RLT relaxation (4). If \(\varvec{X}^\star = {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\), then there is nothing to prove. If \(\varvec{X}^\star \ne {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\), on the other hand, then there exist \(i, j \in \{ 1, \ldots , n \}\), \(i \ne j\), such that \(X^\star _{ij} = X^\star _{ji} > 0\). Define \(\varvec{X}' \in {\mathbb {S}}^n\) as \(\varvec{X}' = \varvec{X}^\star + \varvec{T}\), where \(T_{ij} = T_{ji} = -X^\star _{ij}\), \(T_{ii} = T_{jj} = X^\star _{ij}\) and \(T_{kl} = 0\) for all other components k, l. Note that \(\varvec{T} \succeq \mathbf {0}\) since \(\varvec{z}^\top \varvec{T} \varvec{z} = {X_{ij}^\star } (z_i - z_j)^2 \ge 0\) for all \(\varvec{z} \in {\mathbb {R}}^n\). We thus have \(\varvec{X}' = \varvec{X}^\star + \varvec{T} \succeq \varvec{X}^\star \), which implies that \({f'} (\varvec{X}', \varvec{x}^\star ) \ge f' (\varvec{X}^\star , \varvec{x}^\star )\) since \(f'\) is a monotone lifting of f. In addition, the row and column sums of \(\varvec{X}^\star \) and \(\varvec{X}'\) coincide by construction, and thus \((\varvec{X}', \varvec{x}^\star )\) is also feasible in the RLT relaxation (4).
By construction, the matrix \(\varvec{X}'\) contains two fewer non-zero off-diagonal elements than the matrix \(\varvec{X}^\star \). An iterative application of the argument from the previous paragraph eventually results in an optimal diagonal matrix \(\varvec{X}'\), which by the constraints of the RLT relaxation (4) must coincide with \({{\,\mathrm{diag}\,}}({\varvec{x}^\star })\). This proves the statement of the theorem. \(\square \)
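The perturbation used in the proof is easy to verify numerically; the sketch below (our own illustration) moves one off-diagonal entry to the diagonal and checks that the update is positive semidefinite and preserves row sums, so feasibility in the RLT relaxation is maintained.

```python
import numpy as np

def shift_to_diagonal(X, i, j):
    """Apply the proof's perturbation T for a pair (i, j) with X_ij > 0."""
    T = np.zeros_like(X)
    T[i, j] = T[j, i] = -X[i, j]
    T[i, i] = T[j, j] = X[i, j]
    return X + T

X = np.array([[0.20, 0.10, 0.00],
              [0.10, 0.30, 0.00],
              [0.00, 0.00, 0.30]])
Xp = shift_to_diagonal(X, 0, 1)

assert np.allclose(Xp.sum(axis=1), X.sum(axis=1))    # row/column sums preserved
assert np.linalg.eigvalsh(Xp - X).min() >= -1e-12    # T is positive semidefinite
assert Xp[0, 1] == 0.0                               # off-diagonal pair cleared
```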
Theorem 1 allows us to replace the \(n \times n\) decision matrix \(\varvec{X}\) in the RLT relaxation (4) of problem (3) with \({{\,\mathrm{diag}\,}}(\varvec{x})\) and thus significantly reduce the size of the optimization problem. Our numerical results (cf. Sect. 3) indicate that this can in turn result in dramatic savings in solution time. Another important consequence of Theorem 1 is given next.
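As a concrete illustration of the reduction (our own example, not from the paper): for \(f (\varvec{x}) = \varvec{x}^\top \varvec{Q} \varvec{x}\) with \(\varvec{Q} \succeq \varvec{0}\) and the lifting \(f' (\varvec{X}, \varvec{x}) = \langle \varvec{Q}, \varvec{X} \rangle \), substituting \(\varvec{X} = {{\,\mathrm{diag}\,}}(\varvec{x})\) turns the objective into the linear function \(\sum _i Q_{ii} x_i\), so the RLT relaxation collapses to a linear program over the simplex with optimal value \(\max _i Q_{ii}\).

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.standard_normal((4, 4))
Q = G.T @ G                                  # convex quadratic objective x'Qx

# Reduced RLT: maximize diag(Q)'x over the simplex -> attained at a vertex e_i.
reduced_opt = float(np.max(np.diag(Q)))

# Here the bound is even tight: the maximum of the convex function x'Qx over
# the simplex is attained at a vertex e_i, where it evaluates to Q_ii.
i = int(np.argmax(np.diag(Q)))
e = np.zeros(4); e[i] = 1.0
assert abs(e @ Q @ e - reduced_opt) < 1e-12
```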
Corollary 2
If the function f in problem (3) has a monotone lifting \(f'\), then the optimal value of the corresponding RLT relaxation (4) coincides with the optimal value of the corresponding RLT/SDP relaxation.
Proof
Recall that the RLT/SDP relaxation of problem (3) is equivalent to the RLT relaxation (4), except for the additional constraint that \(\varvec{X} \succeq \varvec{x} \varvec{x}^\top \). According to Theorem 1, it thus suffices to show that \({{\,\mathrm{diag}\,}}(\varvec{x}^\star ) \succeq \varvec{x}^\star \varvec{x}^\star {}^\top \) for the optimal solution \((\varvec{X}^\star , \varvec{x}^\star )\) considered in the theorem’s statement.
Note that the constraints of the RLT relaxation (4) imply that \(\varvec{x}^\star \ge \mathbf {0}\) and \(\sum _{i = 1}^n x_i^\star = 1\). For any vector \(\varvec{y} \in {\mathbb {R}}^n\), we can thus construct a random variable \({\tilde{Y}}\) that attains the value \(y_i\) with probability \(x^\star _i\), \(i = 1, \ldots , n\). We then have
$$\begin{aligned} \varvec{y}^\top \big ( {{\,\mathrm{diag}\,}}(\varvec{x}^\star ) - \varvec{x}^\star \varvec{x}^\star {}^\top \big ) \varvec{y} = \sum _{i = 1}^n x_i^\star y_i^2 - \Big ( \sum _{i = 1}^n x_i^\star y_i \Big )^2 = {\mathbb {E}} \big [ {\tilde{Y}}^2 \big ] - {\mathbb {E}} \big [ {\tilde{Y}} \big ]^2 \ge 0 \end{aligned}$$
since \({\mathbb {V}}\text {ar} \big [ {\tilde{Y}} \big ] = {\mathbb {E}} \big [ {\tilde{Y}}^2 \big ] - {\mathbb {E}} \big [ {\tilde{Y}} \big ]^2 \ge 0\). We thus conclude that \({{\,\mathrm{diag}\,}}(\varvec{x}^\star ) - \varvec{x}^\star \varvec{x}^\star {}^\top \succeq \mathbf {0}\), that is, the optimal solution \((\varvec{X}^\star , \varvec{x}^\star )\) considered by Theorem 1 vacuously satisfies the LMI constraint of the RLT/SDP relaxation. \(\square \)
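The variance argument can be confirmed numerically; the sketch below (our own illustration) checks \({{\,\mathrm{diag}\,}}(\varvec{x}) - \varvec{x} \varvec{x}^\top \succeq \varvec{0}\) for random points of the probability simplex.

```python
import numpy as np

def min_eig_slack(x):
    """Smallest eigenvalue of diag(x) - x x^T."""
    return float(np.linalg.eigvalsh(np.diag(x) - np.outer(x, x)).min())

rng = np.random.default_rng(1)
for _ in range(100):
    x = rng.random(5)
    x /= x.sum()                 # a random point in the probability simplex
    assert min_eig_slack(x) >= -1e-12
```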
Corollary 2 shows that whenever f has a monotone lifting, the RLT/SDP relaxation offers no advantage over the RLT relaxation (4) of problem (3).
3 Numerical experiments
We compare our RLT formulation against standard RLT and RLT/SDP implementations on non-convex optimization problems over simplices. All experiments are run on an 8th-generation Intel(R) Core(TM) i7-8750H processor using MATLAB R2018b [28], YALMIP R20200930 [16] and MOSEK 9.2.28 [21].
We consider instances of problem (3) whose objective functions satisfy
where \(\varvec{D} \in {\mathbb {S}}^n\) is a diagonal scaling matrix whose diagonal elements are chosen uniformly at random from the interval [0, 10], \(\varvec{Q} \in {\mathbb {R}}^{n \times n}\) is a uniformly sampled rotation matrix [18], and \(\mathbf {1} \in {\mathbb {R}}^n\) is the vector of all ones (cf. Fig. 1).
It follows from our discussion in Sect. 2 that the optimal values of the RLT and RLT/SDP relaxations coincide for the test instances considered in this section, and there are always optimal solutions \((\varvec{X}^\star , \varvec{x}^\star )\) satisfying \(\varvec{X}^\star = {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\). Figure 2 compares the runtimes of our RLT formulation, which replaces the matrix \(\varvec{X}\) with \({{\,\mathrm{diag}\,}}(\varvec{x})\), with those of the standard RLT and RLT/SDP formulations. As expected, our RLT formulation substantially outperforms both alternatives.
References
Adams, W.P., Sherali, H.D.: A tight linearization and an algorithm for zero-one quadratic programming problems. Manag. Sci. 32(10), 1274–1290 (1986)
Anstreicher, K.M.: Semidefinite programming versus the reformulation-linearization technique for nonconvex quadratically constrained quadratic programming. J. Glob. Optim. 43(2–3), 471–484 (2009)
Bao, X., Sahinidis, N.V., Tawarmalani, M.: Semidefinite relaxations for quadratically constrained quadratic programming: a review and comparisons. Math. Program. 129(1), 129–157 (2011)
Beliakov, G., Abraham, A.: Global optimisation of neural networks using a deterministic hybrid approach. In: Hybrid Information Systems, pp. 79–92. Springer (2002)
Bertsimas, D., Lauprete, G.J., Samarov, A.: Shortfall as a risk measure: properties, optimization and applications. J. Econ. Dyn. Control 28(7), 1353–1381 (2004)
Bomze, I.M.: On standard quadratic optimization problems. J. Glob. Optim. 13(4), 369–387 (1998)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
De Klerk, E., Den Hertog, D., Elabwabi, G.: On the complexity of optimization over the standard simplex. Eur. J. Oper. Res. 191(3), 773–785 (2008)
Fampa, M., Lee, J., Melo, W.: On global optimization with indefinite quadratics. EURO J. Comput. Optim. 5(3), 309–337 (2017)
Gurobi Optimization, LLC: Gurobi optimizer reference manual (2020)
Hesthaven, J.S.: From electrostatics to almost optimal nodal sets for polynomial interpolation in a simplex. SIAM J. Numer. Anal. 35(2), 655–676 (1998)
Horst, R., Pardalos, P., Van Thoai, N.: Introduction to Global Optimization. Springer, Berlin (2000)
Horst, R., Thoai, N.V.: DC programming: overview. J. Optim. Theor. Appl. 103(1), 1–43 (1999)
IBM ILOG CPLEX: V12.6: User’s Manual for CPLEX (2014)
Liberti, L., Pantelides, C.C.: An exact reformulation algorithm for large nonconvex NLPs involving bilinear terms. J. Glob. Optim. 36(2), 161–189 (2006)
Löfberg, J.: YALMIP: A toolbox for modeling and optimization in MATLAB. In: Proceedings of the CACSD Conference. Taipei, Taiwan (2004)
McCormick, G.P.: Computability of global solutions to factorable nonconvex programs: Part I-Convex underestimating problems. Math. Program. 10(1), 147–175 (1976)
Mezzadri, F.: How to generate random matrices from the classical compact groups. arXiv preprint arXiv:math-ph/0609050 (2006)
Misener, R., Floudas, C.A.: GloMIQO: global mixed-integer quadratic optimizer. J. Glob. Optim. 57(1), 3–50 (2013)
Misener, R., Floudas, C.A.: ANTIGONE: algorithms for continuous/integer global optimization of nonlinear equations. J. Glob. Optim. 59(2–3), 503–526 (2014)
MOSEK ApS: The MOSEK optimization toolbox for MATLAB manual (version 9.0) (2019)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Springer, Berlin (2004)
Park, J.: Sparsity-preserving difference of positive semidefinite matrix representation of indefinite matrices. arXiv preprint arXiv:1609.06762 (2016)
Sherali, H.D., Adams, W.P.: A Reformulation-Linearization Technique for Solving Discrete and Continuous Nonconvex Problems. Springer, Berlin (2013)
Sherali, H.D., Fraticelli, B.M.: Enhancing RLT relaxations via a new class of semidefinite cuts. J. Glob. Optim. 22(1–4), 233–261 (2002)
Sherali, H.D., Tuncbilek, C.H.: A global optimization algorithm for polynomial programming problems using a reformulation-linearization technique. J. Glob. Optim. 2(1), 101–112 (1992)
Sherali, H.D., Tuncbilek, C.H.: A reformulation-convexification approach for solving nonconvex quadratic programming problems. J. Glob. Optim. 7(1), 1–31 (1995)
The MathWorks Inc.: MATLAB R2018b (2014)
Zhen, J., De Moor, D., Den Hertog, D.: An extension of the reformulation-linearization technique to nonlinear optimization problems. Working Paper (2021)
Acknowledgements
We would like to thank the editors and the anonymous reviewers for their valuable inputs that have led to substantial improvements of the paper. Financial support from the EPSRC grant EP/R045518/1 is gratefully acknowledged.
Appendices
Online supplement: appendices
A Theoretical extensions
We extend our findings to instances of problem (1) whose feasible regions constitute the Cartesian product of two simplices (“Appendix A.1”) and specific classes of bounded polyhedra (“Appendix A.2”) and non-convex sets (“Appendix A.3”), as well as to quadratic optimization problems whose objective functions do not directly admit monotone liftings (“Appendix A.4”).
A.1 Cartesian product of two simplices
Consider the following extension of problem (3),
which optimizes the sum of a generic function f and a (jointly) concave function g over the Cartesian product of two simplices. The standard RLT reformulation for this problem introduces the \((n_1 + n_2)^2\) auxiliary decision variables \(\left( \begin{array}{cc} \varvec{X} & \varvec{Z} \\ \varvec{Z}^\top & \varvec{Y} \end{array} \right) \in {\mathbb {S}}^{n_1 + n_2}\) as well as the following additional constraints:
Using similar arguments as in Sect. 2, we now show that a significant number of decision variables can be removed from the RLT relaxation if function f in problem (5) has a monotone lifting \(f'\).
Theorem 2
If the function f in problem (5) has a monotone lifting \(f'\), then the corresponding RLT relaxation has an optimal solution \((\varvec{X}^\star , \varvec{Y}^\star , \varvec{Z}^\star , \varvec{x}^\star , \varvec{y}^\star )\) satisfying \(\varvec{X}^\star = {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\) and \(\varvec{Y}^\star = {{\,\mathrm{diag}\,}}(\varvec{y}^\star )\).
Proof
Fix any optimal solution \((\varvec{X}^\star , \varvec{Y}^\star , \varvec{Z}^\star , \varvec{x}^\star , \varvec{y}^\star )\) to problem (5). The statement follows if we apply the arguments of the proof of Theorem 1 to the blocks \(\varvec{X}^\star \) and \(\varvec{Y}^\star \) of the matrix \(\left( \begin{array}{cc} \varvec{X}^\star & \varvec{Z}^\star \\ \varvec{Z}^\star {}^\top & \varvec{Y}^\star \end{array} \right) \). \(\square \)
Note that in Theorem 2 we cannot apply the same arguments to the blocks \(\varvec{Z}^\star \) and \(\varvec{Z}^\star {}^\top \) since they do not lie on the diagonal of the main matrix. We can furthermore show that, as in the case of a single simplex, the RLT and RLT/SDP relaxations are equally tight for problem (5).
Corollary 3
If the function f in problem (5) has a monotone lifting \(f'\), then the optimal value of the corresponding RLT relaxation coincides with the optimal value of the corresponding RLT/SDP relaxation.
Proof
Given an optimal solution \((\varvec{X}^\star , \varvec{Y}^\star , \varvec{Z}^\star , \varvec{x}^\star , \varvec{y}^\star )\) to problem (5) that satisfies \(\varvec{X}^\star = {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\) and \(\varvec{Y}^\star = {{\,\mathrm{diag}\,}}(\varvec{y}^\star )\), as justified by Theorem 2, the statement of the corollary follows if we show that
For any \(\varvec{a} \in {\mathbb {R}}^{n_1}\) and \(\varvec{b} \in {\mathbb {R}}^{n_2}\), we have that
The last inequality follows from similar arguments as in the proof of Corollary 2, which show that
for the random variables \({\tilde{A}}\) and \({\tilde{B}}\) that attain the values \(a_i\) and \(b_j\) with probabilities \(x_i^\star \) and \(y_j^\star \), \(i = 1, \ldots , n_1\) and \(j = 1, \ldots , n_2\), respectively. Likewise, we observe that
where we assume that the random variable \({\tilde{A}} {\tilde{B}}\) attains the values \(a_i b_j\) with probability \(Z_{ij}^\star \), \(i = 1, \ldots , n_1\) and \(j = 1, \ldots , n_2\). Note that this joint probability distribution is consistent with our marginal distributions specified above since the RLT constraints (6) guarantee that \(Z^\star _{ij} \ge 0\), \(\sum _{k = 1}^{n_2} Z^\star _{ik} = x_i^\star \) and \(\sum _{k = 1}^{n_1} Z^\star _{kj} = y_j^\star \). The previous arguments imply that the sum on the left-hand side of the inequality in (7) evaluates to
which concludes the proof. \(\square \)
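The coupling argument in the proof can be sanity-checked numerically. The sketch below (with illustrative variable names, not from the paper) builds the independent coupling \(\varvec{Z} = \varvec{x}^\star \varvec{y}^\star {}^\top\), verifies that it is consistent with the marginal distributions, and confirms that the linearized bilinear term \(\sum_{ij} a_i b_j Z_{ij}\), i.e. the expectation of \({\tilde{A}} {\tilde{B}}\), is sandwiched between the smallest and largest outcomes \(a_i b_j\):

```python
import numpy as np

rng = np.random.default_rng(7)
n1, n2 = 4, 5

# Marginal distributions x, y and outcome values a, b, as in the proof.
x = rng.random(n1); x /= x.sum()
y = rng.random(n2); y /= y.sum()
a = rng.standard_normal(n1)
b = rng.standard_normal(n2)

# One coupling consistent with the marginals: the independent one, Z = x y^T.
Z = np.outer(x, y)
assert np.allclose(Z.sum(axis=1), x) and np.allclose(Z.sum(axis=0), y)

# The linearized bilinear term sum_ij a_i b_j Z_ij equals E[A B] under Z;
# it always lies between the smallest and largest outcome a_i * b_j.
E_AB = float((np.outer(a, b) * Z).sum())
assert np.outer(a, b).min() - 1e-12 <= E_AB <= np.outer(a, b).max() + 1e-12
```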
We emphasize that one can readily construct counterexamples which show that our results in this section do not extend to three or more simplices.
A.2 Linear constraints
Consider a generic instance of problem (1) whose feasible region is bounded, and let the columns of the matrix \(\varvec{V} \in {\mathbb {R}}^{n \times {p}}\) denote the p extreme points of the feasible region. Problem (1) is equivalent to
which is an instance of problem (3) studied in Sect. 2. The fourth statement of Proposition 1 implies that the objective component f in problem (8) has a monotone lifting whenever the component f in the original problem (1) has one. Note that the number p of decision variables in problem (8) is typically exponential in the number m of constraints in the original problem (1). Notable exceptions exist, however, such as bijective transformations of the 1-norm ball, \(\{ \varvec{T} \varvec{x} \, : \, \varvec{x} \in {\mathbb {R}}^n, \;\; \left\Vert \varvec{x} \right\Vert _1 \le 1 \}\) with \(\varvec{T} \in {\mathbb {R}}^{n \times n}\) invertible, which have \({p} = 2n\) extreme points, as well as the unit simplex, \(\{ \varvec{x} \in {\mathbb {R}}^n_+ \, : \, \sum _i x_i \le 1 \}\), which has \({p} = n + 1\) extreme points.
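The two tractable vertex enumerations mentioned above can be made explicit. The following sketch (helper names are ours, and we adopt the paper's columns-as-extreme-points convention for \(\varvec{V}\)) constructs both vertex matrices for a transformed 1-norm ball and for the unit simplex:

```python
import numpy as np

def cross_polytope_vertices(n):
    """Columns are the 2n extreme points +/- e_i of the 1-norm unit ball."""
    eye = np.eye(n)
    return np.hstack([eye, -eye])

def unit_simplex_vertices(n):
    """Columns are the n + 1 extreme points of {x >= 0 : sum(x) <= 1}."""
    return np.hstack([np.zeros((n, 1)), np.eye(n)])

n = 4
T = np.diag(np.arange(1.0, n + 1.0))        # an invertible transformation T
V_ball = T @ cross_polytope_vertices(n)     # p = 2n = 8 extreme points
V_simp = unit_simplex_vertices(n)           # p = n + 1 = 5 extreme points
assert V_ball.shape == (n, 2 * n)
assert V_simp.shape == (n, n + 1)
```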
The RLT relaxation of the original problem (1) is typically strictly weaker than the corresponding RLT/SDP relaxation. Corollary 2 shows, however, that both relaxations are equally tight in the lifted problem (8), and Theorem 1 allows us to replace the decision matrix \(\varvec{X}' \in {\mathbb {S}}^{{p}}\) in the lifted problem with \({{\,\mathrm{diag}\,}}(\varvec{x}')\). We next compare the tightness of the RLT/SDP relaxation of the original problem (1) with the RLT relaxation of the lifted problem (8).
Theorem 3
The RLT relaxation of the lifted problem (8) is at least as tight as the RLT/SDP relaxation of the original problem (1).
Proof
The statement of the theorem follows if for any feasible solution \((\varvec{X}', \varvec{x}')\) of the RLT relaxation of the lifted problem (8) we can construct a feasible solution \((\varvec{X}, \varvec{x})\) of the RLT/SDP relaxation of the original problem (1) that attains a weakly larger objective value. To this end, fix any feasible solution \((\varvec{X}', \varvec{x}')\) of the RLT relaxation of problem (8) and set \((\varvec{X}, \varvec{x}) = (\varvec{V} {{\,\mathrm{diag}\,}}(\varvec{x}') \, \varvec{V}^\top , \varvec{V} \varvec{x}')\). Intuitively speaking, this choice of \((\varvec{X}, \varvec{x})\) interprets \(\varvec{x}'\) as the convex weights of the vertices \(\varvec{V} = [\varvec{V}_1 \, \ldots \varvec{V}_p]\) of the feasible region of problem (1) and therefore sets \(\varvec{x} = \varvec{V} \varvec{x}'\). Moreover, note that \(\mathrm {diag} (\varvec{x}')\) is an ‘optimal’ representation of \(\varvec{x}' \varvec{x}'^\top \) in the RLT relaxation of problem (8). Since \(\varvec{x}' \varvec{x}'^\top \) corresponds to \(\varvec{V} \varvec{x}' \varvec{x}'^\top \varvec{V}^\top \) in problem (1), we thus set \(\varvec{X} = \varvec{V} \mathrm {diag}(\varvec{x}') \varvec{V}^\top \).
We first note that the objective value of \((\varvec{X}, \varvec{x})\) in the relaxation of (1) is at least as large as the objective value of \((\varvec{X}', \varvec{x}')\) in the relaxation of (8):
Here, the left-hand side represents the objective value of \((\varvec{X}, \varvec{x})\) in the RLT/SDP relaxation of problem (1) and the right-hand side represents the objective value of \((\varvec{X}', \varvec{x}')\) in the RLT relaxation of problem (8) if we adopt the monotone lifting proposed in the proof of statement 4 of Proposition 1. The inequality holds since \({{\,\mathrm{diag}\,}}(\varvec{x}') \succeq \varvec{X}'\), which can be shown using similar arguments as in the proof of Theorem 1.
To see that \((\varvec{X}, \varvec{x})\) is feasible for the RLT/SDP relaxation of problem (1), we first note that
since \(\varvec{V} \varvec{x}'\) is a convex combination of the vertices of the polyhedron \(\{ \varvec{x} \in {\mathbb {R}}^n \, : \, \varvec{A} \varvec{x} \le \varvec{b} \}\). Moreover, we have
where the inequality holds since \({{\,\mathrm{diag}\,}}(\varvec{x}') \succeq \varvec{x}' \varvec{x}'{}^\top \) due to similar arguments as in the proof of Corollary 2. In the remainder of the proof, we show that the RLT constraints (2) hold as well. To this end, we note that
where \(\varvec{V}_{\ell }\) denotes the \(\ell \)-th column of \(\varvec{V}\). To see that (9) is indeed non-negative, we distinguish between four cases based on the values of \(b_i\) and \(b_j\).
Case 1: \({b_i, b_j = 0.}\) In this case, the expression (9) simplifies to \(\sum _{\ell =1}^{{p}} x_\ell ' \cdot (\varvec{a}_i^\top \varvec{V}_\ell )\) \((\varvec{a}_j^\top \varvec{V}_\ell )\), which constitutes a sum of non-negative terms since \(\varvec{x}' \ge \mathbf {0}\) as well as \(\varvec{a}_i^\top \varvec{V}_\ell \le b_i = 0\) and \(\varvec{a}_j^\top \varvec{V}_\ell \le b_j = 0\).
Case 2: \({b_i \ne 0, b_j = 0\; \mathrm{or} \; b_i = 0, b_j \ne 0.}\) We assume that \(b_i \ne 0, b_j = 0\); the other case follows by symmetry. Assume further that \(b_i > 0\); the case where \(b_i < 0\) can be shown similarly. Dividing the expression (9) by \(b_i\) and removing the terms that contain \(b_j\) yields
and this expression constitutes a sum of non-negative terms since \(x'_\ell \ge 0\) multiplies the product of two non-positive terms: We have \(\varvec{a}_i^\top \varvec{V}_\ell / b_i \le 1\) since \(\varvec{a}_i^\top \varvec{V}_\ell \le b_i\), and we have \(\varvec{V}_\ell ^\top \varvec{a}_j \le b_j = 0\).
Case 3: \({b_i, b_j > 0 \;\mathrm{or}\; b_i, b_j < 0.}\) We assume that \(b_i, b_j > 0\); the other case follows similarly. Dividing the expression (9) by \(b_i b_j > 0\) yields
where \(\alpha _\ell , \beta _\ell \le 1\) since \(\varvec{a}_i^\top \varvec{V}_\ell \le b_i\) and \(\varvec{a}_j^\top \varvec{V}_\ell \le b_j\). Since
each multiplier of \(x'_\ell \) in (10) is bounded from below by \(-1\), which implies that the sum involving \(x'_\ell \) is bounded from below by \(-1\), and thus the overall expression (10) is non-negative as desired.
Case 4: \({b_i> 0, b_j< 0 \;\mathrm{or }\; b_i < 0, b_j > 0.}\) We assume that \(b_i > 0, b_j < 0\); the other case follows by symmetry. Dividing (9) by \(b_i b_j < 0\) yields (10), which now needs to be non-positive. Note that \(\alpha _\ell \le 1\) while \(\beta _\ell \ge 1\), since \(b_j < 0\). The statement now follows from an argument analogous to the previous case as
which implies that the overall expression (10) is non-positive as desired. \(\square \)
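The construction in the proof of Theorem 3 lends itself to a numerical sanity check. The sketch below (our own illustrative setup, using the box \([-1,1]^2\) as the feasible region) maps a lifted point \(\varvec{x}'\) to \((\varvec{X}, \varvec{x}) = (\varvec{V} {{\,\mathrm{diag}\,}}(\varvec{x}') \varvec{V}^\top, \varvec{V} \varvec{x}')\) and verifies the semidefinite condition as well as the linearized pairwise products \(b_i b_j - b_i \varvec{a}_j^\top \varvec{x} - b_j \varvec{a}_i^\top \varvec{x} + \varvec{a}_i^\top \varvec{X} \varvec{a}_j \ge 0\), which is the standard form of the RLT constraints:

```python
import numpy as np

rng = np.random.default_rng(0)

# Feasible region: the box [-1, 1]^2, written as A x <= b.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
V = np.array([[1.0, 1.0, -1.0, -1.0],
              [1.0, -1.0, 1.0, -1.0]])   # vertices as columns

# A point of the lifted simplex: convex weights over the p = 4 vertices.
x_lift = rng.random(V.shape[1]); x_lift /= x_lift.sum()

# The mapping from the proof of Theorem 3.
x = V @ x_lift
X = V @ np.diag(x_lift) @ V.T

# X - x x^T is positive semidefinite since diag(x') - x' x'^T is.
assert np.linalg.eigvalsh(X - np.outer(x, x)).min() >= -1e-9

# Linearized products (b_i - a_i^T x)(b_j - a_j^T x) >= 0 for all i, j.
for i in range(len(b)):
    for j in range(len(b)):
        val = b[i] * b[j] - b[i] * (A[j] @ x) - b[j] * (A[i] @ x) + A[i] @ X @ A[j]
        assert val >= -1e-9
```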
For problems with a moderate number p of vertices, the RLT relaxation of the lifted problem (8), which involves p non-negative decision variables and a single constraint, might be easier to solve than the RLT/SDP relaxation of the original problem (1), which involves \({\mathcal {O}} (n^2)\) decision variables, \({\mathcal {O}} (m^2)\) constraints as well as a restriction to the semidefinite cone. In addition, as proven in Theorem 3, the RLT relaxation of the lifted problem is always at least as tight as the standard RLT/SDP relaxation.
A.3 Nonlinear constraints
We now study the following generalization of problem (3):
Here, \(f : {\mathbb {R}}^n \mapsto {\mathbb {R}}\) as well as \(f_i : {\mathbb {R}}^n \mapsto {\mathbb {R}}\) are assumed to admit monotone liftings, and \(g : {\mathbb {R}}^n \mapsto {\mathbb {R}}\) as well as \(g_i : {\mathbb {R}}^n \mapsto {\mathbb {R}}\) are concave, \(i = 1, \ldots, m\). If we replace f and each \(f_i\) with their respective monotone liftings \(f'\) and \(f'_i\), \(i = 1, \ldots, m\), then one can readily verify that the RLT relaxation
is optimized by a solution \((\varvec{X}^\star , \varvec{x}^\star )\) that satisfies \(\varvec{X}^\star = {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\) as well as \(\varvec{X}^\star \succeq \varvec{x}^\star \varvec{x}^\star {}^\top \). The definition of monotone liftings implies that (12) is a convex optimization problem.
A special case of problem (11) arises when the constraint functions \(f_i\), \(i = 1, \ldots , m\), are absent and when f and \(g_i\), \(i = 1, \ldots , m\), depend on separate parts of the decision vector \(\varvec{x}\), that is, if problem (11) can be written as
where \({\mathcal {Y}} \subseteq {\mathbb {R}}^{n_2}\) denotes the feasible region for the decision vector \(\varvec{y}\). Omitting the RLT constraints that involve cross-products of the constraints involving \(\varvec{x}\) and the constraints involving \(\varvec{y}\), Theorem 1 and Corollary 2 imply that the RLT relaxation
has an optimal solution \((\varvec{X}^\star , \varvec{x}^\star , \varvec{y}^\star )\) satisfying \(\varvec{X}^\star = {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\).
A.4 Standard quadratic optimization
A standard quadratic optimization problem maximizes a (usually non-convex) quadratic function \(\varphi (\varvec{x}) = \varvec{x}^\top \varvec{Q} \varvec{x} + \varvec{q}^\top \varvec{x} + q\), \(\varvec{Q} \in {\mathbb {S}}^n\), \(\varvec{q} \in {\mathbb {R}}^n\) and \(q \in {\mathbb {R}}\), over the probability simplex. Since \(\varvec{Q} \not \succeq \mathbf {0}\) in general, our results from Sect. 2 are not directly applicable. By decomposing \(\varvec{Q}\) into \(\varvec{Q} = \varvec{Q}^+ - \varvec{Q}^-\) such that \(\varvec{Q}^+, \varvec{Q}^- \succeq \mathbf {0}\), however, we obtain an instance of problem (3) where \(f (\varvec{x}) = \varvec{x}^\top \varvec{Q}^+ \varvec{x} + \varvec{q}^\top \varvec{x} + q\) and \(g (\varvec{x}) = - \varvec{x}^\top \varvec{Q}^- \varvec{x}\). The first statement of Corollary 1 then allows us to apply Theorem 1 and Corollary 2 to the reformulated standard quadratic optimization problem. It is worth noting that different decomposition schemes could lead to different RLT relaxations of varying tightness. For a review of decomposition schemes, we refer to [9, 23].
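One concrete decomposition scheme is the eigenvalue-based split used in our numerical experiments, which separates the positive and negative eigenspaces of \(\varvec{Q}\). A minimal sketch (the helper name is ours; other splits from [9, 23] are equally valid):

```python
import numpy as np

def psd_split(Q):
    """Eigenvalue-based decomposition Q = Q_plus - Q_minus, both parts PSD."""
    w, U = np.linalg.eigh(Q)                 # Q symmetric: Q = U diag(w) U^T
    Q_plus = (U * np.maximum(w, 0.0)) @ U.T  # keep positive eigenvalues
    Q_minus = (U * np.maximum(-w, 0.0)) @ U.T  # flip sign of negative ones
    return Q_plus, Q_minus

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
Q = (M + M.T) / 2                            # a random indefinite symmetric matrix

Qp, Qm = psd_split(Q)
assert np.allclose(Q, Qp - Qm)
assert np.linalg.eigvalsh(Qp).min() >= -1e-9
assert np.linalg.eigvalsh(Qm).min() >= -1e-9
```

With this split, \(f(\varvec{x}) = \varvec{x}^\top \varvec{Q}^+ \varvec{x} + \varvec{q}^\top \varvec{x} + q\) is convex and \(g(\varvec{x}) = -\varvec{x}^\top \varvec{Q}^- \varvec{x}\) is concave, as required by problem (3).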
Instead of decomposing the objective function of the standard quadratic optimization problem and utilizing the results from Sect. 2, one can alternatively apply the RLT or RLT/SDP relaxation directly to the original standard quadratic optimization problem. Our numerical results indicate that for the eigenvalue-based matrix decomposition, the RLT/SDP relaxation outperforms our formulation in terms of tightness, whereas the RLT relaxation and our formulation are in general incomparable, that is, either formulation can be superior for a given instance. In terms of runtime, on the other hand, our formulation outperforms the RLT and RLT/SDP relaxations. This is not surprising as our formulation optimizes over n decision variables, whereas the RLT and RLT/SDP relaxations involve \({\mathcal {O}} (n^2)\) decision variables due to the presence of the decision matrix \(\varvec{X}\).
B Additional numerical experiments
We compare our RLT formulation against standard RLT and RLT/SDP implementations on non-convex optimization problems over polyhedra (“Appendix B.1”) as well as on indefinite quadratic optimization problems over simplices (“Appendix B.2”).
B.1 Non-convex optimization over polyhedra
We consider instances of problem (8) where
Here, \(\varvec{D} \in {\mathbb {S}}^n\) and \(\varvec{Q} \in {\mathbb {R}}^{n \times n}\) are generated as in Sect. 3, and the feasible region (prior to its lifting) is the hypercube \([-1,1]^n\) in \({\mathbb {R}}^n\). Following our discussion in “Appendix A.2”, our RLT reformulation operates on the lifted space \({\mathbb {R}}^{2^n}\), where the feasible region is described by a probability simplex whose vertices correspond to the vertices of the hypercube, whereas the standard RLT and RLT/SDP reformulations operate directly on formulation (1), which involves 2n halfspaces in \({\mathbb {R}}^n\).
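The exponential lifting is straightforward to construct; the sketch below (helper name is ours) enumerates the hypercube vertices as the columns of \(\varvec{V}\), which shows why the approach stops being viable quickly — already for \(n = 10\) the lifted simplex has 1024 vertices:

```python
import itertools
import numpy as np

def hypercube_vertex_matrix(n):
    """Columns of V are the 2^n vertices of the hypercube [-1, 1]^n."""
    return np.array(list(itertools.product([-1.0, 1.0], repeat=n))).T

V = hypercube_vertex_matrix(10)
assert V.shape == (10, 2**10)   # 1024 lifted simplex variables for n = 10
```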
Figure 3 reports the optimality gaps and solution times for instances with \(n = 1, \ldots , 10\) decision variables. As expected from Theorem 3, our RLT formulation outperforms both RLT and RLT/SDP in terms of the objective value. Interestingly, the outperformance over RLT is substantial and grows with the dimension n. On the other hand, since the number of decision variables in our RLT reformulation grows exponentially in n, our reformulation is only viable for small problem instances with up to \(n = 10\) decision variables.
With its exponential number of vertices, the hypercubic feasible region of our previous experiment constitutes the least favourable setting for our proposed RLT formulation. We next study instances of problem (8) where \(k < n\) of the decision variables (hereafter \(\varvec{y}\)) reside in a hypercube, whereas the remaining \(n - k\) decision variables (hereafter \(\varvec{x}\)) are restricted to a simplex. In this case, the feasible region of the original problem is described by \(n + k + 2\) halfspaces in \({\mathbb {R}}^n\), whereas the feasible region of the lifted problem constitutes a simplex with \(2^k \cdot (n - k)\) vertices. The objective function is described by
where \(\varvec{D} \in {\mathbb {S}}^n\) and \(\varvec{Q} \in {\mathbb {R}}^{n \times n}\) are generated as before. Figure 4 reports the runtimes of our proposed RLT reformulation as well as the standard RLT and RLT/SDP formulations for problem instances with \(k = 3\) and \(n = 10, 20, \ldots , 150\) decision variables. Our RLT reformulation significantly outperforms the RLT/SDP formulation in terms of runtimes, and it also improves upon the standard RLT formulation for \(n \ge 60\). We note that in terms of the relaxation gaps, our RLT reformulation also outperforms both standard RLT (by about 2%) and RLT/SDP (by about 0.5%), which is in accordance with Theorem 3. Since these differences are small, however, we do not illustrate them in the graph.
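The vertex set of the mixed feasible region is the Cartesian product of the two vertex sets, which explains the count of \(2^k \cdot (n - k)\) lifted variables. A small sketch (helper name is ours, with the probability simplex over the remaining \(n - k\) coordinates):

```python
import itertools
import numpy as np

def product_vertices(k, n):
    """Columns are the vertices of [-1,1]^k x (probability simplex in R^(n-k))."""
    cube = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))  # (2^k, k)
    simp = np.eye(n - k)                     # unit vectors: simplex vertices
    verts = [np.concatenate([c, s]) for c in cube for s in simp]
    return np.array(verts).T

V = product_vertices(3, 10)
assert V.shape == (10, 2**3 * 7)   # 2^k * (n - k) = 56 vertices in R^10
```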
B.2 Standard quadratic optimization
In our final experiment, we maximize an indefinite quadratic function \(\varphi (\varvec{x}) = \varvec{x}^\top \varvec{Q} \varvec{x} \) over the simplex in \({\mathbb {R}}^n\). To this end, we select \(\varvec{Q} = \varvec{V} \varvec{D} \varvec{V} ^\top \), where \(\varvec{D} \in {\mathbb {S}}^n\) is a diagonal scaling matrix whose diagonal elements are sampled uniformly at random from the interval \([-7.5, 2.5]\) (type 1) or \([-5, 5]\) (type 2), and \(\varvec{V} \in {\mathbb {R}}^{n \times n}\) is a uniformly sampled rotation matrix [18].
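One standard way to sample such a uniformly random rotation, consistent with the construction referenced via [18] though not necessarily identical to it, is to take the QR factorization of a Gaussian matrix with a sign correction, and then flip one column if needed so that the determinant is \(+1\). A sketch under these assumptions:

```python
import numpy as np

def random_rotation(n, rng):
    """Haar-uniform rotation matrix via QR of a Gaussian matrix."""
    M = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(M)
    Q = Q * np.sign(np.diag(R))     # sign fix so Q is Haar on O(n)
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]          # flip one column: det becomes +1
    return Q

rng = np.random.default_rng(42)
n = 6
V = random_rotation(n, rng)
d = rng.uniform(-7.5, 2.5, size=n)  # type 1 diagonal sampling
Q_mat = V @ np.diag(d) @ V.T        # indefinite test matrix with spectrum d
assert np.allclose(V @ V.T, np.eye(n))
```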
Following our discussion in “Appendix A.4”, our RLT formulation decomposes the function \(\varphi \) into a convex part \(f (\varvec{x}) = \varvec{x}^\top \varvec{V} \varvec{D}^+ \varvec{V}^\top \varvec{x}\) and a concave part \(g (\varvec{x}) = \varvec{x}^\top \varvec{V} \varvec{D}^- \varvec{V}^\top \varvec{x}\), where \(\varvec{D}^+\) and \(\varvec{D}^-\) contain the positive and negative eigenvalues of \(\varvec{D}\), respectively. In contrast, the standard RLT and RLT/SDP formulations directly operate on the function \(\varphi \). Figures 5 and 6 compare the three approaches in terms of objective values and the required runtimes. The figures show that RLT/SDP tends to provide the tightest relaxations and that our proposed RLT formulation offers tighter relaxations than the standard RLT formulation on type 1 instances, whereas the situation is reversed for type 2 instances. In terms of runtimes, on the other hand, our RLT formulation clearly dominates both alternatives as expected.
Selvi, A., den Hertog, D. & Wiesemann, W. A reformulation-linearization technique for optimization over simplices. Math. Program. 197, 427–447 (2023). https://doi.org/10.1007/s10107-021-01726-y