Abstract
In this paper, the convergence of the fundamental alternating minimization is established for non-smooth, non-strongly convex optimization problems in Banach spaces, and novel rates of convergence are provided. The objective function is assumed to be the sum of a smooth and a block-separable, non-smooth part, covering a large range of applications. For the former, three different relaxations of strong convexity are considered: (i) quasi-strong convexity; (ii) quadratic functional growth; and (iii) plain convexity. With new and improved rates benefiting from both separate steps of the scheme, linear convergence is proved for (i) and (ii), whereas sublinear convergence is shown for (iii).
1 Introduction
The (cyclic) block coordinate descent (BCD), in the literature also referred to as non-linear block Gauss-Seidel or successive subspace correction method, is a fundamental optimization algorithm [4, 12]. Given a block structured minimization problem, it consists of the successive minimization with respect to the single blocks. Since numerous applications naturally inherit a block structure, the BCD and its variations have been of great interest for decades—especially whenever it is more convenient or feasible to solve the corresponding subproblems instead of the globally coupled problem. For an overview, we refer to the review paper [15].
The convergence of the BCD has been extensively studied in the literature—typically in Euclidean spaces. For instance, provided merely that partial minimization is well-defined, any accumulation point of the generated sequence is a stationary point [4]. Furthermore, convergence has been established under various convexity assumptions such as, e.g., strong convexity [1] and quasi-convexity with respect to each block [7, 8]. Even more strongly, linear convergence has been proved in the context of (multiplicative Schwarz) domain decomposition methods for smooth and strongly convex problems, there even in Banach spaces [14], and in the context of feasible descent methods under stricter convexity assumptions (e.g., strong convexity w.r.t. single blocks) [9]; for the latter, the overarching class of smooth convex functions with quadratic functional growth has recently been identified to lead to linear convergence [11]. Commonly, smoothness assumptions of global nature are made, e.g., global Lipschitz continuity of the gradient.
The BCD for two blocks is referred to as alternating minimization. It is worth noting that two-block structured problems, appearing in various applications, constitute an important class. In view of this work, we mention an emerging interest in iterative decoupling strategies for two-way coupled partial differential equations, cf., e.g., [5] and references therein.
The presence of just two blocks allows for an improved convergence analysis of the BCD in contrast to the general case. For unconstrained smooth convex problems in finite-dimensional Euclidean spaces equipped with the \(l_2\) norm, linear convergence has been established under additional strong convexity [3], and moreover sublinear convergence has been shown for problems with non-smooth, block-separable contributions [2]. Both results have in common that the theoretical multiplicative constant depends only on the minimum of the Lipschitz constants of the partial derivatives, instead of a global one. The proofs essentially utilize knowledge on first-order gradient descent methods underlying the (proximal) BCD. To the best of our knowledge, these are the sharpest theoretical convergence results in the literature.
The motivation for this work has been to generalize and improve the previous convergence results for the alternating minimization. For this purpose, we consider a model problem in (infinite-dimensional) Banach spaces incorporating block-separable non-smooth contributions (Sect. 2). The model problem covers a large class of problems, allowing, e.g., for block-separable convex constraints or non-smooth regularization; for more examples, we refer to Beck [2]. Moreover, by exploiting tailored norms in the analysis, this setting enables (A) tighter convergence results in (B) a fairly general setup. Furthermore, driven by the fact that strong convexity may be a lot to ask for, for the first time, linear convergence of the alternating minimization is investigated under two relaxations of strong convexity: quasi-strong convexity (Sect. 3), and mere quadratic functional growth without an explicitly required feasible descent property (Sect. 4). For a more complete picture, we additionally study the case of plain convex optimization, but in Banach spaces (Sect. 5). An illustrative numerical PDE-based example inspired by multiphysics and solved by the alternating minimization is provided in Sect. 6. The results are summarized and discussed in the concluding Sect. 7.
2 Alternating minimization for two-block structured model problem
We consider the two-block structured model problem
where \({\mathcal {B}}_1,{\mathcal {B}}_2,f,g_1,g_2\) satisfy the following properties:
-
(P1)
\(({\mathcal {B}}_i,\Vert \cdot \Vert _i)\) is a Banach space with its dual \(\left( {\mathcal {B}}_i^\star ,\Vert \cdot \Vert _{i,\star }\right) \) and the duality pairing \(\left\langle \cdot ,\cdot \right\rangle _i\), \(i=1,2\). The index will be omitted for duality pairings.
-
(P2)
The function \(g_i: {\mathcal {B}}_i \rightarrow {\mathbb {R}} \cup \{ \infty \}\) is proper convex, (Fréchet) subdifferentiable with subdifferential \(\partial g_i\) on \(\mathrm {dom}\,g_i\), \(i=1,2\). Let \({\mathcal {D}}:=\mathrm {dom}\,g_1 \times \mathrm {dom}\,g_2\).
-
(P3)
The function \(f:{\mathcal {B}}_1 \times {\mathcal {B}}_2 \rightarrow {\mathbb {R}}\) is convex and (Fréchet) differentiable over \({\mathcal {D}}\). Let \(\nabla f\) denote the (Fréchet) derivative of f.
-
(P4)
The optimal set of problem (1), denoted by \(X \subset {\mathcal {B}}_1 \times {\mathcal {B}}_2\), is non-empty, and the corresponding optimal value is denoted by \(H^\star \).
-
(P5)
For any \(({\tilde{x}}_1,{\tilde{x}}_2)\in {\mathcal {D}}\), the following problems have minimizers
$$\begin{aligned} \underset{x_1\in {\mathcal {B}}_1}{\mathrm {min}}\, H(x_1,{\tilde{x}}_2),\qquad \text {and} \qquad \underset{x_2\in {\mathcal {B}}_2}{\mathrm {min}}\, H({\tilde{x}}_1,x_2). \end{aligned}$$
Exploiting the particular two-block structure, we consider the iterative solution of (1) via the classical alternating minimization, cf. Algorithm 1.
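For orientation, a minimal conceptual sketch of the scheme is given below in Python form; the oracle names are illustrative, the subproblems are assumed to be solved exactly, and the initialization mimics the partial optimality of the initial guess discussed next. It is not the implementation used in this paper.

```python
# Minimal conceptual sketch of Algorithm 1 (alternating minimization); the oracle
# names are illustrative and this is not the implementation used in this paper.
def alternating_minimization(argmin_block1, argmin_block2, x1_init, num_iter):
    """argmin_block1(x2) returns a minimizer of H(., x2); argmin_block2(x1) analogously."""
    # Initialization: enforce partial optimality with respect to the second block.
    x1, x2 = x1_init, argmin_block2(x1_init)
    history = [(x1, x2)]
    for _ in range(num_iter):
        x1 = argmin_block1(x2)   # first half-step: x1^{k+1} minimizes H(., x2^k)
        x2 = argmin_block2(x1)   # second half-step: x2^{k+1} minimizes H(x1^{k+1}, .)
        history.append((x1, x2))
    return history
```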
As in [2], the partial optimality condition (2) on the initial guess has been chosen for the sake of simpler notation in the subsequent analysis; we will analyze the convergence behavior of Algorithm 1 under the following additional assumptions on the product structure and smoothness:
-
(A1)
\({\mathcal {B}}_1 \times {\mathcal {B}}_2\) is equipped with a separate norm \(\Vert \cdot \Vert \) and constants \(\beta _1,\beta _2\ge 0\), satisfying
$$\begin{aligned} \Vert (x_1,x_2)\Vert ^2&\ge \beta _i\Vert x_i \Vert _i^2 \quad \text {for all }(x_1,x_2)\in {\mathcal {B}}_1 \times {\mathcal {B}}_2,\ i=1,2. \end{aligned}$$(5)
Furthermore, \({\mathcal {B}}_1 \times {\mathcal {B}}_2\) is equipped with a canonical duality pairing \(\left\langle \cdot , \cdot \right\rangle \).
-
(A2)
The partial (Fréchet) derivative of f with respect to the i-th component, denoted by \(\nabla _i f \in {\mathcal {B}}_i^\star \), is Lipschitz continuous with Lipschitz constant \(L_i\in (0,\infty ]\), \(i=1,2\), with \(\mathrm {min}\{L_1,L_2\} < \infty \); for example, for \(i=1\) (analogously for \(i=2\)) it holds that \(\left\| \nabla _1 f(x_1 + h_1,x_2) - \nabla _1 f(x_1,x_2) \right\| _{1,\star } \le L_1 \Vert h_1 \Vert _1\) for all \((x_1,x_2)\in {\mathcal {D}}\), \(h_1 \in {\mathcal {B}}_1\), such that \(x_1+h_1 \in \mathrm {dom}\, g_1\), equivalently expressed by a block version of the so-called descent lemma [2, 4]
$$\begin{aligned} f(x_1+h_1,x_2) \le f(x_1,x_2) + \left\langle \nabla _1 f(x_1,x_2),h_1 \right\rangle + \frac{L_1}{2} \left\| h_1 \right\| _1^2. \end{aligned}$$(6)
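As a concrete illustration, anticipating the quadratic test case of Sect. 3.1: for \(f(x_1,x_2)=\tfrac{1}{2}\Vert {\mathbf {A}}_1 x_1 + {\mathbf {A}}_2 x_2 - {\mathbf {b}}\Vert _2^2\) on Euclidean spaces, one has \(\nabla _1 f(x_1,x_2) = {\mathbf {A}}_1^\top ({\mathbf {A}}_1 x_1 + {\mathbf {A}}_2 x_2 - {\mathbf {b}})\), and the exact expansion \(f(x_1+h_1,x_2) = f(x_1,x_2) + \left\langle \nabla _1 f(x_1,x_2),h_1 \right\rangle + \tfrac{1}{2}\Vert {\mathbf {A}}_1 h_1\Vert _2^2\) shows that (6) is satisfied with \(L_1 = \sigma _\mathrm {max}({\mathbf {A}}_1)^2\), where \(\sigma _\mathrm {max}(\cdot )\) denotes the maximal singular value.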
Remark 1
(Semi-normed spaces) The following analysis does in fact not require \(\Vert \cdot \Vert \) or \(\Vert \cdot \Vert _i\), \(i=1,2\), to be positive definite. Consequently, it is sufficient to formulate (5) and (6) as well as convexity properties (specified in each section), with respect to semi-norms. Without introducing additional notation, we also subsequently allow \(\Vert \cdot \Vert \) and \(\Vert \cdot \Vert _i\), \(i=1,2\), to be merely semi-norms.
3 Linear convergence in the quasi-strongly convex case
In this section, linear convergence is established for the alternating minimization applied to model problem (1) under additional quasi-strong convexity for f:
-
(A3a)
The function \(f:{\mathcal {B}}_1\times {\mathcal {B}}_2 \rightarrow {\mathbb {R}}\) is quasi-strongly convex w.r.t. X, with modulus \(\sigma >0\), i.e., for all \(x\in {\mathcal {D}}\) and \({\bar{x}}:=\mathrm {arg\,min}\left\{ \Vert x-y\Vert \,\big | \, y\in X \right\} \), the projection of x onto X, it holds
$$\begin{aligned} f({\bar{x}}) \ge f(x) + \left\langle \nabla f(x), {\bar{x}} - x \right\rangle + \frac{\sigma }{2} \Vert x - {\bar{x}}\Vert ^2. \end{aligned}$$
Any strongly convex function is quasi-strongly convex. Moreover, by convexity of \(g_1\) and \(g_2\), H inherits quasi-strong convexity from f [with (A3a) stated for subdifferentiable functions].
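The least-squares function from the illustration above is a simple example for the gap between the two notions: by Theorem 8 in [11] (see also Sect. 3.1), it is quasi-strongly convex with respect to its optimal set even when \(({\mathbf {A}}_1\ {\mathbf {A}}_2)\) has a non-trivial kernel, in which case it fails to be strongly convex.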
Theorem 1
(Q-linear convergence under quasi-strong convexity) Assume that \(\mathrm {(P1)}\)–\(\mathrm {(P5)}\) and \(\mathrm {(A1),(A2),(A3a)}\) hold. Let \(\{x^k\}_{k\ge 0}\) be the sequence generated by the alternating minimization, cf. Algorithm 1. For all \(k\ge 0\) it holds
Proof
We consider the first half-step of the alternating minimization and show
By definition, it holds \(\tfrac{\beta _1}{L_1} \ge 0\), with equality if \(\beta _1=0\) or \(L_1=\infty \). W.l.o.g. we assume that \(\tfrac{\beta _1}{L_1}>0\) (since \(H^{k+1/2} \le H^k\) by construction, the statement (7) follows immediately for \(\tfrac{\beta _1}{L_1}=0\)). We first utilize: (i) (A3a) and the definition of \(\beta _1\), cf. (5); (ii) a simple rescaling; and (iii) the fact that \(\tfrac{\sigma \beta _1}{L_1}\in (0,1]\) [by (5), (6), (A3a)] and the Lipschitz continuity of \(\nabla _1 f\), cf. (6). For this, let \({\bar{x}}^k=({\bar{x}}_1^k,{\bar{x}}_2^k):= \mathrm {arg\,min}\left\{ \Vert x-x^k\Vert \,\big | \, x\in X \right\} \in {\mathcal {D}}\), with \(H^\star = H({\bar{x}}^k)\). Ultimately, it holds
Furthermore, by convexity of \(g_1\), it holds with \(\tfrac{\sigma \beta _1}{L_1}\in (0,1]\) that
or equivalently after reordering terms
Furthermore, the optimality condition corresponding to the second step of Algorithm 1 reads: \(x_2^{k}\in \mathrm {dom}\, g_2\) and \(0 \in \nabla _2 f(x^k) + \partial g_2(x_2^{k})\) for all \(k\ge 0\), which by definition of a subdifferential together with \({\bar{x}}_2^k\in \mathrm {dom}\, g_2\) implies
Combining (i) Eqs. (8)–(10), and (ii) the optimality of \(x_1^{k+1}\), cf. (3), yields
Reordering terms finally yields Eq. (7). By symmetry (including the discussion of \(\frac{\beta _2}{L_2}\ge 0\)), it holds
Ultimately, combining Eqs. (7) and (11), proves the assertion. \(\square \)
3.1 Numerical test for quasi-strongly convex minimization in a Euclidean space
To assess the sharpness of Theorem 1 under the use of suitable problem-dependent norms, we consider a two-block structured, unconstrained, quadratic, convex optimization problem in a Euclidean space (here \({\mathbb {R}}^{n+m}\), \(n,m\in {\mathbb {N}}\))
with \({\mathbf {A}}_1,{\mathbf {A}}_2,{\mathbf {A}},{\mathbf {b}}\) properly dimensioned. We assume that \({\mathbf {A}}\) is non-zero. Then by Theorem 8 in [11], the problem (12) is quasi-strongly convex w.r.t. the Euclidean \(l_2\) norm, with \(\sigma = \sigma _\mathrm {min}({\mathbf {A}})^2\), where \(\sigma _\mathrm {min}(\cdot )\) denotes the minimal singular value. Furthermore, it satisfies the smoothness and convexity assumptions of Theorem 1 with \(\beta _1=\beta _2=1\), \(L_1=\sigma _\mathrm {max}\left( {\mathbf {A}}_1\right) ^2\), \(L_2=\sigma _\mathrm {max}\left( {\mathbf {A}}_2 \right) ^2\), where \(\sigma _\mathrm {max}(\cdot )\) denotes the maximal singular value. Ultimately, by Theorem 1, q-linear convergence is guaranteed for all \(k\ge 0\)
The generality of Theorem 1, however, also allows for utilizing problem-dependent norms, improving the straightforward result (13). Having Remark 1 in mind, set \(\Vert \cdot \Vert _i:=\Vert \cdot \Vert _{{\mathbf {A}}_i^\top {\mathbf {A}}_i}\), \(i=1,2\), where \(\Vert {\mathbf {x}} \Vert _{{\mathbf {S}}}^2:= {\mathbf {x}}^\top {\mathbf {S}} {\mathbf {x}}\) for any symmetric, suitably dimensioned matrix \({\mathbf {S}}\). Consequently, \(L_1=L_2=1\). In addition, let \(\eta >0\), let \({\mathbf {I}}\) be the identity matrix (in any dimension), and define the norm on the product space by \(\Vert \cdot \Vert :=\Vert \cdot \Vert _{{\mathbf {A}}_\eta ^2}\), with \({\mathbf {A}}_\eta := \left( \eta {\mathbf {I}} + {\mathbf {A}}^\top {\mathbf {A}}\right) ^{1/2}\). Similarly, set \({\mathbf {A}}_{i\eta }:=\left( \eta {\mathbf {I}} + {\mathbf {A}}_i^\top {\mathbf {A}}_i\right) ^{1/2}\), and the Schur complement \({\mathbf {S}}_{{\mathbf {A}}_{i\eta }^2} := {\mathbf {A}}_{i\eta }^2 - {\mathbf {A}}_i^\top {\mathbf {A}}_j {\mathbf {A}}_{j\eta }^{-2} {\mathbf {A}}_j^\top {\mathbf {A}}_i\), where \(j\in \{1,2\}\), \(j\ne i\), \(i=1,2\). In order to determine \(\sigma \) and \(\beta _i\), it follows from standard linear algebra that
Finally, \(\sigma \) and \(\beta _i\) are obtained by maximizing the singular values w.r.t. \(\eta \), equivalent to taking the limit \(\eta \rightarrow 0\). Thus, Theorem 1 predicts that for all \(k\ge 0\) it holds
Using a small example, we demonstrate the sharpness of (14) in contrast to (13). Let
For this choice, the two bounds in (13) and (14) are given by \(\lambda \approx 0.717\) and \(\lambda _\mathrm {opt}\approx 0.245\), respectively. In Fig. 1, the theoretical and actual performances of the alternating minimization applied to (12) are visualized for the initial guess \({\mathbf {x}}_1^0:={\varvec{0}}\). We observe a good agreement between the practical convergence rate and the theoretical bound \(\lambda _\mathrm {opt}\), stemming from the analysis using problem-dependent norms.
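For completeness, a brief sketch of such an experiment follows. It assumes that (12) is of the least-squares form \(\tfrac{1}{2}\Vert {\mathbf {A}}_1 {\mathbf {x}}_1 + {\mathbf {A}}_2 {\mathbf {x}}_2 - {\mathbf {b}}\Vert _2^2\) (consistent with the constants listed above) and that the factor in (13) takes the product form \((1-\sigma \beta _1/L_1)(1-\sigma \beta _2/L_2)\) suggested by combining (7) and (11); the data below is an arbitrary placeholder rather than the matrices of this example, and the sketch is not the code used to produce Fig. 1.

```python
# Sketch under the assumptions stated above: constants of Theorem 1 in the Euclidean
# setting, cf. (13), and the empirical energy decay of the alternating minimization
# for placeholder data.
import numpy as np

rng = np.random.default_rng(0)
A1 = rng.standard_normal((5, 3))    # placeholder block acting on x1
A2 = rng.standard_normal((5, 2))    # placeholder block acting on x2
b = rng.standard_normal(5)
A = np.hstack([A1, A2])             # assumed non-zero

sigma = np.linalg.svd(A, compute_uv=False).min() ** 2   # quasi-strong convexity modulus
L1 = np.linalg.svd(A1, compute_uv=False).max() ** 2     # partial Lipschitz constant, block 1
L2 = np.linalg.svd(A2, compute_uv=False).max() ** 2     # partial Lipschitz constant, block 2
lam = (1.0 - sigma / L1) * (1.0 - sigma / L2)           # assumed product-form bound, cf. (13)

def energy(x1, x2):
    r = A1 @ x1 + A2 @ x2 - b
    return 0.5 * float(r @ r)

def partial_argmin(M, rhs):
    # Exact partial minimization of the quadratic energy via a least-squares solve.
    return np.linalg.lstsq(M, rhs, rcond=None)[0]

x1 = np.zeros(3)
x2 = partial_argmin(A2, b - A1 @ x1)        # partial optimality of the initial guess
energies = [energy(x1, x2)]
for _ in range(30):
    x1 = partial_argmin(A1, b - A2 @ x2)    # first half-step
    x2 = partial_argmin(A2, b - A1 @ x1)    # second half-step
    energies.append(energy(x1, x2))

E = np.array(energies)
H_star = E[-1]                                   # proxy for the optimal value H*
ratios = (E[1:6] - H_star) / (E[0:5] - H_star)   # observed early contraction factors
print(f"theoretical rate bound: {lam:.3f}")
print("observed contraction factors:", np.round(ratios, 3))
```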
4 Linear convergence in the quadratic functional growth case
In this section, linear convergence is established for the alternating minimization applied to model problem (1) under additional quadratic growth for H:
-
(A3b)
The objective function \(H:{\mathcal {B}}_1\times {\mathcal {B}}_2 \rightarrow {\mathbb {R}}\) has quadratic functional growth w.r.t. X with modulus \(\kappa >0\); i.e., for all \(x\in {\mathcal {D}}\) and \({\bar{x}}\) (as in \(\mathrm {(A3a)}\)), it holds
$$\begin{aligned} H(x) - H({\bar{x}}) \ge \frac{\kappa }{2} \left\| x - {\bar{x}} \right\| ^2. \end{aligned}$$
Quasi-strong convexity implies quadratic functional growth [11], but not vice versa; functions satisfying (A3b) need not be convex [16]. We refer to [11] for examples.
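As a simple scalar illustration of the latter point (outside the convex model problem (1)), the non-convex function \(H(x)=x^2+3\sin ^2(x)\) has the unique minimizer \(x=0\) with \(H^\star =0\) and satisfies \(H(x)-H^\star \ge x^2\), i.e., quadratic functional growth with \(\kappa =2\).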
Following a similar strategy as in the proof of Theorem 1, we show q-linear convergence. We stress that, in contrast to the analysis of general feasible descent methods for problems with quadratic functional growth, cf., e.g., [11], a feasible descent property (ensured, e.g., for block coordinatewise strongly convex functions) is not explicitly required for a mere two-block structure.
Theorem 2
(Q-linear convergence under quadratic functional growth) Assume \(\mathrm {(P1)}\)–\(\mathrm {(P5)}\) and \(\mathrm {(A1)},\mathrm {(A2)},\mathrm {(A3b)}\) hold. Let \(\{x^k\}_{k\ge 0}\) be the sequence generated by the alternating minimization, cf. Algorithm 1. For all \(k\ge 0\) it holds
Proof
We consider the first half-step of the alternating minimization and show
W.l.o.g. we assume that \(\tfrac{\beta _1}{L_1}>0\). Let \({\bar{x}}^k:= \mathrm {arg\,min}\left\{ \Vert x-x^k\Vert \,\big | \, x\in X \right\} \in {\mathcal {D}}\), with \(H^\star = H({\bar{x}}^k)\). Utilizing the convexity and smoothness of f, we then obtain
By (i) introducing \(\gamma \in (0,1]\) to be specified later, (ii) using the Lipschitz continuity of \(\nabla _1 f\), cf. (A2), and the convexity of f, and (iii) the definition of \(\beta _1\), cf. Eq. (5), we moreover obtain
On the same grounds as utilized for deriving (9) and (10), it holds
By definition of H and (16)–(19), we obtain
Thus, by utilizing (A3b), the optimality property of \(x_1^{k+1}\) based on the first step of the alternating minimization, cf. (3), and choosing \(\gamma =\frac{\kappa \beta _1}{4L_1}\), it follows
which yields (15), after reordering. By symmetry, it analogously follows that
Finally, combining Eqs. (15) and (21) proves the assertion. \(\square \)
5 Sublinear convergence in the plain convex case
In this section, sublinear convergence is established for the alternating minimization applied to model problem (1) under the mild assumption of a compact level set of H w.r.t. the initial value, inspired by Beck [2]:
-
(A3c)
The functions \(g_i:{\mathcal {B}}_i \rightarrow {\mathbb {R}} \cup \{ \infty \}\), \(i=1,2\), are closed and convex (and thereby so is H). Furthermore, let the level set of H with respect to \(H(x^0)\), \({\mathcal {L}}:= \left\{ x \in {\mathcal {D}} \, \big | \, H(x) \le H(x^0) \right\} \), be compact, and let \(R:= \mathrm {diam}({\mathcal {L}},X)\).
The following result predicts a two-stage behavior: first, the error decreases q-linearly until it is sufficiently small; after that, sublinear convergence is initiated. The transition depends on the smoothness properties of the problem.
Theorem 3
(Sublinear convergence for the non-smooth convex case) Assume that \(\mathrm {(P1)}\)–\(\mathrm {(P5)}\) and \(\mathrm {(A1),(A2),(A3c)}\) are satisfied. Let \(\{x^k\}_{k\ge 0}\) be the sequence generated by the alternating minimization, cf. Algorithm 1. Define
where \(\lceil \cdot \rceil \) and \([\cdot ]_+\) respectively denote the ceiling function and the restriction to the positive part. It holds for all \(k\ge 0\)
In particular, sublinear convergence sets in for \(k\ge m^\star \) at the latest.
The proof utilizes two auxiliary results: general descent properties for each subiteration of the alternating minimization, and a criterion for concluding sublinear convergence. Those are summarized in the following two lemmas.
Lemma 1
Under the assumptions of Theorem 3, it holds for all \(k\ge 0\) that
Proof
We show Eq. (22), assuming w.l.o.g. \(\tfrac{\beta _1}{L_1}>0\). As in the proof of Theorem 2, Eq. (20) can be derived under the given assumptions; i.e., for \(\gamma \in (0,1]\) it holds
By definition of R, cf. (A3c), and the monotonicity of \(\{H(x^k)\}_{k=0,\frac{1}{2},1,...}\), it holds \( \Vert x^k - {\bar{x}}^k \Vert \le R\). Thus, with the definition of \(x^{k+1/2}\), cf. Eq. (3), it follows
We distinguish two cases: If \(H^k - H^\star > \frac{2 L_1 R^2}{\beta _1}\), we choose \(\gamma =1\); otherwise, we choose \(\gamma =\frac{\beta _1}{2L_1 R^2}(H^k - H^\star )\). This finally proves the first part of the assertion (22). The second part (23) analogously follows by symmetry. \(\square \)
The following auxiliary convergence criterion, inspired by a similar result in [3], will allow for effectively making use of the energy descent of both steps of the alternating minimization.
Lemma 2
Let \(\{A_k\}_{k=0,\frac{1}{2},1,...} \subset {\mathbb {R}}_{\ge 0}\) and \(\gamma _1,\gamma _2,p\ge 0\) satisfy
Then it holds for all \(k\ge 0\) that \(A_{k} \le \left[ (k+p) (\gamma _1 + \gamma _2)\right] ^{-1}\).
Proof
By (24a) and (24b), \(\{A_k\}_{k=0,\frac{1}{2},1,\frac{3}{2},...}\) is non-increasing, and it holds
for \(k\ge 0\). Thus, by utilizing a telescoping sum and applying Eq. (24c), we obtain
This proves the assertion for \(k\ge 1\); for \(k=0\) it follows directly from (24c). \(\square \)
Finally, we are able to prove Theorem 3.
Proof of Theorem 3
As long as \(H^k - H^\star > 2 \,\mathrm {min}\left\{ \tfrac{L_1}{\beta _1}, \tfrac{L_2}{\beta _2} \right\} R^2\) for some \(k\in {\mathbb {N}}_0\), by Lemma 1 and the monotonicity of \(\{H^k\}_{k=0,1,...}\), it holds that
Hence, there exists a minimal \(m\ge 0\) such that \(H^k - H^\star \le 2 \,\mathrm {min}\left\{ \tfrac{L_1}{\beta _1}, \tfrac{L_2}{\beta _2} \right\} R^2\) for all \(k\ge m\). Assuming \(m\ge 1\), Eq. (25) holds for all \(k\le m-1\), and it holds
Thus, it holds that \(m < \mathrm {log}_2 \left( \frac{H^0 - H^\star }{\,\mathrm {min}\left\{ \tfrac{L_1}{\beta _1}, \tfrac{L_2}{\beta _2} \right\} R^2} \right) \), and consequently (including the case \(m=0\)), \(m\le m^\star \), with \(m^\star \) as defined above.
Since \(\{H^k\}_{k=0,\frac{1}{2},1,...}\) is non-increasing, it also holds for \(k\ge m\) that \(H^{k+1/2} - H^\star \le 2 \,\mathrm {min}\left\{ \tfrac{L_1}{\beta _1}, \tfrac{L_2}{\beta _2} \right\} R^2\). Hence, by Lemma 1 it follows for all \(k\ge m\) that
Using the notation of Lemma 2, we define the sequence \(\{A_n\}_{n=0,\frac{1}{2},1,...}\) with \(A_n := H^{n+m} - H^\star \), satisfying the assumptions of Lemma 2 with \(\gamma _1 = \frac{\beta _1}{4 L_1 R^2}\), \(\gamma _2= \frac{\beta _2}{4 L_2 R^2},\ p= p^\star \). Finally, the application of Lemma 2 yields
Combining Eqs. (25) and (27) proves the assertion. \(\square \)
Remark 2
(Exponential decay during the first iterations) If \(\,\mathrm {max}\left\{ \tfrac{L_1}{\beta _1}, \tfrac{L_2}{\beta _2} \right\} <\infty \) and the initial error satisfies \(H^{0}-H^\star > 2 \,\mathrm {max}\left\{ \tfrac{L_1}{\beta _1}, \tfrac{L_2}{\beta _2} \right\} R^2\), the result of Theorem 3 can in fact be improved. By a line of argumentation analogous to the above proof, one can conclude that \(H^k - H^\star \) first contracts with a rate of \(\frac{1}{4}\) for the first \(k_1\) iterations, until \(H^{k_1}-H^\star \le 2 \,\mathrm {max}\left\{ \tfrac{L_1}{\beta _1}, \tfrac{L_2}{\beta _2} \right\} R^2\) for some \(k_1\in {\mathbb {N}}_0\). Afterwards, the convergence behavior can be qualitatively predicted as in Theorem 3. Ultimately, \(m^\star \) is of the order
6 Numerical example inspired by multiphysics
Sequential solution strategies are widely used in the context of multiphysics applications. Provided a multiphysics problem enjoys a minimization structure, a sequential solution strategy is closely related (or even equivalent) to applying the alternating minimization to the underlying minimization problem.
In the following, we numerically demonstrate the efficacy of the alternating minimization applied to a problem inspired by poroelasticity applications, i.e., flow in deformable porous media. The model problem corresponds to an elasticity-like vectorial p-Laplace equation coupled with a Darcy-type equation for non-Newtonian fluids, with a Biot–Darcy-type coupling; see [5, 10] for more details. Specifically, we consider the representative coupled problem
where \({\varOmega }= (0,1) \times (0,1) \subset {\mathbb {R}}^2\) denotes the domain, \(\alpha ,\beta \in {\mathbb {R}}\), \(\mu ,\kappa \in {\mathbb {R}}_{>0}\), \({\varvec{f}}\in {\mathbb {R}}^2\) are model parameters, \(p,q\in (1,\infty )\), and the solution spaces are defined by
where \(L^p\) (resp. \(L^q\)) denotes the standard Lebesgue space and \({\varvec{n}}_{\partial {\varOmega }}\) is the outer normal vector on the boundary \(\partial {\varOmega }\) of \({\varOmega }\). We note that the solution spaces \({\mathcal {U}}\) and \({\mathcal {Q}}\) are closely related to the standard Sobolev spaces \(W^{1,p}_0({\varOmega })\) and \(H_0(\mathrm {div};{\varOmega })\), respectively. We fix \(\alpha = 1\), \(\beta =10\), \(\mu = 1\), \(\kappa =0.1\), \({\varvec{f}}=(1,1)\), \(p=q=1.5\). The corresponding solution is displayed in Fig. 2a.
For the numerical solution, the problem (28) is discretized using the Galerkin method and linear finite elements for \({\varvec{u}}\) and \({\varvec{q}}\) on a Cartesian grid with uniform mesh size \(2^{-N}\), \(N\in \{4,5,6\}\). The corresponding discrete minimization problem is then solved using Algorithm 1 with the initial guess \(({\varvec{u}}^0,{\varvec{q}}^0)=({\varvec{0}},{\varvec{0}})\). For the implementation, the DUNE project [13] and in particular the dune-functions module [6] have been utilized.
Let \(H^\star \) denote the energy corresponding to the (converged discrete) solution of (28), and \(H^k\) the energy of the approximation \(({\varvec{u}}^k,{\varvec{q}}^k)\) of the k-th step of Algorithm 1. The decay \(H^k - H^\star \) is displayed in Fig. 2b for the three mesh sizes. We observe linear, essentially mesh-independent convergence. In addition, we mention a decreasing trend of the energy values \(H^\star \) for consecutively refined grids, as expected due to the increasingly accurate discretization. In particular, \(H^\star \approx -7.077\times 10^{-3}\) for \(N=4\), \(H^\star \approx -7.137\times 10^{-3}\) for \(N=5\), and \(H^\star \approx -7.153\times 10^{-3}\) for \(N=6\).
We note that the choices for p and q lead to a non-quadratic problem whose coupling, however, is governed by a quadratic, merely semi-definite contribution. Hence, the considered problem is closely related to the small algebraic problem in Sect. 3.1 and indeed leads to consistent observations. The essentially mesh-independent convergence indicates that the convergence behavior is most adequately described in problem-dependent norms rather than in standard Euclidean norms, which would in contrast suggest mesh-dependent convergence.
7 Discussion and concluding remarks
In this paper, we have established convergence of the alternating minimization applied to a two-block structured model problem within the class of non-smooth non-strongly convex optimization in general Banach spaces – a fairly broad setting. We have considered three cases of relaxed strong convexity: (i) quasi-strong convexity, (ii) quadratic functional growth, and (iii) plain convexity and a compact initial level set. Convergence rates have been provided, of linear type for the first two cases, and of sublinear type for the third case. To the best of the author’s knowledge, all results are novel.
Our results are direct extensions of previous results in the literature [2, 3, 11], agreeing with or partially refining them when put in the same context, and being valid also in more general scenarios. The key to arriving at our results has been to describe the smoothness properties (of the two single blocks) and the convexity properties (of the full objective function) w.r.t. different (semi-)norms; these enter the novel rates, predicting in particular that both steps of the alternating minimization separately lead to an error decrease. For the subclass of quasi-strongly convex problems, we demonstrate the sharpness of our convergence result based on a simple numerical example. In addition, an illustrative numerical example inspired by multiphysics demonstrates the efficacy of the alternating minimization for PDE-based problems. Finally, we highlight that, for the first time, it is proved that quadratic functional growth is sufficient for linear convergence – without any feasible descent property as commonly required in the analysis of the general block coordinate descent [9, 11].
Ultimately, it is noteworthy that the provided results allow for a systematic development and analysis of iterative block-partitioned solvers based on the alternating minimization for problems in applied variational calculus – in particular for two-way coupled PDEs arising from a convex minimization problem; see, e.g., [5].
References
Auslender, A.: Optimisation. Masson, Paris (1999)
Beck, A.: On the convergence of alternating minimization for convex programming with applications to iteratively reweighted least squares and decomposition schemes. SIAM J. Optim. 25, 185–209 (2015). https://doi.org/10.1137/13094829X
Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23, 2037–2060 (2013). https://doi.org/10.1137/120887679
Bertsekas, D.: Nonlinear Programming, Athena Scientific Optimization and Computation Series. Athena Scientific (1999)
Both, J.W., Kumar, K., Nordbotten, J.M., Radu, F.A.: The gradient flow structures of thermo-poro-visco-elastic processes in porous media (2019). ArXiv e-prints arXiv:1907.03134
Engwer, C., Gräser, C., Müthing, S., Sander, O.: Function space bases in the dune-functions module (2018). arXiv e-prints arXiv:1806.09545
Grippo, L., Sciandrone, M.: On the convergence of the block nonlinear Gauss–Seidel method under convex constraints. Oper. Res. Lett. 26, 127–136 (2000)
Grippo, L., Sciandrone, M.: Globally convergent block-coordinate techniques for unconstrained optimization. Optim. Method Softw. 10, 587–637 (1999). https://doi.org/10.1080/10556789908805730
Luo, Z.-Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46, 157–178 (1993). https://doi.org/10.1007/BF02096261
Miehe, C., Mauthe, S., Teichtmeister, S.: Minimization principles for the coupled problem of Darcy–Biot-type fluid transport in porous media linked to phase field modeling of fracture. J. Mech. Phys. Solids 82, 186–217 (2015). https://doi.org/10.1016/j.jmps.2015.04.006
Necoara, I., Nesterov, Y., Glineur, F.: Linear convergence of first order methods for non-strongly convex optimization. Math. Progr. 175, 69–107 (2019). https://doi.org/10.1007/s10107-018-1232-1
Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables, vol. 30. SIAM (1970)
Sander, O.: DUNE-The Distributed and Unified Numerics Environment, vol. 140. Springer (2020)
Tai, X.-C., Espedal, M.: Rate of convergence of some space decomposition methods for linear and nonlinear problems. SIAM J. Numer. Anal. 35, 1558–1570 (1998)
Wright, S.J.: Coordinate descent algorithms. Math. Progr. 151, 3–34 (2015). https://doi.org/10.1007/s10107-015-0892-3
Zhang, H., Cheng, L.: Restricted strong convexity and its applications to convergence analysis of gradient-type methods in convex optimization. Optim. Lett. 9, 961–979 (2015). https://doi.org/10.1007/s11590-014-0795-x
Acknowledgements
The author thanks the anonymous reviewers for providing constructive comments resulting in significant improvements of the paper.
Funding
Open access funding provided by University of Bergen (incl Haukeland University Hospital).