1 Introduction

A large number of problems encountered in distinct disciplines, such as variational inequalities, minimax problems, and optimization problems, can be modelled as finding zero points of the sum of two nonlinear operators in a Hilbert space \(\mathcal {H}\) as follows (see [1,2,3,4]):

$$\begin{aligned} 0\in (\Gamma _1+\Gamma _2)x, \end{aligned}$$
(1)

in which \(\Gamma _1:\mathcal {H} \rightarrow 2^{\mathcal {H}}\) is a multivalued mapping and \(\Gamma _2:\mathcal {H} \rightarrow \mathcal {H}\) is a mapping. Since most of the problems encountered in applied and computational fields such as signal processing, machine learning, and image recovery can be modeled as an inclusion composed of the sum of two nonlinear operators, splitting methods have attracted the attention of many researchers (see [5,6,7]). These methods are very useful because they process each operator separately instead of processing the sum of the operators. Each iteration can thus be divided into two parts: a forward step, in which the value of the single-valued operator is computed directly, and a backward step, in which the resolvent of the multivalued operator is evaluated.

For the identity operator \(I:\mathcal {H}\rightarrow \mathcal {H}\), the mapping \(J^{\Gamma _1}=(I+\Gamma _1)^{-1}\) is called the resolvent of \(\Gamma _1\).
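For illustration (a standard computation, included here only as an example), take \(\mathcal {H}=\mathbb {R}\) and \(\Gamma _1=\partial \vert \cdot \vert \), the subdifferential of the absolute value; then the resolvent with parameter \(\varrho >0\) is the soft-thresholding operator

$$\begin{aligned} J_{\varrho }^{\Gamma _1}(x)=(I+\varrho \Gamma _1)^{-1}(x)=\textrm{sgn}(x)\max \{\vert x\vert -\varrho ,0\}. \end{aligned}$$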

The forward-backward algorithm (FBA) is one of the well-known splitting methods when \(\Gamma _2\) in (1) is single-valued. Each step of this algorithm combines forward calculation of \(\Gamma _2\) and backward calculation of \(\Gamma _1\) to reach the solution of (1). A standard forward-backward splitting algorithm (FBSA) is formulated as follows (see [8]):

$$\begin{aligned} {\mathbf{Algorithm\, FBSA:}}\ \ u_{n+1}=(I+\varrho \Gamma _1)^{-1}(I-\varrho \Gamma _2)u_n,\quad n\ge 0, \end{aligned}$$

with an appropriate \(u_{0}\in \mathcal {H}\) and \(\varrho > 0\). This algorithm includes as special cases both the proximal point algorithm and the gradient method (see [9, 10]). Lions and Mercier [8] proposed two splitting algorithms for the evolutionary and stationary problems composed of the sum of two multivalued monotone mappings:

$$\begin{aligned}{} & {} {\mathbf{Splitting\, algorithm \ 1:}}\ \ v_{n+1}=(2J_{\varrho }^{\Gamma _{1}}-I)(2J_{\varrho }^{\Gamma _{2}}-I)v_{n},\ \ n\ge 0; \ \ \qquad \\{} & {} {\mathbf{Splitting\, algorithm \ 2:}}\ \ v_{n+1}=J_{\varrho }^{\Gamma _{1}}(2J_{\varrho }^{\Gamma _{2}}-I)v_{n}+(I-J_{\varrho }^{\Gamma _{2}})v_{n},\ \ n\ge 0, \end{aligned}$$

in which \(J_{\varrho }^{C}=(I+\varrho C)^{-1}\) for \(C=\Gamma _1\) or \(C=\Gamma _2\). They proved various convergence results for these algorithms and applied these results to minimization problems and to the obstacle problem.
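To make the forward and backward steps concrete, the following is a minimal numerical sketch of Algorithm FBSA, assuming \(\Gamma _1=\partial \Vert \cdot \Vert _1\) (whose resolvent is componentwise soft-thresholding) and \(\Gamma _2=\nabla f\) for the quadratic \(f(u)=\frac{1}{2}\Vert Au-b\Vert ^2\); the matrix A, the vector b, and the step size are illustrative placeholders, not data from this paper:

```python
import numpy as np

def soft_threshold(u, t):
    # Resolvent of t * (subdifferential of the l1-norm), i.e.
    # (I + t * d||.||_1)^{-1}(u), applied componentwise.
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def fbsa(A, b, rho, n_iter=200):
    # Algorithm FBSA: u_{n+1} = (I + rho*Gamma1)^{-1} (I - rho*Gamma2) u_n,
    # with Gamma1 = d||.||_1 and Gamma2 = gradient of 0.5*||A u - b||^2.
    u = np.zeros(A.shape[1])
    for _ in range(n_iter):
        forward = u - rho * A.T @ (A @ u - b)  # forward step on Gamma2
        u = soft_threshold(forward, rho)       # backward step on Gamma1
    return u

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20))
b = rng.standard_normal(40)
rho = 1.0 / np.linalg.norm(A, 2) ** 2  # step size below 1/L, L = ||A||_2^2
print(fbsa(A, b, rho)[:5])
```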

Due to their diverse range of applications, these algorithms have proven to be highly valuable. Consequently, a substantial number of FBSAs have been introduced and thoroughly investigated within the context of monotone operators (see [11,12,13,14,15,16,17,18,19] and the references therein).

In the next section, we will recall some recently defined algorithms in this context and present some key facts which will be used to derive the main results of this exposition.

2 Preliminaries and formulations

Throughout this exposition, unless otherwise stated, (\(\mathcal {H}\), \(\Vert \,\cdot \,\Vert \)) stands for a real Hilbert space in which \(\Vert \,\cdot \,\Vert \) is induced by the inner product \(\langle \,\cdot ,\,\cdot \,\rangle \), \(\emptyset \ne C \subseteq \mathcal {H}\) denotes a closed and convex set, and \(T:C\rightarrow C\) is a mapping.

Definition 1

A mapping \(\Gamma _2:\mathcal {H}\rightarrow \mathcal {H}\) is called: (i) monotone if

$$\begin{aligned} (\forall \nu _1,\nu _2\in \mathcal {H}) \quad \left\langle \Gamma _2\nu _1-\Gamma _2\nu _2,\nu _1-\nu _2\right\rangle \ge 0;\end{aligned}$$

(ii) \(\phi \)-strongly monotone if there exists a constant \(\phi >0\) such that

$$\begin{aligned} (\forall \nu _1,\nu _2\in \mathcal {H}) \quad \left\langle \Gamma _2\nu _1-\Gamma _2\nu _2,\nu _1-\nu _2\right\rangle \ge \phi \left\| \nu _1-\nu _2\right\| ^{2};\end{aligned}$$

(iii) \(\phi \)-inverse strongly monotone (\(\phi \)-ism) if there exists a constant \(\phi >0\) such that

$$\begin{aligned} (\forall \nu _1,\nu _2\in \mathcal {H}) \quad \left\langle \Gamma _2\nu _1-\Gamma _2\nu _2,\nu _1-\nu _2\right\rangle \ge \phi \left\| \Gamma _2\nu _1-\Gamma _2\nu _2\right\| ^{2};\end{aligned}$$

(iv) firmly nonexpansive if

$$\begin{aligned} (\forall \nu _1,\nu _2\in \mathcal {H}) \quad \left\langle \nu _1-\nu _2,\Gamma _2\nu _1-\Gamma _2\nu _2\right\rangle \ge \left\| \Gamma _2\nu _1-\Gamma _2\nu _2\right\| ^{2};\end{aligned}$$

(v) expanding if there exists a constant \(h >1\) such that

$$\begin{aligned} (\forall \nu _1,\nu _2\in \mathcal {H}) \ \ \ \ \ \left\| \Gamma _2\nu _1-\Gamma _2\nu _2\right\| \ge h\left\| \nu _1-\nu _2\right\| .\end{aligned}$$

Definition 2

A multivalued mapping \(\Gamma _1:\mathcal {H}\rightarrow 2^{\mathcal {H}}\) is called (i) monotone if for all \( \varkappa _1,\varkappa _2\in \mathcal {H}\), \(\nu _1\in \Gamma _1\varkappa _1\), \(\nu _2\in \Gamma _1\varkappa _2\), \( \left\langle \nu _1-\nu _2,\varkappa _1-\varkappa _2\right\rangle \ge 0\); (ii) strongly monotone if there exists a constant \(\phi >0\) such that for all \( \varkappa _1,\varkappa _2\in \mathcal {H}\), \(\nu _1\in \Gamma _1\varkappa _1\), \(\nu _2\in \Gamma _1\varkappa _2\),

$$\begin{aligned} \left\langle \nu _1-\nu _2,\varkappa _1-\varkappa _2\right\rangle \ge \phi \left\| \varkappa _1-\varkappa _2\right\| ^{2};\end{aligned}$$

(iii) maximal monotone if \(\Gamma _1\) is monotone and \((I+\varrho \Gamma _1)\mathcal {H}=\mathcal {H}\) holds for all \(\varrho >0\), in which I is the identity mapping on \(\mathcal {H}\).

Let \(\Gamma _1\) be a multivalued monotone mapping with graph \(\textrm{Gr}(\Gamma _1)\). Equivalently, if there is no monotone mapping \(\Gamma _2\) whose graph \(\textrm{Gr}(\Gamma _2)\) properly contains \(\textrm{Gr}(\Gamma _1)\), then \(\Gamma _1\) is called a maximal monotone mapping.

Remark 1

(i) The sum \(\Gamma _1+\Gamma _2\) is monotone if \(\Gamma _1\) and \(\Gamma _2\) are monotone. (ii) The mapping \(\kappa \, \Gamma _1\) for \(\kappa \ge 0\) is monotone if \(\Gamma _1\) is monotone. (iii) The mapping \(\Gamma _1^{-1}\) is monotone if \(\Gamma _1\) is monotone.

Remark 2

[6] If \(\Gamma _1:\mathcal {H} \rightarrow 2^{\mathcal {H}}\) is a multivalued maximal monotone operator, then there exists a unique \(p\in \mathcal {H}\) such that \(x\in (I+\varrho \Gamma _1)p\) for each \(x \in \mathcal {H}\) and \(\varrho >0\).

Definition 3

[20, p. 182] Let \(\Gamma _1\) be a maximal monotone operator with resolvent \(J_{\varrho }^{\Gamma _1}=(I+\varrho \Gamma _1)^{-1}\). For every \(\varrho >0\), the Yosida approximation of \(\Gamma _1\) is defined by

$$\begin{aligned} \Gamma _{1_{\varrho }}=\frac{1}{\varrho }\left( I-J_{\varrho }^{\Gamma _1}\right) . \end{aligned}$$
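To illustrate, for \(\Gamma _1=\partial \vert \cdot \vert \) on \(\mathbb {R}\), whose resolvent \(J_{\varrho }^{\Gamma _1}\) is the soft-thresholding operator recalled above, a short computation (included purely as an example) yields

$$\begin{aligned} \Gamma _{1_{\varrho }}(x)=\frac{1}{\varrho }\bigl (x-J_{\varrho }^{\Gamma _1}(x)\bigr )=\textrm{sgn}(x)\min \Bigl \{\frac{\vert x\vert }{\varrho },1\Bigr \}, \end{aligned}$$

a single-valued, \(\frac{1}{\varrho }\)-Lipschitz approximation of the multivalued operator \(\Gamma _1\).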

Definition 4

[6] The metric projection \(P_{C}:\mathcal {H}\rightarrow C\) assigns to each \(u\in \mathcal {H}\) the unique point \(P_{C}(u)\in C\) such that \( d(u,C)=\left\| u-P_{C}(u)\right\| =\inf \left\{ \left\| u-p\right\| :p\in C\right\} \).

Remark 3

[21] \(P_{C}\) is a firmly nonexpansive mapping of \(\mathcal {H}\) and hence a nonexpansive mapping of \(\mathcal {H}\).

We also use the following notation \(J_{\varrho }^{\Gamma _1,\Gamma _2}=J_{\varrho }^{\Gamma _1}(I-\varrho \Gamma _2)\) (see [6]).

To advance research in applied and computational domains, it is crucial to develop new algorithms that exhibit improved convergence rates and to thoroughly investigate their qualitative properties. One such algorithm, known as the Normal-S algorithm, was introduced by Sahu [22]. Research has demonstrated that the Normal-S algorithm achieves a higher convergence rate than other well-known iteration algorithms such as Picard [23], Mann [24], Ishikawa [25], Noor [26], and S [27] for the class of contraction mappings (as detailed in [22]). Building on the successes of the Normal-S algorithm and the ongoing research in this area, Gursoy proposed the Picard-S algorithm [28]. The Picard-S algorithm is particularly intriguing and merits further exploration due to its rapid convergence. Importantly, it operates independently of the aforementioned algorithms and exhibits favorable behavior when applied to both contractive and nonexpansive mappings. These properties make the Picard-S algorithm a promising tool for solving operator equations in various scientific and computational research scenarios. As a special case of the Picard-S algorithm, Karakaya et al. proposed the PMP algorithm [29].

Sahu et al. [6] proposed below FBSAs based on several well-known algorithms to solve the problem (1):

$$\begin{aligned}&{\mathbf{Algorithm M-FBSA:}} \\&\qquad v_{n+1}=(1-\eta _{n}^{(1)})v_{n}+\eta _{n}^{(1)}(J_{\varrho }^{\Gamma _{1},\Gamma _{2}}v_{n}),\ \ \eta _{n}^{(1)}\in [0,1];\\&{\mathbf{Algorithm S-FBSA:}}\quad z_{n}=(1-\eta _{n}^{(3)})w_{n}+\eta _{n}^{(3)}(J_{\varrho }^{\Gamma _{1},\Gamma _{2}}w_{n}), \\&\qquad w_{n+1}=(1-\eta _{n}^{(2)})J_{\varrho }^{\Gamma _{1},\Gamma _{2}}w_{n}+\eta _{n}^{(2)}(J_{\varrho }^{\Gamma _{1},\Gamma _{2}}z_{n}), \ \ \eta _{n}^{(2)},\eta _{n}^{(3)}\in [0,1]; \end{aligned}$$

with appropriate \(v_{0},w_{0}\in \mathcal {H}\), \(\varrho >0\) and the corresponding \(\eta _{0}^{(i)}\in [0,1]\) \((i=1,2,3)\). Notice that in the case \(\eta _{n}^{(1)}=1\) for all \(n\,(n\ge 0)\), M-FBSA transforms into P-FBSA [6]. Likewise, when \(\eta _{n}^{(2)}=1\) for all \(n\,(n\ge 0)\), S-FBSA transforms into NS-FBSA [6].
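Written against a generic operator \(T=J_{\varrho }^{\Gamma _1,\Gamma _2}\), one step of each scheme can be sketched as follows (a schematic rendering of the displays above, not code from [6]):

```python
def m_fbsa_step(v, T, eta1):
    # One M-FBSA step: v_{n+1} = (1 - eta1) * v_n + eta1 * T(v_n).
    return (1 - eta1) * v + eta1 * T(v)

def s_fbsa_step(w, T, eta2, eta3):
    # One S-FBSA step: an auxiliary point z_n, then a convex
    # combination of T(w_n) and T(z_n).
    z = (1 - eta3) * w + eta3 * T(w)
    return (1 - eta2) * T(w) + eta2 * T(z)
```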

Sahu et al. [6] obtained some convergence results; they also demonstrated theoretically and experimentally that these algorithms are faster than the classical FBA (forward-backward algorithm).

Drawing inspiration from the mentioned algorithms and their successes, we introduce two new FBSAs: PS-FBSA and PMP-FBSA, respectively:

$$\begin{aligned}&{\mathbf{Algorithm PS-FBSA:}}\quad e_{n}=(1-\eta _{n}^{(6)})c_{n}+\eta _{n}^{(6)}(J_{\varrho }^{\Gamma _{1}, \Gamma _{2}}c_{n}),\\&d_{n}=(1-\eta _{n}^{(5)})J_{\varrho }^{\Gamma _{1}, \Gamma _{2}}c_{n}+\eta _{n}^{(5)}(J_{\varrho }^{\Gamma _{1},\Gamma _{2}}e_{n}), \ \ c_{n+1}=J_{\varrho }^{\Gamma _{1},\Gamma _{2}}d_{n},\ \eta _{n}^{(5)},\eta _{n}^{(6)} \in [0,1];\\&{\mathbf{Algorithm PMP-FBSA:}}\quad e_{n}=(J_{\varrho }^{\Gamma _{1},\Gamma _{2}}c_{n}), \\&d_{n}=(1-\eta _{n}^{(7)})e_{n}+\eta _{n}^{(7)}(J_{\varrho }^{\Gamma _{1},\Gamma _{2}}e_{n}),\ \ c_{n+1}=J_{\varrho }^{\Gamma _{1},\Gamma _{2}}d_{n}, \ \ \eta _{n}^{(7)}\in [0,1], \end{aligned}$$

with appropriate \(c_{0}\in \mathcal {H}\), \(\varrho >0\) and the corresponding \(\eta _{0}^{(i)}\in [0,1]\) \((i=5,6,7)\). Observe that if \(\eta _{n}^{(6)}=1\) for all \(n \in \mathbb {N}\), then PS-FBSA reduces to PMP-FBSA.
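In the same schematic form, with a generic \(T=J_{\varrho }^{\Gamma _1,\Gamma _2}\), one step of each of the two new algorithms reads (again a sketch, not a definitive implementation):

```python
def ps_fbsa_step(c, T, eta5, eta6):
    # One PS-FBSA step: e_n, d_n, then the Picard step c_{n+1} = T(d_n).
    e = (1 - eta6) * c + eta6 * T(c)
    d = (1 - eta5) * T(c) + eta5 * T(e)
    return T(d)

def pmp_fbsa_step(c, T, eta7):
    # One PMP-FBSA step, i.e., PS-FBSA with eta6 = 1 (so e_n = T(c_n)).
    e = T(c)
    d = (1 - eta7) * e + eta7 * T(e)
    return T(d)
```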

In the present work, we establish strong convergence results for three algorithms: PS-FBSA, NS-FBSA, and S-FBSA. Specifically, we demonstrate the equivalence of convergence between PS-FBSA and NS-FBSA, and we provide a comparison of their convergence rates. We also apply them to the convex minimization problem, and we show the validity of our findings through a nontrivial numerical example. As practical applications, we compare the performance of the newly proposed algorithms with that of their predecessors on machine-learning classification/regression problems and on image deblurring tasks. The numerical experiments confirm that the new algorithms converge to the optimum value in fewer steps and yield much better approximations to the optimum value even when fewer steps are used. We need the following facts to derive the main results of this exposition:

Lemma 1

[30, Corollary 2.14] For \(a_1,a_2 \in \mathcal {H}\) and \(\Lambda \in [0,1]\), it holds that

$$\begin{aligned}\Vert \Lambda a_1-(\Lambda -1)a_2\Vert ^2=\Lambda \Vert a_1\Vert ^{2}-(\Lambda -1)\Vert a_2\Vert ^{2}+\Lambda (\Lambda -1)\Vert a_1-a_2\Vert ^2.\end{aligned}$$

Lemma 2

[31] Let \(\{ \rho _{n}^{(\nu )}\} _{n=0}^{\infty }\), \(\nu =1,2,3\), be three nonnegative sequences. Assume that \(\rho _{n}^{(2)}=o(\rho _{n}^{(3)})\), \(\sum _{n=1}^{\infty }\rho _{n}^{(3)}=\infty \), and \(\rho _{n}^{(3)}\in (0,1)\) for all \(\ n\ge n_{0}\). If \(\rho _{n+1}^{(1)}\le (1-\rho _{n}^{(3)})\rho _{n}^{(1)}+\rho _{n}^{(2)}\), then \(\lim _{n\rightarrow \infty }\rho _{n}^{(1)}=0\).

Lemma 3

[32] Let \(\{ \rho _{n}^{(\nu )}\} _{n=0}^{\infty }\), \(\nu =1,2\), be two nonnegative sequences. Assume that \(\lim _{n\rightarrow \infty }\rho _{n}^{(2)}=0\) and \(\mu \in (0,1)\). If \(\rho _{n+1}^{(1)}\le \mu \rho _{n}^{(1)}+\rho _{n}^{(2)}\), then \(\lim _{n\rightarrow \infty }\rho _{n}^{(1)}=0\).

Definition 5

(see [33]) If \(\lim _{n\rightarrow \infty }{\Vert \Theta _n^{(1)}-\Theta _1\Vert }/{\Vert \Theta _n^{(2)}-\Theta _{2}\Vert }=0\), where \(\{\Theta _{n}^{(\nu )}\} _{n=0}^{\infty }\), \(\nu =1,2\), are two sequences with \(\lim _{n\rightarrow \infty }\Theta _{n}^{(\nu )}=\Theta _{\nu }\), \(\nu =1,2\), then it is said that \(\{\Theta _n^{(1)}\}_{n=0}^{\infty }\) converges faster than \(\{\Theta _n^{(2)}\}_{n=0}^{\infty }\).
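For instance, \(\Theta _n^{(1)}=2^{-n}\) converges to 0 faster than \(\Theta _n^{(2)}=1/n\) in this sense, since \({2^{-n}}/{(1/n)}=n/2^{n}\rightarrow 0\) as \(n\rightarrow \infty \).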

Definition 6

(see [34]) Let \(\{\Theta _n^{(\nu )}\}_{n=0}^{\infty }\) and \(\{\Pi _n^{(\nu )}\} _{n=0}^{\infty }\) \((\nu =1,2)\) be four sequences, such that \(\Pi _n^{(\nu )}\ge 0\) for each \( n \in \mathbb {N}\), \(\lim _{n\rightarrow \infty }\Theta _n^{(\nu )}=\Theta ^{*}\), and \(\lim _{n\rightarrow \infty }\Pi _n^{(\nu )}=0\), \(\nu =1,2\). Suppose that for each \(n \in \mathbb {N}\) the following error estimates are available (and are the best possible [35]): \(\Vert \Theta _{n}^{(\nu ) }-\Theta ^{*}\Vert \le \Pi _{n}^{(\nu )}\) \((\nu =1,2)\). If \(\{\Pi _{n}^{(1)}\}_{n=0}^{\infty }\) converges faster than \(\{\Pi _{n}^{(2)}\}_{n=0}^{\infty }\) (in the sense of Definition 5), then we say that \(\{\Theta _n^{(1)}\}_{n=0}^{\infty }\) converges to \(\Theta ^{*}\) faster than \(\{\Theta _{n}^{(2)}\}_{n=0}^{\infty }\).

3 Convergence Analysis of forward–backward splitting algorithms

Let \(\emptyset \ne C\) be a subset of \(\mathcal {H}\), let \(\Gamma _2:C\rightarrow \mathcal {H}\) be a \(\phi \)-ism and expanding operator, and let \(\Gamma _1:C\rightarrow 2^{\mathcal {H}}\) be a multivalued maximal monotone operator. Our purpose is to reach a solution \(p\in \mathcal {H}\) such that \(0\in \Gamma _1p+\Gamma _2p \). For this purpose, we use the following fixed-point characterization:

$$\begin{aligned} 0\in (\Gamma _1+\Gamma _2)p\Longleftrightarrow p=J_{\varrho }^{\Gamma _1}(I-\varrho \Gamma _2)p. \end{aligned}$$
(2)

The following two propositions play a key role in the proof of the subsequent theorems.

Proposition 4

(see Proposition 3.1 in [6]) For all \(u_1,u_2 \in \mathcal {H}\) and \(\varrho >0\), we have

$$\begin{aligned} \bigl \Vert J_{\varrho }^{\Gamma _1}u_1-J_{\varrho }^{\Gamma _1}u_2\bigr \Vert ^{2}\le \Vert u_1-u_2\Vert ^{2},\end{aligned}$$

in which \(\Gamma _1:\mathcal {H}\rightarrow 2^{\mathcal {H}}\) is a multivalued maximal monotone operator.

Proposition 5

For \(\varrho >0\) and for all \( u_1,u_2 \in C\), we have

$$\begin{aligned} \Vert (I-\varrho \Gamma _2)u_1-(I-\varrho \Gamma _2)u_2 \Vert \le \theta \Vert u_1-u_2 \Vert , \end{aligned}$$

in which \(\theta =\sqrt{1+{\varrho ^{2}}/{\phi ^{2}}-2\varrho \phi }\). If \(\varrho< \phi \sqrt{2 \varrho \phi }< \sqrt{\varrho ^2+ \phi ^2}\), then the right inequality guarantees that \(\theta \) is a well-defined real number (\(\theta ^{2}>0\)), while the left one gives \(\theta <1\); hence \( (I-\varrho \Gamma _2): C \rightarrow \mathcal {H}\) is a contraction mapping.

Proof

Let \(u_1,u_2 \in C\). For \(\varrho >0\), we have

$$\begin{aligned} \bigl \Vert (I-\varrho \Gamma _2)u_1-(I-\varrho \Gamma _2)u_2\bigr \Vert ^2 = \left\| u_1-u_2\right\| ^2+\varrho ^2\left\| \Gamma _2u_1-\Gamma _2u_2\right\| ^2-2\varrho \left\langle \Gamma _2u_1-\Gamma _2u_2,u_1-u_2\right\rangle . \end{aligned}$$

(i) Since \(\Gamma _2\) is \(\phi \)-ism, it is \(\frac{1}{\phi }\)-Lipschitzian (by the Cauchy–Schwarz inequality). Hence, we have

$$\begin{aligned} \varrho ^2 \Vert \Gamma _2u_1-\Gamma _2u_2 \Vert ^2 \le \frac{\varrho ^2}{\phi ^2} \Vert u_1-u_2 \Vert ^2; \end{aligned}$$

(ii) Since \(\Gamma _2\) is \(\phi \)-ism, \( -2\varrho \left\langle \Gamma _2u_1-\Gamma _2u_2,u_1-u_2\right\rangle \le -2\varrho \phi \Vert \Gamma _2u_1-\Gamma _2u_2\Vert ^2\);

(iii) Since \(\Gamma _2\) is expanding,

$$\begin{aligned} -2\varrho \phi \Vert \Gamma _2u_1-\Gamma _2u_2\Vert ^2 \le -2\varrho \phi h^2 \Vert u_1-u_2\Vert ^2 < -2\varrho \phi \Vert u_1-u_2\Vert ^2.\end{aligned}$$

Exploiting (i)–(iii), we get \(\bigl \Vert (I-\varrho \Gamma _2)u_1-(I-\varrho \Gamma _2)u_2\bigr \Vert ^{2} \le \theta ^2 \Vert u_1-u_2 \Vert ^{2}\). \(\square \)
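As a quick numerical sanity check of Proposition 5 (the operator \(\Gamma _2\), the test points, and the parameter values below are illustrative choices only; \(\Gamma _2u=2u\) is \(\frac{1}{2}\)-ism and 2-expanding):

```python
import numpy as np

rho, phi = 1/9, 1/2  # satisfy rho < phi*sqrt(2*rho*phi) < sqrt(rho^2 + phi^2)
theta = np.sqrt(1 + rho**2 / phi**2 - 2 * rho * phi)
print(theta)  # ~0.9687, indeed < 1

# For Gamma2(u) = 2u, the forward operator I - rho*Gamma2 scales by
# 1 - 2*rho = 7/9, which is <= theta, as Proposition 5 predicts.
u1, u2 = np.array([1.0, -2.0]), np.array([0.5, 3.0])
lhs = np.linalg.norm((u1 - 2 * rho * u1) - (u2 - 2 * rho * u2))
rhs = theta * np.linalg.norm(u1 - u2)
print(lhs <= rhs)  # True
```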

Now, we proceed to demonstrate the convergence theorem for the algorithm S-FBSA.

Theorem 6

Let \(\{w_{n}\}_{n=0}^{\infty }\) be a sequence generated by the algorithm S-FBSA, with \(\varrho >0\) such that \(\varrho<\phi \sqrt{2\varrho \phi }<\sqrt{\varrho ^{2}+\phi ^{2}}\) and \(P_{(\Gamma _1+\Gamma _2)^{-1}(0)}w_{0}=p_{*}\). Then, for any initial point \(\ w_{0}\in C\), the sequence \(\{w_{n}\}_{n=0}^{\infty }\) converges strongly to \(p_{*}\) so that

$$\begin{aligned} (\forall n \in \mathbb {N}) \ \ \ \left\| w_{n+1}-p_{*}\right\| \le \theta ^{n+1}\left\| w_{0}-p_{*}\right\| , \end{aligned}$$

where \(\theta \) is defined in Proposition 5.

Proof

From the algorithm S-FBSA, Lemma 1 and Proposition 4, we have

$$\begin{aligned} \left\| z_{n}-p_{*}\right\| ^{2}\le & {} (1-\eta _{n}^{(3)})\left\| w_{n}-p_{*}\right\| ^{2}+\eta _{n}^{(3)}\bigl \Vert J_{\varrho }^{\Gamma _1,\Gamma _2}w_{n}-J_{\varrho }^{\Gamma _1,\Gamma _2}p_{*}\bigr \Vert ^{2} \\= & {} (1-\eta _{n}^{(3)})\left\| w_{n}-p_{*}\right\| ^{2}+\eta _{n}^{(3)}\bigl \Vert J_{\varrho }^{\Gamma _1}(I-\varrho \Gamma _2)w_{n}-J_{\varrho }^{\Gamma _1}(I-\varrho \Gamma _2)p_{*}\bigr \Vert ^{2} \\\le & {} (1-\eta _{n}^{(3)})\left\| w_{n}-p_{*}\right\| ^{2}+\eta _{n}^{(3)}\left\| (I-\varrho \Gamma _2)w_{n}-(I-\varrho \Gamma _2)p_{*}\right\| ^{2}. \end{aligned}$$

By using Proposition 5, we attain

$$\begin{aligned} \left\| z_{n}-p_{*}\right\| ^{2}\le & {} (1-\eta _{n}^{(3)})\left\| w_{n}-p_{*}\right\| ^{2}+\eta _{n}^{(3)}\left[ 1+\frac{\varrho ^{2}}{\phi ^{2}} -2\varrho \phi \right] \left\| w_{n}-p_{*}\right\| ^{2} \\= & {} \bigl [1-\eta _{n}^{(3)}(1-\theta ^2)\bigr ] \Vert w_{n}-p_{*}\Vert ^{2}, \end{aligned}$$

which implies

$$\begin{aligned} \left\| z_{n}-p_{*}\right\| ^{2}\le \left\| w_{n}-p_{*}\right\| ^{2}, \end{aligned}$$
(3)

as \([1-\eta _{n}^{(3)}(1-\theta ^2)]\le 1\). Also, we have

$$\begin{aligned} \left\| w_{n+1}-p_{*}\right\| ^{2}\le & {} (1-\eta _{n}^{(2)})\bigl \Vert J_{\varrho }^{\Gamma _1}(I-\varrho \Gamma _2)w_{n}-J_{\varrho }^{\Gamma _1}(I-\varrho \Gamma _2)p_{*}\bigr \Vert ^{2} \\{} & {} \quad \ +\eta _{n}^{(2)}\bigl \Vert J_{\varrho }^{\Gamma _1}(I-\varrho \Gamma _2)z_{n}-J_{\varrho }^{\Gamma _1}(I-\varrho \Gamma _2)p_{*}\bigr \Vert ^{2} \\\le & {} (1-\eta _{n}^{(2)})\left\| (I-\varrho \Gamma _2)w_{n}-(I-\varrho \Gamma _2)p_{*}\right\| ^{2} \\{} & {} \quad \ +\eta _{n}^{(2)} \left\| (I-\varrho \Gamma _2)z_{n}-(I-\varrho \Gamma _2)p_{*}\right\| ^{2}. \end{aligned}$$

By using Proposition 5 and the inequality (3), we attain \(\left\| w_{n+1}-p_{*}\right\| \le \theta \Vert w_{n}-p_{*}\Vert \le \theta ^{n+1}\left\| w_{0}-p_{*}\right\| \), which implies that \(\lim _{n\rightarrow \infty } \Vert w_{n}-p_{*} \Vert =0\) as \(\theta <1\). \(\square \)

Remark 4

By imposing the additional condition \(\sum _{n=0}^{\infty }\eta _{n}^{(1)}=\infty \) on \(\{\eta _{n}^{(1)}\}\), one can obtain the convergence of M-FBSA to \(p_{*}\) similarly to the proof of Theorem 6.

The following theorem is a direct consequence of Theorem 6 by setting \(\eta _{n}^{(2)}=1\) for all \(n\in \mathbb {N}\).

Theorem 7

Let \(\{a_{n}\}_{n=0}^{\infty }\) be a sequence generated by the algorithm NS-FBSA [6] with \(\varrho >0\), such that \(\varrho< \phi \sqrt{2 \varrho \phi }< \sqrt{\varrho ^2+ \phi ^2}\) and \(P_{(\Gamma _1+\Gamma _2)^{-1}(0)}a_{0}=p_{*}\). Then, for any initial point \(\ a_{0}\in C\), \(\{a_{n}\}_{n=0}^{\infty }\) converges strongly to \(p_{*}\), so that

$$\begin{aligned} (\forall n \in \mathbb {N}) \ \ \ \left\| a_{n+1}-p_{*}\right\| \le \theta ^{n+1}\left\| a_{0}-p_{*}\right\| , \end{aligned}$$
(4)

where \(\theta \) is defined in Proposition 5.

The following theorem provides the convergence result for the algorithm PS-FBSA.

Theorem 8

Let \(\{c_{n}\}_{n=0}^{\infty }\) be a sequence generated by the algorithm PS-FBSA, with \(\varrho >0\) such that \(\varrho<\phi \sqrt{2\varrho \phi }<\sqrt{\varrho ^{2}+\phi ^{2}}\) and \(P_{(\Gamma _1+\Gamma _2)^{-1}(0)}c_{0}=p_{*}\). Then, for any initial point \(c_{0}\in C\), \(\{c_{n}\}_{n=0}^{\infty }\) converges strongly to \(p_{*}\), so that

$$\begin{aligned} (\forall n \in \mathbb {N}) \ \ \ \left\| c_{n+1}-p_{*}\right\| \le \theta ^{n+1}\left\| c_{0}-p_{*}\right\| , \end{aligned}$$

where \(\theta \) is defined in Proposition 5.

Proof

From the algorithm PS-FBSA, Lemma 1, and Proposition 4, we have

$$\begin{aligned} \left\| e_{n}-p_{*}\right\| ^{2}\le & {} (1-\eta _{n}^{(6)})\left\| c_{n}-p_{*}\right\| ^{2}+\eta _{n}^{(6)}\bigl \Vert J_{\varrho }^{\Gamma _1}(I-\varrho \Gamma _2)c_{n}-J_{\varrho }^{\Gamma _1}(I-\varrho \Gamma _2)p_{*}\bigr \Vert ^{2} \\\le & {} (1-\eta _{n}^{(6)})\left\| c_{n}-p_{*}\right\| ^{2}+\eta _{n}^{(6)}\bigl \Vert (I-\varrho \Gamma _2)c_{n}-(I-\varrho \Gamma _2)p_{*}\bigr \Vert ^{2}. \end{aligned}$$

By using Proposition 5, we obtain

$$\begin{aligned} \left\| e_{n}-p_{*}\right\| ^{2}\le & {} (1-\eta _{n}^{(6)})\left\| c_{n}-p_{*}\right\| ^{2}+\eta _{n}^{(6)}\left[ 1+\frac{\varrho ^{2}}{\phi ^{2}} -2\varrho \phi \right] \left\| c_{n}-p_{*}\right\| ^{2} \\= & {} [1-\eta _{n}^{(6)}(1-\theta ^2)]\left\| c_{n}-p_{*}\right\| ^{2}, \end{aligned}$$

which yields \(\left\| e_{n}-p_{*}\right\| ^{2}\le \left\| c_{n}-p_{*}\right\| ^{2}\) as \([1-\eta _{n}^{(6)}(1-\theta ^2)]\le 1\). Also,

$$\begin{aligned} \left\| d_{n}-p_{*}\right\| ^{2}\le & {} (1-\eta _{n}^{(5)})\bigl \Vert J_{\varrho }^{\Gamma _1}(I-\varrho \Gamma _2)c_{n}-J_{\varrho }^{\Gamma _1}(I-\varrho \Gamma _2)p_{*}\bigr \Vert ^{2} \\{} & {} \quad +\eta _{n}^{(5)}\bigl \Vert J_{\varrho }^{\Gamma _1}(I-\varrho \Gamma _2)e_{n}-J_{\varrho }^{\Gamma _1}(I-\varrho \Gamma _2)p_{*}\bigr \Vert ^{2} \\\le & {} (1-\eta _{n}^{(5)})\left\| (I-\varrho \Gamma _2)c_{n}-(I-\varrho \Gamma _2)p_{*}\right\| ^{2} \\{} & {} \quad +\eta _{n}^{(5)}\left\| (I-\varrho \Gamma _2)e_{n}-(I-\varrho \Gamma _2)p_{*}\right\| ^{2}. \end{aligned}$$

By using again Proposition 5, we get

$$\begin{aligned} \left\| d_{n}-p_{*}\right\| ^{2}\le & {} (1-\eta _{n}^{(5)})\left[ 1+\frac{ \varrho ^{2}}{\phi ^{2}}-2\varrho \phi \right] \left\| c_{n}-p_{*}\right\| ^{2}+\eta _{n}^{(5)}\left[ 1+\frac{\varrho ^{2}}{\phi ^{2}}-2\varrho \phi \right] \left\| e_{n}-p_{*}\right\| ^{2} \\\le & {} \bigl [1-\eta _{n}^{(5)}(1-\theta ^2)\bigr ]\left\| c_{n}-p_{*}\right\| ^{2}, \end{aligned}$$

which yields \(\left\| d_{n}-p_{*}\right\| ^{2}\le \left\| c_{n}-p_{*}\right\| ^{2}\) as \([1-\eta _{n}^{(5)}(1-\theta ^2)]\le 1\). Finally,

$$\begin{aligned} \left\| c_{n+1}-p_{*}\right\| ^{2}\le & {} \left\| (I-\varrho \Gamma _2)d_{n}-(I-\varrho \Gamma _2)p_{*}\right\| ^{2}. \end{aligned}$$

As in the proof of Theorem 6, we get \(\left\| c_{n+1}-p_{*}\right\| \le \theta ^{n+1}\left\| c_{0}-p_{*}\right\| \), which gives \(\lim _{n\rightarrow \infty }\left\| c_{n}-p_{*}\right\| =0\) as \(\theta <1\). \(\square \)

The convergence result for PMP-FBSA can be derived by applying Theorem 8 with the specific choice of \(\eta _{n}^{(6)}=1\) for all \( n\in \mathbb {N}\).

Theorem 9

Let \(\{c_{n}\}_{n=0}^{\infty }\) be a sequence generated by the algorithm PMP-FBSA, with \(\varrho >0\) such that \(\varrho< \phi \sqrt{2 \varrho \phi }< \sqrt{\varrho ^2+ \phi ^2}\) and \(P_{(\Gamma _1+\Gamma _2)^{-1}(0)}c_{0}=p_{*}\). Then, for any initial point \(\ c_{0}\in C\), the sequence \(\{c_{n}\}_{n=0}^{\infty }\) converges strongly to \(p_{*}\), so that

$$\begin{aligned} (\forall n\in \mathbb {N}) \ \ \ \left\| c_{n+1}-p_{*}\right\| \le \theta ^{2(n+1)}\left\| c_{0}-p_{*}\right\| , \end{aligned}$$
(5)

where \(\theta \) is defined in Proposition 5.

The following theorem asserts the equivalence of the convergences of PS-FBSA and NS-FBSA, meaning that if PS-FBSA is convergent, then NS-FBSA is convergent, and vice versa.

Theorem 10

Let \(\{a_{n}\}_{n=0}^{\infty }\) and \(\{c_{n}\}_{n=0}^{\infty }\) be sequences generated by the algorithms NS-FBSA [6] and PS-FBSA, respectively, in which \(\{\eta _{n}^{(\nu )}\}_{n=0}^{\infty }\subset [0,1]\) for \(\nu =4,5,6\). Let \(\varrho<\phi \sqrt{2\varrho \phi }<\sqrt{\varrho ^{2}+\phi ^{2}}\) for \(\varrho >0\), \(\theta \) be as in Proposition 5, and let

$$\begin{aligned} {\widehat{B}}_n=(2\theta ^2+\theta ) \max _{\nu \in \{4,5,6\}}\bigl \{1-\eta _n^{(\nu )}(1-\theta )\bigr \},\quad g_n=\frac{{\widehat{B}}_n}{\eta _n^{(5)}(1-\theta )}\quad (n\ge 0). \end{aligned}$$
(6)

Then the following claims hold: (i) If \(\sum _{n=0}^{\infty }\eta _n^{(5)}=\infty \) and the sequence \(\{g_n\}_{n=0}^\infty \) is bounded, then \(\lim _{n\rightarrow \infty } (a_{n}-c_{n}) = 0\), with

$$\begin{aligned}\Vert a_{n+1}-c_{n+1}\Vert \le [1-\eta _n^{(5)}(1-\theta )] \Vert a_{n}-c_{n}\Vert +{\widehat{B}}_n\Vert a_{n}-p_{*}\Vert ,\end{aligned}$$

for each \(n\in \mathbb {N}\), and \(\lim _{n\rightarrow \infty } c_{n}=p_{*}= P_{(\Gamma _1+\Gamma _2)^{-1}(0)}c_{0}\).

(ii) If \(\lim _{n\rightarrow \infty } c_{n}=p_{*}= P_{(\Gamma _1+\Gamma _2)^{-1}(0)}c_{0}\), then \(\lim _{n\rightarrow \infty } ( c_{n}-a_{n}) = 0 \) with

$$\begin{aligned}\Vert c_{n+1}-a_{n+1}\Vert \le \theta \Vert c_{n}-a_{n}\Vert +{\widehat{B}}_n\Vert c_{n}-p_{*}\Vert ,\end{aligned}$$

for each \(n\in \mathbb {N}\), and \(\lim _{n\rightarrow \infty } a_{n}=p_{*}= P_{(\Gamma _1+\Gamma _2)^{-1}(0)}a_{0}\).

Proof

(i) It is known from Theorem 7 that \(\lim _{n\rightarrow \infty }\left\| a_{n}-p_{*}\right\| =0\). We prove now that \(\lim _{n\rightarrow \infty }\left\| a_{n}-c_{n}\right\| =0\) and \(\lim _{n\rightarrow \infty }\left\| c_{n}-p_{*}\right\| =0\). By Propositions 4, 5, and the algorithms NS-FBSA [6] and PS-FBSA, we conclude that

$$\begin{aligned} \left\| a_{n+1}-c_{n+1}\right\|\le & {} \left\| (I-\varrho \Gamma _2)b_{n}-(I-\varrho \Gamma _2)p_{*}\right\| +\left\| (I-\varrho \Gamma _2)d_{n}-(I-\varrho \Gamma _2)p_{*}\right\| \nonumber \\\le & {} \theta \left\| b_{n}-p_{*}\right\| +\theta \left\| d_{n}-p_{*}\right\| . \end{aligned}$$
(7)

Similarly, we have \(\left\| b_{n}-p_{*}\right\| \le \bigl [ 1-\eta _{n}^{(4)}\left( 1-\theta \right) \bigr ]\left\| a_{n}-p_{*}\right\| \),

$$\begin{aligned}\left\| d_{n}-p_{*}\right\| \le (1-\eta _{n}^{(5)})\theta \left\| c_{n}-p_{*}\right\| +\eta _{n}^{(5)}\theta \left\| e_{n}-p_{*}\right\| ,\end{aligned}$$

and \(\Vert e_{n}-p_{*}\Vert \le \bigl [1-\eta _{n}^{(6)}\left( 1-\theta \right) \bigr ]\left\| c_{n}-p_{*}\right\| \), so that

$$\begin{aligned}\left\| d_{n}-p_{*}\right\| \le (1-\eta _{n}^{(5)})\theta \left\| c_{n}-p_{*}\right\| +\eta _{n}^{(5)}\theta [ 1-\eta _{n}^{(6)}\left( 1-\theta \right) ]\left\| c_{n}-p_{*}\right\| . \end{aligned}$$

Using these inequalities, we get

$$\begin{aligned} \Vert a_{n+1}-c_{n+1}\Vert\le & {} \theta [ 1-\eta _{n}^{(4)}\left( 1-\theta \right) ]\Vert a_{n}-p_{*}\Vert +\theta ^{2}(1-\eta _{n}^{(5)})\Vert c_{n}-p_{*}\Vert \\{} & {} \quad +\theta ^{2}\eta _{n}^{(5)}[ 1-\eta _{n}^{(6)}\left( 1-\theta \right) ]\Vert c_{n}-p_{*}\Vert , \end{aligned}$$

which, together with \(\Vert c_{n}-p_{*}\Vert \le \Vert a_{n}-c_{n}\Vert +\Vert a_{n}-p_{*}\Vert \), gives

$$\begin{aligned} \Vert a_{n+1}-c_{n+1}\Vert \le A_n \Vert a_{n}-c_{n}\Vert +B_n\Vert a_{n}-p_{*}\Vert , \end{aligned}$$
(8)

where

$$\begin{aligned} A_n&=\theta ^2(1-\eta _n^{(5)})+\theta ^2\eta _n^{(5)}[1-\eta _n^{(6)}(1-\theta )],\\ B_n&=\theta [1-\eta _n^{(4)}(1-\theta )] +\theta ^2(1-\eta _n^{(5)})+\theta ^2\eta _n^{(5)}[1-\eta _n^{(6)}(1-\theta )]. \end{aligned}$$

Since \(\theta <1\) and \(\bigl \{\eta _{n}^{(\nu )}\bigr \} \subset [0,1]\) \((\nu =4,5,6)\), we have \(A_n\le 1-\eta _n^{(5)}(1-\theta )\),

$$\begin{aligned}\theta ^2(1-\eta _n^{(5)})\le \theta ^2[1-\eta _{n}^{(5)}(1-\theta )],\quad \theta ^2\eta _{n}^{(5)}[1-\eta _n^{(6)}(1-\theta )]\le \theta ^2[1-\eta _n^{(6)}(1-\theta )],\end{aligned}$$

as well as

$$\begin{aligned} B_n\le {\widehat{B}}_n=(2\theta ^2+\theta ) \max _{\nu \in \{4,5,6\}}\bigl \{1-\eta _n^{(\nu )}(1-\theta )\bigr \}. \end{aligned}$$

Using these inequalities, (8) becomes

$$\begin{aligned} \Vert a_{n+1}-c_{n+1}\Vert \le [1-\eta _n^{(5)}(1-\theta )]\Vert a_{n}-c_{n}\Vert +{\widehat{B}}_n\Vert a_{n}-p_{*}\Vert . \end{aligned}$$
(9)

For each \(n\in \mathbb {N}\), we set \(\rho _{n}^{(1)}=\Vert a_{n}-c_{n}\Vert \ge 0\), \(\rho _{n}^{(3)}=\eta _{n}^{(5)}(1-\theta ) \in (0,1)\), and \(\rho _n^{(2)}= {\widehat{B}}_n \Vert a_{n}-p_{*}\Vert \).

By assumption, \(\sum _{n=0}^{\infty }\eta _{n}^{(5)}=\infty \). Since the sequence \(\left\{ g_n\right\} _{n=0}^{\infty }\) is bounded, there exists a constant \(M>0\) such that \(\vert g_n\vert <M\) for each \(n\in \mathbb {N}\).

Let \(\varepsilon >0\). Since \(\lim _{n\rightarrow \infty }\xi _{n} =0\) in which \(\xi _{n}=\left\| a_{n}-p_{*}\right\| \) and \({\varepsilon }/{M}>0\), there exists \(n_{0}\in \mathbb {N}\) such that for each \(n\ge n_{0}\), \(\left| \xi _{n}\right| < \varepsilon /M \). Hence for each \(n\ge n_{0}\), \(\left| g_n\xi _n\right| <\varepsilon \) and, therefore, \(\rho _{n}^{(2)}=o( \rho _{n}^{(3)}) \). As (9) satisfies all the conditions of Lemma 2, \(\lim _{n\rightarrow \infty }\left\| a_{n}-c_{n}\right\| =0\). Since \(\lim _{n\rightarrow \infty }\left\| a_{n}-p_{*}\right\| =0\) and \(\Vert c_{n}-p_{*}\Vert \le \left\| a_{n}-c_{n}\right\| +\left\| a_{n}-p_{*}\right\| \), we conclude that \(\lim _{n\rightarrow \infty }\left\| c_{n}-p_{*}\right\| =0\).

(ii) Now, we prove that \(\lim _{n\rightarrow \infty }\left\| c_{n}-a_{n}\right\| = \lim _{n\rightarrow \infty }\left\| a_{n}-p_{*}\right\| =0\). By Propositions 4, 5, NS-FBSA [6] and PS-FBSA, we have

$$\begin{aligned} \left\| c_{n+1}-a_{n+1}\right\|\le & {} \left\| (I-\varrho \Gamma _2)d_{n}-(I-\varrho \Gamma _2)p_{*}\right\| +\left\| (I-\varrho \Gamma _2)b_{n}-(I-\varrho \Gamma _2)p_{*}\right\| \nonumber \\\le & {} \theta \left\| d_{n}-p_{*}\right\| +\theta \left\| b_{n}-p_{*}\right\| . \end{aligned}$$

As in (i), we find

$$\begin{aligned} \Vert c_{n+1}-a_{n+1}\Vert \le \theta \bigl [1-\eta _{n}^{(4)}(1-\theta )\bigr ]\Vert c_{n}-a_{n}\Vert + B_n \Vert c_{n}-p_{*}\Vert , \end{aligned}$$
(10)

where \(B_n\) is defined as in (i). Using inequalities from (i), as well as the inequality \(\theta [1-\eta _{n}^{(4)}(1-\theta )]\le \theta \), we conclude that the inequality (10) reduces to

$$\begin{aligned} \Vert c_{n+1}-a_{n+1} \Vert \le \theta \Vert c_{n}-a_{n}\Vert + {\widehat{B}}_n\Vert c_{n}-p_{*}\Vert , \end{aligned}$$
(11)

where \({\widehat{B}}_n\) is defined in (6).

Similarly as in (i), for each \(n\in \mathbb {N}\), we set \(\rho _{n}^{(1)}= \Vert c_{n}-a_{n}\Vert \ge 0\), \(\mu =\theta \in (0,1)\), and \(\rho _n^{(2)}={\widehat{B}}_n\Vert c_{n}-p_{*}\Vert \). Since \(\bigl \{{\widehat{B}}_n\bigr \} _{n=0}^{\infty }\) is a bounded sequence, there exists \(R>0\) such that \(\vert {\widehat{B}}_n\vert <R\) for each \(n\in \mathbb {N}\). Let \(\varepsilon >0\). Since \(\lim _{n\rightarrow \infty }\xi _{n} =0\), in which \(\xi _{n}=\Vert c_{n}-p_{*}\Vert \), and \(\varepsilon /R>0\), there exists \(n_{0}\in \mathbb {N}\) such that \(\vert \xi _{n}\vert <\varepsilon /R\) for all \(n\ge n_0\). Therefore, for each \(n\ge n_{0}\), we have \(\rho _{n}^{(2)}={\widehat{B}}_n\vert \xi _{n}\vert <\varepsilon \). Thus, \(\lim _{n\rightarrow \infty } \rho _{n}^{(2)}=0\).

As (11) satisfies all the conditions of Lemma 3, we have \(\lim _{n\rightarrow \infty } \Vert c_{n}-a_{n}\Vert =0\). According to \(\lim _{n\rightarrow \infty }\Vert c_{n}-p_{*}\Vert =0\) and \(\Vert a_{n}-p_{*}\Vert \le \Vert c_{n}-a_{n}\Vert +\Vert c_{n}-p_{*}\Vert \), we conclude that \(\lim _{n\rightarrow \infty }\Vert a_{n}-p_{*}\Vert =0\). \(\square \)

In the next result, we compare the convergence rates of PS-FBSA and NS-FBSA as they approach the solution of problem (1). More precisely, we demonstrate that PS-FBSA converges faster to the solution of this problem compared to NS-FBSA.

Theorem 11

Let \(\{a_n\}_{n=0}^{\infty }\) and \(\{c_n\}_{n=0}^{\infty }\) be sequences generated by the algorithms NS-FBSA [6] and PS-FBSA, respectively, with \(\{\eta _n^{(\nu )}\}_{n=0}^{\infty }\subset [0,1]\) for \(\nu =4,5,6\), in which \(\lim _{n\rightarrow \infty }\eta _n^{(5)}\eta _n^{(6)}=1\) and \(\lim _{n\rightarrow \infty }\eta _n^{(4)}=1\). Let \(\varrho<\phi \sqrt{2\varrho \phi }<\sqrt{\varrho ^{2}+\phi ^{2}}\) for \(\varrho >0\) and \(\theta \) be as in Proposition 5. Then, \(\{c_n\}_{n=0}^{\infty }\) converges faster than \(\{a_n\}_{n=0}^{\infty }\) to the solution \(p_{*}= P_{(\Gamma _1+\Gamma _2)^{-1}(0)}c_0\) provided that \(a_0=c_0 \ne p_{*}\).

Proof

According to the proofs of Theorems 7 and 8, we have the following estimates (which are the best possible) for each \(n\in \mathbb {N}\):

$$\begin{aligned} \left\| a_{n+1}-p_{*}\right\| \le \theta ^{n+1}\left\| a_{0}-p_{*}\right\| \prod \limits _{i=0}^{n}\sqrt{[ 1-\eta _{i}^{(4)}\left( 1-\theta ^2 \right) ]} = \vartheta _{n}^{\left( 2\right) } \end{aligned}$$

and

$$\begin{aligned} \left\| c_{n+1}-p_{*}\right\| \le \theta ^{2(n+1)}\left\| c_{0}-p_{*}\right\| \prod \limits _{i=0}^{n}\sqrt{[ 1-\eta _{i}^{(5)}\eta _{i}^{(6)}\left( 1-\theta ^2 \right) ]}=\vartheta _{n}^{\left( 1\right) }. \end{aligned}$$

Observe that \(\lim _{n\rightarrow \infty }\vartheta _{n}^{\left( 1\right) }=\lim _{n\rightarrow \infty }\vartheta _{n}^{\left( 2\right) }=0\). Thus, all the conditions of Definition 6 are met. For each \(n\in \mathbb {N}\) we define

$$\begin{aligned}\Phi _{n}=\frac{\Vert \vartheta _{n}^{(1)}-0\Vert }{ \Vert \vartheta _{n}^{(2)}-0\Vert }=\theta ^{n+1}\Delta _{n}, \end{aligned}$$

in which

$$\begin{aligned}\Delta _{n}=\prod _{k=0}^{n}\sqrt{{\frac{1-\eta _{k}^{(5)}\eta _{k}^{(6)} ( 1-\theta ^2)}{1-\eta _{k}^{(4)} ( 1-\theta ^2 )}}}\cdot \frac{\Vert c_{0}-p_{*} \Vert }{\Vert a _{0}-p_{*} \Vert }. \end{aligned}$$

Then

$$\begin{aligned} \frac{\Phi _{n+1}}{\Phi _{n}}=\theta \frac{\Delta _{n+1}}{\Delta _{n}}=\theta \sqrt{{\frac{1-\eta _{n+1}^{(5)}\eta _{n+1}^{(6)}\left( 1-\theta ^2 \right) }{1-\eta _{n+1}^{(4)}\left( 1-\theta ^2 \right) }}}. \end{aligned}$$

By applying the assumptions \(\lim _{n\rightarrow \infty }\eta _{n}^{(5)}\eta _{n}^{(6)}=1\) and \(\lim _{n\rightarrow \infty }\eta _{n}^{(4)}=1\), we get \(\lim _{n\rightarrow \infty }{\Phi _{n+1}}/{\Phi _{n}}=\theta <1\), so by the ratio test \(\sum _{n=0}^{\infty }\Phi _{n}\) converges. Hence,

$$\begin{aligned}\lim _{n\rightarrow \infty }\Phi _n=\lim _{n\rightarrow \infty }\frac{\Vert \vartheta _n^{(1)}-0\Vert }{\Vert \vartheta _n^{(2)}-0\Vert }=0. \end{aligned}$$

By Definition 5, \(\{\vartheta _n^{(1)}\} _{n=0}^{\infty }\) converges faster than \(\{\vartheta _n^{(2)}\} _{n=0}^{\infty }\) and hence \(\{c_n\}_{n=0}^{\infty }\) converges faster than \(\{a_n\}_{n=0}^{\infty }\) to \(p_{*}= P_{(\Gamma _1+\Gamma _2)^{-1}(0)}c_0\). \(\square \)

3.1 An application to convex minimization problem

In this section, the following convex minimization problem will be tackled:

$$\begin{aligned} \min \{f(u):u \in C\}, \end{aligned}$$
(12)

in which \(f:C\rightarrow \mathbb {R}\) is a Fréchet differentiable convex function. It is well known that \(p_{*}\) is a solution of (12) if and only if \(p_{*}=P_{C}(p_{*}-\varrho \nabla f\left( p_{*}\right) )\), in which \(\nabla f:C\rightarrow \mathcal {H}\) is the gradient of f and \(\varrho >0\).

Let us consider \(J_{\varrho }^{\Gamma _1,\Gamma _2}=J_{\varrho }^{\Gamma _1}(I-\varrho \Gamma _2)\) again, and let \(\Gamma _1=\varrho ^{-1}(P_{C}^{-1}-I)\) for any \(\varrho >0\). It follows from Remarks 1 and 3 that \(\Gamma _1\) is a monotone operator. Also, \((I+\varrho \Gamma _1)C=P_{C}^{-1}C=\mathcal {H}\) holds for any \(\varrho >0\). That is, \(\Gamma _1\) is maximal monotone and its resolvent is \(J_{\varrho }^{\Gamma _1}=(I+\varrho \Gamma _1)^{-1}=P_{C}\). Moreover, taking \(\Gamma _2=\nabla f\) in \(J_{\varrho }^{\Gamma _1,\Gamma _2}\), we get \(J_{\varrho }^{\Gamma _1}(I-\varrho \Gamma _2)=P_{C}(I-\varrho \nabla f)\) for any \(\varrho >0\). As a result, if p satisfies (2) with \(\Gamma _1=\varrho ^{-1}(P_{C}^{-1}-I)\) and \( \Gamma _2=\nabla f\), then it is also a solution of the minimization problem (12). The next result is a direct consequence of Theorems 6 and 8 for these choices of \(\Gamma _1\) and \(\Gamma _2\).

Corollary 12

Assume that \(p_{*}\) is a solution of (12). Let \(f:C\rightarrow \mathbb {R}\) be a Fréchet differentiable convex function whose gradient \(\nabla f\) is a \(\phi \)-ism and expanding operator, and let \(\varrho > 0\) satisfy \(\varrho<\phi \sqrt{2\varrho \phi }<\sqrt{\varrho ^{2}+\phi ^{2}}\). Let \(\Gamma _1=\varrho ^{-1}(P_{C}^{-1}-I)\) and \(\Gamma _2=\nabla f\) in all the algorithms M-FBSA, S-FBSA, PS-FBSA, PMP-FBSA. Then, the algorithm M-FBSA, with \(\sum _{n=0}^{\infty }\eta _{n}^{(1)}=\infty \), and the algorithms S-FBSA, PS-FBSA, PMP-FBSA converge strongly to \(p_{*}\).

Example 1

Let \(\mathcal {H}=\ell _2\) be the standard Hilbert space of real sequences with the inner product \(\langle \textbf{u},\textbf{v} \rangle =\sum _{k=0}^{\infty }u_{k}v_{k}\) and the norm \(\Vert \textbf{u} \Vert =\sqrt{\sum _{k=0}^{\infty }\vert u_{k}\vert ^{2}}\). Then, the set \(C=\left\{ \textbf{u}=\left\{ u_{k}\right\} _{k=0}^{\infty }:\left\| \textbf{u}\right\| \le 1\right\} \subset \mathcal {H}\) is convex and closed. Let \(f:C\rightarrow \mathbb {R}\) be defined by \(f( \textbf{u}) =\Vert \textbf{u}\Vert ^{2}\). For all \(\textbf{u}\), \(\textbf{v}\in C\) and \(\eta \in (0,1)\), we have

$$\begin{aligned} f\left( \eta \textbf{u}+\left( 1-\eta \right) \textbf{v}\right)= & {} \left\| \eta \textbf{u}+( 1-\eta ) \textbf{v}\right\| ^{2}=\sum _{k=0}^{\infty }\left| \eta u_k+\left( 1-\eta \right) v_k\right| ^{2} \\\le & {} \eta \sum _{k=0}^{\infty }\left| u_{k}\right| ^{2}+\left( 1-\eta \right) \sum _{k=0}^{\infty }\left| v_{k}\right| ^{2} =\eta f(\textbf{u}) +(1-\eta ) f(\textbf{v}), \end{aligned}$$

which implies that f is a convex function. Observe that the solution set of (12) for f is \(S=\{\textbf{0}\}=\{(0,0,\ldots )\}\). By definition (cf. [36, p. 169]), we conclude that f is Fréchet differentiable at \(\textbf{u}\), with the Fréchet derivative \(\nabla f\textbf{u}=\bigl (2u_0,2u_1,\ldots \bigr )=2\textbf{u}\). Since for each \(\textbf{u},\textbf{v}\in \mathcal {H}\),

$$\begin{aligned} \left\langle \nabla f\textbf{u}-\nabla f\textbf{v},\textbf{u}-\textbf{v}\right\rangle =\left\langle 2\textbf{u}-2\textbf{v},\textbf{u}-\textbf{v}\right\rangle =2\sum \limits _{k=0}^{\infty }\left| u_{k}-v_{k}\right| ^{2}=\frac{1}{2}\left\| \nabla f\textbf{u}-\nabla f\textbf{v}\right\| ^{2},\end{aligned}$$

we conclude that \(\nabla f\) is \(\frac{1}{2}\)-ism. Also, for each \(\textbf{u},\textbf{v}\in \mathcal {H}\) we have

$$\begin{aligned} \left\| \nabla f\textbf{u}-\nabla f\textbf{v}\right\| ^{2}=\left\| 2\textbf{u}-2\textbf{v}\right\| ^{2}=4\sum \limits _{k=0}^{\infty }\left| u_{k}-v_{k}\right| ^{2}=4\left\| \textbf{u}-\textbf{v}\right\| ^{2}, \end{aligned}$$

which implies that \(\nabla f\) is a 2-expanding mapping. Moreover, \(P_{C}: \mathcal {H}\rightarrow C\) is given by

$$\begin{aligned}P_{C}(\textbf{u}) =\left\{ \begin{array}{ll} \textbf{u}, &{} \textbf{u}\in C, \\ \textbf{u}/\Vert \textbf{u}\Vert ,\quad &{} \textbf{u}\notin C. \end{array} \right. \end{aligned}$$

Also, it is easy to see that \(\varrho<\phi \sqrt{2\varrho \phi }<\sqrt{\varrho ^2+\phi ^2}\) holds for \(\varrho =1/9\) and \(\phi =1/2\).

By \(\textbf{u}_n=\bigl \{u_k^{(n)}\bigr \}_{k=0}^{\infty }=\bigl (u_0^{(n)},u_1^{(n)},\ldots \bigr )\) we denote the \(n^{\textrm{th}}\) iteration obtained by any of the previous algorithms. We take \(\eta _n^{(\nu )}=1/(n+2)\) for \(\nu =1,3,6,7\) and \(\eta _n^{(\nu )}=1/(n+2)^2\) for \(\nu =2,5\).
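The following is a compact simulation of this example for PS-FBSA, with the sequences truncated to finitely many coordinates (a purely computational convenience, not part of the example itself); it assumes \(T=P_{C}(I-\varrho \nabla f)\) as derived above:

```python
import numpy as np

def project_ball(u):
    # Metric projection P_C onto the closed unit ball C.
    norm = np.linalg.norm(u)
    return u if norm <= 1 else u / norm

rho = 1/9
def T(u):
    # T = P_C (I - rho * grad f), with grad f(u) = 2u.
    return project_ball(u - 2 * rho * u)

u = np.array([1 / 2**(k + 1) for k in range(50)])  # u_0 truncated to 50 terms
for n in range(1000):
    eta6, eta5 = 1 / (n + 2), 1 / (n + 2)**2
    e = (1 - eta6) * u + eta6 * T(u)
    d = (1 - eta5) * T(u) + eta5 * T(e)
    u = T(d)  # one PS-FBSA step
print(np.linalg.norm(u))  # ~0, i.e., the iterates approach the solution 0
```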

The convergence behaviors of the algorithms P-FBSA, M-FBSA, S-FBSA, NS-FBSA, PS-FBSA, PMP-FBSA are listed in Tables 1 and 2 and illustrated in Fig. 1. In these tables we give the obtained values after the first iteration \((n=1)\), as well as ones after \(n=250,500,750\) and 1000 iterations. Numbers in parentheses indicate decimal exponents, e.g., \(1.2937(-28)\) means \(1.2937\times 10^{-28}\).

Table 1 Convergence behavior of the algorithms P-FBSA, M-FBSA and S-FBSA
Table 2 Convergence behavior of the algorithms NS-FBSA, PS-FBSA and PMP-FBSA
Fig. 1 Convergence behaviour of \(\Vert \textbf{u}_n\Vert \)

Tables 1 and 2 show that the sequences \(\{\textbf{u}_n\}_{n=0}^\infty \), generated by the algorithms P-FBSA, M-FBSA, S-FBSA, NS-FBSA, PS-FBSA, and PMP-FBSA, with the initial guess \(\textbf{u}_0=\bigl \{1/2^{k+1}\bigr \}_{k=0}^\infty \), converge to \(\textbf{0}=\{0\}_{k=0}^{\infty }\). Figure 1 shows that the sequence \(\Vert \textbf{u}_n-\textbf{0}\Vert =\Vert \textbf{u}_n\Vert \) converges to 0.

4 Applications

In this section, we investigate the application of the newly defined FBSAs to convex minimization. We define new iterative shrinkage algorithms corresponding to PS-FBSA and PMP-FBSA and apply them to supervised learning and to the image deblurring problem.

Let \(g:\mathcal {H}\rightarrow (-\infty ,\infty ]\) and \(f:\mathcal {H}\rightarrow \mathbb {R}\) be proper lower semi-continuous convex functions. Assume that g is a non-smooth function and f is differentiable on \(\mathcal {H}\) and has an L-Lipschitz continuous gradient for some \(L > 0\). Here we consider the following problem:

$$\begin{aligned} \min \left\{ F(x)=f(x)+g(x): x \in \mathcal {H}\right\} , \end{aligned}$$

for which \(x^{*}\) is a solution if and only if \( 0\in \nabla f(x^{*})+\partial g(x^{*})\). The set of all solutions of this problem is denoted by \(X^{*}\).

4.1 Application to supervised learning

In this section, the least absolute shrinkage and selection operator (LASSO) problem is taken as the basis for this process, along with the algorithms mentioned in this paper. We apply these adapted algorithms to four real data sets and present a detailed comparative analysis. The outcome of this experiment shows that the algorithms PS-FBSA and PMP-FBSA generally have better computation time, lower cost function values, and higher estimation accuracy than the algorithms M-FBSA and S-FBSA.

Let us consider a dataset \(X\in \mathbb {R}^{m\times d}\) in which every row is a sample and every column is an attribute of the samples. \(Y \in \mathbb {R}^m\) denotes the set of outcomes, that is, the labels of the samples. Now, we can employ the following minimization problem:

$$\begin{aligned} \min \Bigl \{F\left( w\right) = \frac{1}{2}\left\| Xw-Y\right\| _{2}^{2}+\delta \left\| w\right\| _{1}: w\in \mathbb {R}^d \Bigr \}. \end{aligned}$$
(13)

Let \(\Gamma _1=\partial \left\| \cdot \right\| _{1}\), whose resolvent with parameter \(\delta \kappa _{n}\) is the componentwise soft-thresholding operator \(\text {prox}_{\delta \kappa _{n}\left\| \cdot \right\| _{1}}(w)=\left( \left| {w}^{i}\right| -\delta \kappa _{n}\right) _{+} \textrm{sgn}\left( {w}^{i}\right) \), in which \(\textrm{sgn}\) is the signum function, and let \(\Gamma _2(w)=X^{t}\left( Xw-Y\right) \), so that the forward step is \(w-\delta \kappa _{n}X^{t}\left( Xw-Y\right) \). Then, we have

$$\begin{aligned} J_{\delta \kappa _{n} }^{\Gamma _1,\Gamma _2}(w)=\text {prox}_{\delta \kappa _{n}\left\| \cdot \right\| _{1}}( w-\delta \kappa _{n} X^{t}( Xw-Y)). \end{aligned}$$
(14)

The P-FBSA algorithm associated with \(T:=J_{\delta \kappa _{n} }^{\Gamma _1,\Gamma _2}\) in (14) is

$$\begin{aligned}{{\textbf {Algorithm ISTA}}:}\quad w_{n+1}=T(w_n),\ \ n\ge 0,\quad w_{0}\in \mathbb {R}^d\ \ (\kappa _{0}, \delta > 0),\ \ \end{aligned}$$

and it can be used to approximate the solution of (13).
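For concreteness, here is a minimal sketch of Algorithm ISTA for (13) with a constant step \(\kappa \) (the data X, Y and the parameter values are placeholders; in our experiments the \(\kappa _{n}\) are chosen by backtracking, as described below):

```python
import numpy as np

def prox_l1(w, t):
    # prox_{t||.||_1}(w): componentwise soft-thresholding.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def ista(X, Y, delta, kappa, n_iter=500):
    # w_{n+1} = prox_{delta*kappa*||.||_1}(w_n - delta*kappa*X^t(X w_n - Y)),
    # cf. (14), with a fixed step kappa.
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = prox_l1(w - delta * kappa * X.T @ (X @ w - Y), delta * kappa)
    return w
```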

In a similar manner, M-ISTA [6], S-ISTA [6], NS-ISTA [6], PS-ISTA, and PMP-ISTA can be defined by utilizing M-FBSA, S-FBSA, NS-FBSA, PS-FBSA, and PMP-FBSA associated with \(J_{\delta \kappa _{n} }^{\Gamma _1,\Gamma _2}\) in (14), respectively (Table 3).

Table 3 Comparing the effectiveness of the algorithms ISTA–PMP-ISTA

The datasets employed in numerical experiments are listed as follows:

\(+\) ARCENE: This dataset is derived from features showing the abundance of proteins of a given mass value in human serum, and it includes a set of distracting features called 'probes' which are not predictive. A total of 900 samples with 10000 attributes are contained in this dataset.

\(+\) Heart Disease (Cleveland): With this dataset, researchers aim to distinguish individuals who have heart disease from healthy individuals. A total of 303 samples with 14 attributes are contained in this dataset.

\(+\) CNAE: This dataset consists of 9 categories of free-text job description documents of Brazilian companies, cataloged in a table called the National Classification of Economic Activities (CNAE). A total of 1080 samples with 586 attributes are contained in this dataset.

\(+\) ISOLET: In this dataset, the aim is to recognize letters from speech; 150 speakers pronounce each letter of the alphabet twice. A total of 1559 samples with 617 attributes are contained in this dataset.

The preprocessing of the datasets, the application of the algorithms, and the analysis of the results have been carried out in Matlab\(^{\circledR }\). The optimal values of \(\kappa =(\kappa _{n})\) have been found by using a backtracking algorithm.

Fig. 2 Comparing the effectiveness of the algorithms ISTA–PMP-ISTA based on the reduction in function values \(F(w_{n})\) in each step

Fig. 3 Comparing the effectiveness of the algorithms ISTA–PMP-ISTA based on \(\Vert F(w_{n})-F(w^{*})\Vert \) in each step

Fig. 4 Comparing the effectiveness of the algorithms ISTA–PMP-ISTA based on rMSE in each step

As a preprocessing step, all datasets were split into \(60\%\) training and \(40\%\) test data, with a bias term added. The stopping criterion for every algorithm is either \(|F\left( w_{n}\right) -F\left( w_{n-1}\right) |< 10^{-5} \) or reaching the maximum number of iteration steps \(n= 10^{5}\).

We compare the performance of the algorithms on each dataset in terms of the calculation times, \(F\left( w_{n}\right) \) (Fig. 2) and \(\left\| F\left( w_{n}\right) -F\left( w^{*}\right) \right\| \) (Fig. 3), and the accuracy of prediction (rMSE) for training samples (Fig. 4), as well as the accuracy of prediction (rMSE) for testing samples. The graphs presented in Figs. 2-4 are plotted on the log scale.

4.2 Application to image deblurring problem

The wavelet-based deblurring problem is defined by

$$\begin{aligned} \min _{x}F_{\textrm{deb}}={\Vert Ax-b\Vert }_{2}^{2}+\lambda {\Vert x\Vert }_{1}, \end{aligned}$$
(15)

in which \(A:\mathbb {R}^{n\times d}\rightarrow \mathbb {R}^{n\times d}\) is the Haar wavelet transform of a blurring matrix, \(x\in \mathbb {R}^{n\times d}\) is the original image, \(b\in \mathbb {R}^{n\times d}\) is the blurred image, and \(\lambda >0\) is a control parameter. Problem (15) has been discussed in detail in [37], and the interested reader can also consult [38,39,40]. Gradient projection algorithms are widely used numerical tools to solve this problem.

Let \(\Gamma _1=\partial \left\| \cdot \right\| _{1}\), whose resolvent with parameter \(\lambda t_n\) is the componentwise soft-thresholding operator \(\textrm{prox}_{\lambda t_n\left\| \cdot \right\| _{1}}(u)=\bigl ( \bigl \vert {u}^{i}\bigr \vert -\lambda t_n\bigr )_{+} \textrm{sgn}\bigl ( {u}^{i}\bigr )\), in which \(t_{n}\) is an appropriate stepsize, and let \(\Gamma _2(u)=2A^{t}\left( Au-b\right) \), so that the forward step is \(u-2 t_n A^{t}\left( Au-b\right) \). Then

$$\begin{aligned} T(u):=J_{\lambda t_n }^{\Gamma _1,\Gamma _2}(u)=\textrm{prox}_{\lambda t_n\left\| \cdot \right\| _{1}}\bigl ( u-2 t_nA^{t}\left( Au-b\right) \bigr ). \end{aligned}$$

In this case, the iterative shrinkage-thresholding algorithm (ISTA) is defined by \(u_{n+1}=T(u_{n})\). The M-ISTA, S-ISTA, NS-ISTA, PS-ISTA and PMP-ISTA algorithms can be defined in a similar manner as above. Taking advantage of the supporting Matlab\(^{\circledR }\) library files provided by Beck and Teboulle, we use Matlab\(^{\circledR }\) functions to produce blurred images from the classical test images Cameraman, Lena, Peppers, and Goldhill. Original and blurred images are shown in Fig. 5.
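Schematically, the operator T above can be sketched as follows, with the blurring-plus-wavelet operator represented as an explicit matrix A (a simplification; in the experiments A acts through wavelet and blurring routines rather than as a stored matrix):

```python
import numpy as np

def T(u, A, b, lam, t):
    # T(u) = prox_{lam*t*||.||_1}(u - 2*t*A^t(A u - b)): a gradient step on
    # ||A u - b||_2^2 followed by soft-thresholding at level lam*t.
    g = u - 2 * t * A.T @ (A @ u - b)
    return np.sign(g) * np.maximum(np.abs(g) - lam * t, 0.0)
```

PS-ISTA and PMP-ISTA are then obtained by plugging this T into the PS-FBSA and PMP-FBSA templates of Sect. 2.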

Fig. 5 (Top) Original images; (bottom) blurred images

We present the comparison results regarding the performances of PS-ISTA, PMP-ISTA, ISTA, M-ISTA, S-ISTA, and NS-ISTA in image deblurring tasks. The deblurred Cameraman, Lena, Peppers, and Goldhill images are shown in Fig. 6, together with PSNR values.

Fig. 6 Comparison results for deblurred Cameraman, Lena, Peppers, Goldhill images (from top to bottom) by ISTA, M-ISTA, S-ISTA, NS-ISTA, PS-ISTA, PMP-ISTA

We present the deblurring function value at the \(n^{\textrm{th}}\) iteration with \(n=100\) and \(n=20\) for Cameraman, Lena, Peppers, and Goldhill in Figs. 7, 8, 9, 10, respectively. As these figures show, the function values for PS-ISTA and PMP-ISTA decrease rapidly during the first 20 steps and are lower than those of the other algorithms even at \(n=100\).

We also present the Frobenius norm of the difference of two successive iterations, that is, the value \(\Vert u_{n}-u_{n-1}\Vert _{\textrm{fro}}\) at the \(n^{\textrm{th}}\) iteration, with \(n=100\) and \(n=20\), for Cameraman, Lena, Peppers, and Goldhill in Figs. 11, 12, 13, 14, respectively.

Fig. 7 Function value \(F^{n}\) in each step for Cameraman: (a) \(n=100\) and (b) \(n=20\)

Fig. 8 Function value \(F^{n}\) in each step for Lena: (a) \(n=100\) and (b) \(n=20\)

Fig. 9 Function value \(F^{n}\) in each step for Peppers: (a) \(n=100\) and (b) \(n=20\)

Fig. 10 Function value \(F^{n}\) in each step for Goldhill: (a) \(n=100\) and (b) \(n=20\)

Fig. 11 \(\Vert u_{n}-u_{n-1}\Vert _{\textrm{fro}}\) for Cameraman: (a) \(n=100\) (b) \(n=20\)

Fig. 12 \(\Vert u_{n}-u_{n-1}\Vert _{\textrm{fro}}\) for Lena: (a) \(n=100\) (b) \(n=20\)

Fig. 13 \(\Vert u_{n}-u_{n-1}\Vert _{\textrm{fro}}\) for Peppers: (a) \(n=100\) (b) \(n=20\)

Fig. 14 \(\Vert u_{n}-u_{n-1}\Vert _{\textrm{fro}}\) for Goldhill: (a) \(n=100\) (b) \(n=20\)

5 Conclusion

In this study, we proved some strong convergence theorems for the Picard-S and PMP forward-backward algorithms, which originate from the Picard-S [28] and PMP [29] fixed point algorithms. In addition to showing that there is an equivalence between the convergence of NS-FBSA [6] and PS-FBSA, we compared the rates of convergence of these algorithms. We modified all the algorithms handled in this paper and applied them to the convex minimization problem. We furnished an academic example in support of Corollary 12 and to illustrate the convergence behaviors of the algorithms M-FBSA, S-FBSA, PS-FBSA, PMP-FBSA. We applied these algorithms to the image deblurring problem and to machine learning (classification/regression) with datasets derived from real-world problems. The numerical experiments presented in Sect. 4 reveal that the newly defined algorithms PS-FBSA and PMP-FBSA outperform the algorithms M-FBSA and S-FBSA in solving the problems tackled herein. Our results refine and correct the corresponding results in [41, 42].