Abstract
A convex non-convex variational model is proposed for multiphase image segmentation. We consider a specially designed non-convex regularization term which adapts spatially to the image structures, allowing better control of the segmentation boundary and easier handling of intensity inhomogeneities. The nonlinear optimization problem is efficiently solved by an alternating direction method of multipliers (ADMM) procedure. We provide a convergence analysis and perform numerical experiments on several images, showing the effectiveness of this procedure.
Notes
A convex function is proper if it nowhere takes the value \(-\infty \) and is not identically equal to \(+\infty \).
References
Bioucas-Dias, J., Figueiredo, M.: Fast image recovery using variable splitting and constrained optimization. IEEE Trans. Image Process. 19(9), 2345–2356 (2010)
Blake, A., Zisserman, A.: Visual Reconstruction. MIT Press, Cambridge (1987)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Brown, E.S., Chan, T.F., Bresson, X.: Completely convex formulation of Chan–Vese image segmentation model. Int. J. Comput. Vis. 98(1), 103–121 (2012)
Cai, X.H., Chan, R.H., Zeng, T.Y.: A two-stage image segmentation method using a convex variant of the Mumford–Shah model and thresholding. SIAM J. Imaging Sci. 6(1), 368–390 (2013)
Chan, T., Esedoglu, S., Nikolova, M.: Algorithms for finding global minimizers of image segmentation and denoising models. SIAM J. Appl. Math. 66(5), 1632–1648 (2006)
Chan, T., Vese, L.A.: Active contours without edges. IEEE Trans. Image Process. 10, 266–277 (2001)
Chan, T., Vese, L.A.: Active contours without edges for vector-valued images. J. Vis. Commun. Image Represent. 11, 130–141 (2000)
Chen, P.Y., Selesnick, I.W.: Group-sparse signal denoising: non-convex regularization, convex optimization. IEEE Trans. Signal Proc. 62, 3464–3478 (2014)
Christiansen, M., Hanke, M.: Deblurring methods using antireflective boundary conditions. SIAM J. Sci. Comput. 30, 855–872 (2008)
Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley, New York (1983)
Donatelli, M., Reichel, L.: Square smoothing regularization matrices with accurate boundary conditions. J. Comput. Appl. Math. 272, 334–349 (2014)
Dong, B., Chien, A., Shen, Z.: Frame based segmentation for medical images. Commun. Math. Sci. 32, 1724–1739 (2010)
Ekeland, I., Temam, R.: Convex Analysis and Variational Problems (Classics in Applied Mathematics). SIAM, Philadelphia (1999)
Esedoglu, S., Tsai, Y.: Threshold dynamics for the piecewise constant Mumford–Shah functional. J. Comput. Phys. 211, 367–384 (2006)
Huang, G., Lanza, A., Morigi, S., Reichel, L., Sgallari, F.: Majorization–minimization generalized Krylov subspace methods for \(\ell _p - \ell _q\) optimization applied to image restoration. BIT Numer. Math. 57(2), 351–378 (2017). doi:10.1007/s10543-016-0643-8
Lanza, A., Morigi, S., Sgallari, F.: Convex image denoising via non-convex regularization. In: Aujol, JF., Nikolova, M., Papadakis, N. (eds.) Scale Space and Variational Methods in Computer Vision. SSVM 2015. Lecture Notes in Computer Science, vol. 9087, pp. 666–677. Springer, Cham (2015)
Lanza, A., Morigi, S., Reichel, L., Sgallari, F.: A generalized Krylov subspace method for lp–lq minimization. SIAM J. Sci. Comput. 37(5), S30–S50 (2015)
Lanza, A., Morigi, S., Sgallari, F.: Constrained TVp-l2 model for image restoration. J. Sci. Comput. 68(1), 64–91 (2016)
Lanza, A., Morigi, S., Sgallari, F.: Convex image denoising via non-convex regularization with parameter selection. J. Math. Imaging Vis. 56(2), 195–220 (2016)
Lanza, A., Morigi, S., Selesnick, I., Sgallari, F.: Nonconvex nonsmooth optimization via convex–nonconvex majorization–minimization. Numer. Math. 136(2), 343–381 (2017)
Li, F., Ng, M., Zeng, T.Y., Shen, C.: A multiphase image segmentation method based on fuzzy region competition. SIAM J. Imaging Sci. 3, 277–299 (2010)
Li, F., Shen, C., Li, C.: Multiphase soft segmentation with total variation and H1 regularization. J. Math. Imaging Vis. 37, 98–111 (2010)
Lie, J., Lysaker, M., Tai, X.: A binary level set model and some applications to Mumford–Shah image segmentation. IEEE Trans. Image Process. 15, 1171–1181 (2006)
Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated variational problems. Commun. Pure Appl. Math. 42(5), 577–685 (1989)
Ng, M.K., Chan, R.H., Tang, W.C.: A fast algorithm for deblurring models with Neumann boundary conditions. SIAM J. Sci. Comput. 21, 851–866 (1999)
Nikolova, M.: Estimation of binary images by minimizing convex criteria. Proc. IEEE Int. Conf. Image Process. 2, 108–112 (1998)
Parekh, A., Selesnick, I.W.: Convex denoising using non-convex tight frame regularization. arXiv:1504.00976 (2015)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, vol. 317 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer, Berlin (1998)
Sandberg, B., Kang, S., Chan, T.: Unsupervised multiphase segmentation: a phase balancing model. IEEE Trans. Image Process. 19, 119–130 (2010)
Selesnick, I.W., Bayram, I.: Sparse signal estimation by maximally sparse convex optimization. IEEE Trans. Signal Process. 62(5), 1078–1092 (2014)
Selesnick, I.W., Parekh, A., Bayram, I.: Convex 1-D total variation denoising with non-convex regularization. IEEE Signal Process. Lett. 22(2), 141–144 (2015)
Strong, D.M., Chan, T.F.: Edge-preserving and scale-dependent properties of total variation regularization. Inverse Probl. 19(6), 165–187 (2003)
Wu, C., Tai, X.C.: Augmented Lagrangian method, dual methods, and split Bregman iteration for ROF, vectorial TV, and high order models. SIAM J. Imaging Sci. 3(3), 300–339 (2010)
Wu, C., Zhang, J., Tai, X.C.: Augmented Lagrangian method for total variation restoration with non-quadratic fidelity. Inverse Probl. Imaging 5(1), 237–261 (2011)
Yuan, J., Bae, E., Tai, X., Boykov, Y.: A study on continuous max-flow and min-cut approaches. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2217–2224 (2010)
Yuan, J., Bae, E., Tai, X., Boykov, Y.: A continuous max-flow approach to Potts model. In: ECCV 2010: Proceedings of the 11th European Conference on Computer Vision, Springer, Berlin, pp. 332–345 (2010)
Varga, R.S.: Matrix Iterative Analysis, Springer Series in Computational Mathematics. Springer, Berlin, Heidelberg (2000). doi:10.1007/978-3-642-05156-2
Acknowledgements
We would like to thank the referees for comments that led to improvements of the presentation. This work is partially supported by HKRGC GRF Grant No. CUHK300614, CUHK14306316, CRF Grant No. CUHK2/CRF/11G, AoE Grant AoE/M-05/12, CUHK DAG No. 4053007, and FIS Grant No. 1907303. Research by SM, AL and FS was supported by the “National Group for Scientific Computation (GNCS-INDAM)” and by the ex60% project “Funds for selected research topics” of the University of Bologna.
Appendix
Proof of Lemma 3.2
Let \(x :{=} (x_1,x_2,x_3)^T \in {\mathbb R}^3\). Then, the function \(f(\,\cdot \,;\lambda ,T,a)\) in (3.3) can be rewritten in a more compact form as follows:
with the matrix \(Q \in {\mathbb R}^{3 \times 3}\) defined as
We introduce the eigenvalue decomposition of the matrix Q in (7.2):
where orthogonality of the modal matrix V in (7.3) follows from the symmetry of the matrix Q. Then, we decompose the diagonal eigenvalue matrix \(\Lambda \) in (7.3) as follows:
Substituting (7.4) into (7.3), then (7.3) into (7.1), we obtain the following equivalent expression for the function f:
Recalling that convexity of a function is invariant under non-singular linear transformations of its domain, we introduce the following linear transformation of the domain \({\mathbb R}^3\) of the function f above:
which is non-singular due to V and Z being non-singular matrices. Defining \(f_T :{=} f \circ T\) as the representation of the function f in the transformed domain, we have:
Recalling the definitions of Z and \(\widetilde{\Lambda }\) in (7.4), we can write (7.7) in the explicit form:
where the function g in (7.8) is defined in (3.6). Since the first term in (7.8) is quadratic and convex, a sufficient condition for the function \(f_T\) in (7.8) to be strictly convex is that the function g in (3.6) is strictly convex. This concludes the proof after recalling that the function f is strictly convex if and only if the function \(f_T\) is strictly convex. \(\square \)
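For completeness, the invariance property invoked in the last step is the following standard fact, stated here in generic terms: for a non-singular linear map T and \(f_T = f \circ T\),
\[
f_T\big (\theta y_1 + (1-\theta )\, y_2\big ) \;=\; f\big (\theta \, T y_1 + (1-\theta )\, T y_2\big ), \qquad \theta \in (0,1),
\]
so \(f_T\) satisfies the strict convexity inequality at a pair \(y_1 \ne y_2\) if and only if f satisfies it at the pair \(T y_1 \ne T y_2\), the two pairs being in one-to-one correspondence since T is invertible.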
Proof of Lemma 3.3
It follows immediately from the definition of strict convexity that a function from \({\mathbb R}^2\) into \({\mathbb R}\) is strictly convex if and only if the restriction of the function to any possible straight line of \({\mathbb R}^2\) is strictly convex. Due to the radial symmetry property of function \(\psi \) in (3.7), the restriction of \(\psi \) to a generic straight line l is identical to the restriction of \(\psi \) to any other straight line obtained by rotating l around the origin. Hence, \(\psi \) is strictly convex if and only if all its restrictions to horizontal straight lines (any other direction, e.g. vertical, could be chosen as well) with non-negative intercept are strictly convex.
We denote by \(h_0\) and \(h_k\) the functions from \({\mathbb R}\) into \({\mathbb R}\) corresponding to the restriction of \(\psi \) to the horizontal straight line with null intercept, namely the horizontal coordinate axis, and to any horizontal straight line with positive intercept \(k > 0\), respectively. From the definition of the function \(\psi \) in (3.7), we have:
Since the function \(\psi \) in (3.7) is strictly convex if and only if both \(h_0\) in (7.9) and \(h_k\) in (7.10) are strictly convex, it is clear that a necessary condition for \(\psi \) to be strictly convex is that \(h_0\) in (7.9) is strictly convex. It thus remains to demonstrate that \(h_0\) being strictly convex is also a sufficient condition for \(\psi \) to be strictly convex or, equivalently, that strict convexity of \(h_0\) in (7.9) implies strict convexity of \(h_k\) in (7.10) for any positive k.
The functions \(h_0\) and \(h_k\) in (7.9)–(7.10) are clearly even and, since we are assuming \(z \in \mathcal {C}^1({\mathbb R}_+)\), we have that \(h_k \in \mathcal {C}^1({\mathbb R})\) and \(h_0 \in \mathcal {C}^0({\mathbb R}) \cap \mathcal {C}^1({\mathbb R}\setminus \{0\})\). In particular, the first-order derivatives of \(h_0\) and \(h_k\) are as follows:
We note that \(h_0\) is continuously differentiable also at the point \(t=0\) if and only if the right-sided derivative of the function z at 0 is equal to 0.
We now assume that the function \(h_0\) in (7.9) is strictly convex. This implies that the first-order derivative function \(h_0'\) is monotonically increasing on its entire domain \({\mathbb R}\setminus \{0\}\). It thus follows from the definition of \(h_0'\) in (7.11) that the first-order derivative function \(z'\) is nonnegative and monotonically increasing on \({\mathbb R}_+\). We then notice that, for any given \(k > 0\), the first-order derivative function \(h_k'\) in (7.12) is continuous (since \(z'\) is continuous on \({\mathbb R}_+\) by assumption) and odd (hence \(h_k'(0) = 0\)). Finally, by recalling that compositions and products of positive, monotonically increasing functions are monotonically increasing, it follows that \(h_k'\) in (7.12) is monotonically increasing on the entire real line, hence \(h_k\) in (7.10) is strictly convex. This completes the proof. \(\square \)
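The restriction argument above is easy to check numerically. The following sketch is purely illustrative: the penalty z used here is a hypothetical \(\mathcal {C}^1\) profile with \(z'\) nonnegative and increasing (so that \(h_0\) is strictly convex), not the function z of (3.7), and the grid-based test of second differences is no substitute for the proof.

```python
import numpy as np

# Hypothetical C^1 penalty z on R_+ with z' >= 0 and increasing,
# standing in for the z of (3.7); NOT the penalty used in the paper.
def z(s):
    return s**2 + s**3

def h(t, k):
    """Restriction of psi(x) = z(||x||_2) to the horizontal line x_2 = k."""
    return z(np.sqrt(t**2 + k**2))

t = np.linspace(-3.0, 3.0, 2001)
for k in [0.0, 0.1, 1.0, 5.0]:
    d2 = np.diff(h(t, k), 2)          # discrete second differences
    print(f"k = {k:4.1f}:  min second difference = {d2.min():.3e}")
```

All printed minima are strictly positive, in agreement with the lemma: strict convexity of the restriction \(h_0\) to the horizontal axis propagates to every horizontal line.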
Proof of Proposition 3.7
The functional \(\mathcal {J}(\,\cdot \,;\lambda ,\eta ,a)\) in (1.1) is clearly proper. Moreover, since the functions \(\phi (\,\cdot \,;T,a)\) and \(\Vert \cdot \Vert _2\) are both continuous and bounded from below by zero, \(\mathcal {J}\) is also continuous and bounded from below by zero. In particular, we notice that \(\mathcal {J}\) achieves the zero value only for \(u = b\) with b a constant image. The penalty function \(\phi (\,\cdot \,;T,a)\) is not coercive, hence the regularization term in \(\mathcal {J}\) is not coercive. However, since the fidelity term is quadratic and strictly convex, hence coercive, and the regularization term is bounded from below by zero, \(\mathcal {J}\) is coercive.
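In formulas, writing the quadratic fidelity term as \(\frac{\lambda }{2}\,\Vert u-b\Vert _2^2\) (the form suggested by the optimality conditions used below, e.g. (7.38)) and using non-negativity of the regularization term, coercivity follows from
\[
\mathcal {J}(u) \;\ge \; \frac{\lambda }{2}\,\Vert u-b\Vert _2^2 \;\longrightarrow \; +\infty
\qquad \text{as} \quad \Vert u\Vert _2 \rightarrow \infty .
\]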
As far as strong convexity is concerned, it follows from Definition 3.6 that the functional \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) in (1.1) is \(\mu \)-strongly convex if and only if the functional \(\widetilde{\mathcal {J}}(u;\lambda ,T,a,\mu )\) defined as
is convex, where \(\mathcal {A}(u)\) is an affine function of u. We notice that the functional \(\widetilde{\mathcal {J}}\) in (7.13) almost coincides with the original functional \(\mathcal {J}\) in (1.1), the only difference being that the coefficient \(\lambda \) is replaced by \(\lambda -\mu \). Hence, we can apply the results in Theorem 3.5 and state that \(\widetilde{\mathcal {J}}\) in (7.13) is convex if condition (3.10) is satisfied with \(\lambda - \mu \) in place of \(\lambda \). By substituting \(\lambda - \mu \) for \(\lambda \) in condition (3.10), deriving the solution interval for \(\mu \) and then taking the maximum, one obtains equality (3.22). \(\square \)
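To make the link between (1.1) and (7.13) explicit, and again assuming the quadratic fidelity term \(\frac{\lambda }{2}\Vert u-b\Vert _2^2\), subtracting the strong convexity modulus gives
\[
\frac{\lambda }{2}\,\Vert u-b\Vert _2^2 \;-\; \frac{\mu }{2}\,\Vert u\Vert _2^2
\;=\; \frac{\lambda -\mu }{2}\,\Vert u-b\Vert _2^2 \;-\; \mu \,\langle u , b \rangle \;+\; \frac{\mu }{2}\,\Vert b\Vert _2^2 ,
\]
so that \(\mathcal {J}(u) - \frac{\mu }{2}\Vert u\Vert _2^2\) coincides with \(\mathcal {J}(u)\) computed with coefficient \(\lambda -\mu \), up to the affine term \(\mathcal {A}(u) = -\,\mu \langle u,b\rangle + \frac{\mu }{2}\Vert b\Vert _2^2\).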
Proof of Proposition 4.1
The demonstration of condition (4.17) for strict convexity of the function \(\theta \) in (4.16) is straightforward. In fact, the function \(\theta \) can be equivalently rewritten as
with \(\mathcal {A}(x)\) an affine function, so that a necessary and sufficient condition for \(\theta \) to be strictly convex is that the function \(\bar{\theta }\) in (7.14) is strictly convex. We then notice that \(\bar{\theta }\) is almost identical to the function g in (3.6), the only difference being that the coefficient \(\beta /2\) in \(\bar{\theta }\) appears as \(\lambda /18\) in g. By setting \(\lambda /18 = \beta /2 \Longleftrightarrow \lambda = 9 \beta \), the two functions coincide. The condition (3.10) for strict convexity of g reads \(\lambda > 9\,a\); hence, substituting \(\lambda = 9 \beta \) into it, we obtain condition (4.17) for strict convexity of \(\theta \).
We remark that condition \(\beta > a\) reduces to \(\beta \ge a\) when only convexity is required.
For the proof of statement (4.19), according to which the unique solution \(x^*\) of the strictly convex problem (4.18) is obtained by a shrinkage of vector r, we refer the reader to [20, Proposition 4.5].
We now prove statement (4.20). First, we notice that if \(\Vert r\Vert _2 = 0\), i.e. r is the null vector, the minimization problem in (4.18) with the objective function \(\theta (x)\) defined in (4.16) reduces to
Since the first and second terms of the cost function in (7.15) are, respectively, a monotonically non-decreasing and a monotonically increasing function of \(\Vert x\Vert _2\), the solution of (7.15) is clearly \(x^* = 0\). Hence, the case \(\Vert r\Vert _2 = 0\) can be easily dealt with by taking any value \(\xi ^*\) in formula (4.19). We included the case \(\Vert r\Vert _2 = 0\) in formula a) of (4.20). In the following, we consider the case \(\Vert r\Vert _2 > 0\).
Based on the previously demonstrated statement (4.19), by setting \(x = \xi \, r\), \(\xi \ge 0\), we turn the original unconstrained 2-dimensional problem in (4.18) into the following equivalent constrained 1-dimensional problem:
where in (7.16) we omitted the constants and introduced the cost function \(f: {\mathbb R}_+ \rightarrow {\mathbb R}\) for future reference. Since the function \(\phi \) in (7.16), which is defined in (2.1), is continuously differentiable on \({\mathbb R}_+\), the cost function f in (7.16) is also continuously differentiable on \({\mathbb R}_+\). Moreover, f is strictly convex since it represents the restriction of the strictly convex function \(\theta \) in (4.16) to the half-line \(\xi \, r, \, \xi \ge 0\). Hence, the first-order derivative \(f'(\xi )\) is a continuous, monotonically increasing function and a necessary and sufficient condition for an inner point \(0< \xi < 1\) to be the global minimizer of f is that \(f'(\xi ) = 0\). From the definition of f in (7.16) we have:
and, in particular:
It follows from (7.18) that the solution of (7.16) cannot be \(\xi ^* = 0\), hence it is either \(\xi ^* = 1\) or an inner stationary point.
Recalling the definition of \(\phi (\,\cdot \,;T,a)\) in (2.1), after some simple manipulations the function \(f'(\xi )\) in (7.17) can be rewritten in the following explicit form:
that is:
Denoting by \(\xi _1^*\), \(\xi _2^*\), \(\xi _3^*\) the points where \(f_1'\), \(f_2'\), \(f_3'\) in (7.20) equal zero, respectively, we have:
However, for \(\xi _1^*\), \(\xi _2^*\) and \(\xi _3^*\) in (7.21) to be acceptable candidate solutions of problem (7.16), they must belong to the domains \(\mathcal {D}_1\), \(\mathcal {D}_2\), \(\mathcal {D}_3\) of \(f_1'\), \(f_2'\), \(f_3'\), respectively, and obviously also to the optimization domain \(\mathcal {O} :{=} [0,1]\) of problem (7.16). We have:
The proof of statement (4.20) is thus completed. \(\square \)
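The 1-dimensional reduction used in the proof also lends itself to a direct numerical check. The sketch below is purely illustrative and relies on explicit assumptions: the penalty phi is a hypothetical smooth, increasing, concave profile (not the \(\phi (\,\cdot \,;T,a)\) of (2.1)), the objective \(\frac{\beta }{2}\Vert x-r\Vert _2^2 + \phi (\Vert x\Vert _2)\) is only our reading of the subproblem (4.18), and the closed-form shrinkage (4.19)–(4.20) is replaced by a brute-force search over \(\xi \in [0,1]\).

```python
import numpy as np

# Hypothetical C^1 penalty: smooth, increasing and concave on R_+,
# standing in for phi(.; T, a) of (2.1); NOT the paper's penalty.
def phi(s, T=1.0):
    return T * (1.0 - np.exp(-s / T))

def shrink(r, beta=5.0, T=1.0, n_grid=100001):
    """Solve min_x  beta/2 ||x - r||_2^2 + phi(||x||_2)  via the 1-D
    reduction x = xi * r, xi in [0, 1] (dense grid search, illustration only)."""
    nr = np.linalg.norm(r)
    if nr == 0.0:                      # case a) of (4.20): null vector r
        return np.zeros_like(r)
    xi = np.linspace(0.0, 1.0, n_grid)
    f = 0.5 * beta * (xi - 1.0)**2 * nr**2 + phi(xi * nr, T)
    return xi[np.argmin(f)] * r

r = np.array([0.3, -0.4])
print("r  =", r)
print("x* =", shrink(r), " (a shrinkage of r, as predicted by (4.19))")
```

The objective restricted to the half-line \(x = \xi \, r\) is \(\frac{\beta }{2}(1-\xi )^2\Vert r\Vert _2^2 + \phi (\xi \Vert r\Vert _2)\), which, up to additive constants, plays the role of the cost function f minimized over \([0,1]\) in (7.16).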
Proof of Theorem 5.7
Based on the definition of the augmented Lagrangian functional in (5.2), we rewrite in explicit form the first inequality of the saddle-point condition in (4.7):

and, similarly, the second inequality:
In the first part of the proof, we prove that if \(\,(u^*,t^*;\rho ^*)\,\) is a solution of the saddle-point problem (4.6)–(4.7), that is, it satisfies the two inequalities (7.25) and (7.26), then \(u^*\) is a global minimizer of the functional \(\mathcal {J}\) in (1.1).
Since (7.25) must be satisfied for any \(\rho \;{\in }\; {\mathbb R}^{2n}\), we have:
The second inequality (7.26) must be satisfied for any \((u,t) \;{\in }\; {\mathbb R}^n {\times }\, {\mathbb R}^{2n}\). Hence, by taking \(t = Du\) in (7.26) and, at the same time, substituting in (7.26) the previously derived condition (7.27), we obtain:
Inequality (7.28) indicates that \(u^*\) is a global minimizer of the functional \(\mathcal {J}\) in (1.1). Hence, we have proved that all the saddle-point solutions of problem (4.6)–(4.7), if any exist, are of the form \(\,(u^*,Du^*;\rho ^*)\,\), with \(u^*\) denoting a global minimizer of \(\mathcal {J}\).
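For the reader's convenience, the step from (7.26) to (7.28) uses the identity
\[
\mathcal {L}(u, Du; \rho ) \;=\; \mathcal {J}(u;\lambda ,T,a) \qquad \text{for all } u \text{ and } \rho ,
\]
which holds provided the augmented Lagrangian (5.2) has the usual split form, with the regularization written in the auxiliary variable t and the multiplier and penalty terms involving only \(t - Du\): both of the latter vanish at \(t = Du\).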
In the second part of the proof, we prove that at least one solution of the saddle-point problem exists. In particular, we prove that if \(u^*\) is a global minimizer of \(\mathcal {J}\) in (1.1), then there exists at least one pair \(\,(t^*,\,\rho ^*) \,{\in }\; {\mathbb R}^{2n} {\times }\, {\mathbb R}^{2n}\) such that \((u^*,t^*;\rho ^*)\) is a solution of the saddle-point problem (4.6)–(4.7), that is, it satisfies the two inequalities (7.25) and (7.26). The proof relies on a suitable choice of the vectors \(t^*\) and \(\rho ^*\). We take:
where the term \(\bar{\partial }_{t} \left[ \, R \,\right] (Du^*)\) indicates the Clarke generalized gradient (with respect to t, calculated at \(Du^*\)) of the nonconvex regularization function R defined in (5.1). We notice that a vector \(\rho ^*\) satisfying (7.30) is guaranteed to exist thanks to Proposition 5.2. In fact, since here we are assuming that \(u^*\) is a global minimizer of functional \(\mathcal {J}\), the first-order optimality condition in (5.5) holds true.
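Although (7.29)–(7.30) are not reproduced here, the way they are used below — see (7.36)–(7.38) and the discussion of (7.42) — suggests the following reading, which we record only as a plausible reconstruction:
\[
t^* \;=\; D u^* , \qquad
\rho ^* \;\in \; \bar{\partial }_{t} \left[ \, R \,\right] (Du^*)
\quad \text{with} \quad
D^T \rho ^* \;=\; -\,\lambda \,(u^*-b) ,
\]
the existence of such a vector \(\rho ^*\) being exactly what the first-order optimality condition (5.5) provides.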
Due to (7.29), the first saddle-point condition in (7.25) is clearly satisfied. Proving the second condition (7.26) is less straightforward: we need to investigate the optimality conditions of the functional \(\mathcal {L}\,(u,t;\rho ^*)\) with respect to the pair of primal variables (u, t). We follow the same procedure used, e.g., in [35], which requires \(\mathcal {L}\,(u,t;\rho ^*)\) to be jointly convex in (u, t). According to Proposition 5.4, in our case this requirement is fulfilled if the penalty parameter \(\beta \) satisfies condition (5.17), which has thus been taken as a hypothesis of this theorem. Hence, we can apply Lemma 5.6 and state that (7.26) is satisfied if and only if both of the following optimality conditions are met:
where in (7.31)–(7.32) we introduced the two functions \(\mathcal {L}^{(u)}\) and \(\mathcal {L}^{(t)}\) representing the restrictions of functions \(\mathcal {L}\,(u,t^*;\rho ^*)\) and \(\mathcal {L}\,(u^*,t;\rho ^*)\) to only the terms depending on the optimization variables u and t, respectively. In particular, after recalling the definition of the augmented Lagrangian functional in (5.2), we have
where, like in [35], \(\mathcal {L}^{(u)}\) and \(\mathcal {L}^{(t)}\) have been split into the sum of two functions with the aim of then deriving optimality conditions for \(\mathcal {L}^{(u)}\) and \(\mathcal {L}^{(t)}\) by means of Lemma 5.5. Unlike in [35], the ADMM quadratic penalty term \(\frac{\beta }{2} \, \Vert t - D u \Vert _2^2\) has been split into two parts (differently in \(\mathcal {L}^{(u)}\) and \(\mathcal {L}^{(t)}\)) in order to deal with the nonconvex regularization term. In particular, the coefficients \(\beta _1\), \(\beta _2\) introduced in (7.33)–(7.34) satisfy
such that the terms \(S^{(u)}\), \(S^{(t)}\) in (7.33)–(7.34) are clearly convex and the terms \(Q^{(u)}\), \(Q^{(t)}\) are convex due to results in Lemma 5.3 and Proposition 4.1, respectively. We also notice that all the functions \(Q^{(u)}\), \(Q^{(t)}, S^{(u)}\), \(S^{(t)}\) are proper and continuous and that \(S^{(u)}\), \(S^{(t)}\) are Gâteaux-differentiable. Hence, we can apply Lemma 5.5 separately to (7.33) and (7.34), to check if the pair \((u^*,t^*)\) satisfies the optimality conditions in (7.31) and (7.32), so that the second saddle-point condition (7.26) holds true. We obtain:
where the term \(t^*-Du^*\) in (7.36)–(7.37) is zero due to the setting (7.29). We rewrite conditions (7.36)–(7.37) as follows:
where in (7.38) we added and subtracted the term \(\lambda \, (u^*-b)\) and added the null term \(\beta _1 D^T (t^* - D u^*)\), and in (7.39) we added the null term \(\beta _2 \, (t^* - Du^*)\). The term \(-\lambda \, (u^*-b) - D^T \rho ^*\) in (7.38) is null due to the setting (7.30). By introducing the two functions
which are convex under conditions (7.35) for the same reason for which the functions \(Q^{(u)}\), \(Q^{(t)}\) in (7.33)–(7.34) are convex, conditions (7.38)–(7.39) can be rewritten as
where we highlighted that the left side of the scalar product in (7.41) represents the subdifferential (actually, the standard gradient) of function U calculated at \(u^*\) and that the left side of the scalar product in (7.42) is a particular vector belonging to the subdifferential of function T calculated at \(t^*\). This second statement comes from the definition of function T in (7.40) and from settings (7.29)–(7.30).
Optimality conditions in (7.41)–(7.42) are easily proved by noticing that the left-hand sides of (7.41)–(7.42) represent the Bregman distances associated with functions U and T, respectively, which are known to be non-negative for convex functions. Hence, the second saddle-point condition in (7.26) is satisfied and, finally, the second and last part of the proof is completed. \(\square \)
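For the reader's convenience, the non-negativity invoked in this last step is the standard property of the Bregman distance of a convex function F at a subgradient \(g \in \partial F(y)\):
\[
\mathcal {D}_{F}(x,y) \;:=\; F(x) \;-\; F(y) \;-\; \langle \, g , x - y \, \rangle \;\ge \; 0 \qquad \text{for all } x ,
\]
which is nothing but the subgradient inequality for F; this is precisely the non-negativity used to establish (7.41)–(7.42) for the convex functions U and T.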
Proof of Theorem 5.8
Let us define the following errors:
Since \((u^*,t^*;\rho ^*)\) is a saddle-point of the augmented Lagrangian functional in (4.6), it follows from Theorem 5.7 that \(t^* = Du^*\). This relationship, together with the ADMM updating formula for the vector of Lagrange multipliers in (4.10), yields:
It then follows easily from (7.44) that
Computation of a lower bound for the right-hand side of (7.45).
Since \((u^*,t^*;\rho ^*)\) is a saddle-point of the augmented Lagrangian functional in (4.6), it satisfies the following optimality conditions [see (7.36)–(7.37) in the proof of Theorem 5.7]:
Similarly, by the construction of \(\big (u^{(k)},t^{(k)}\big )\) in Algorithm 1, we have:
Taking \(u = u^{(k)}\) in (7.46), \(u = u^*\) in (7.48) and recalling that \(\langle D^T w , z \rangle = \langle w , D z \rangle \) , by addition we obtain:
Similarly, taking \(t = t^{(k)}\) in (7.47) and \(t = t^*\) in (7.49), after addition we have:
where, we recall, the parameters \(\beta _1\) and \(\beta _2\) in (7.50)–(7.51) satisfy the constraints in (7.35).
By summing up (7.50) and (7.51), we obtain:
that is
where we introduced the positive coefficient \(\beta _3 > 0\) (the reason will become clear later on). We want the last term in (7.52) to take the form \(-\,\big \Vert \, c_1 \bar{t}^{(k)} - c_2 D \bar{u}^{(k)} \big \Vert _2^2 \) with \(c_1,c_2 > 0\). Hence, first we impose that the coefficients of \(\big \Vert \bar{t}^{(k)} \big \Vert _2^2\) and \(\big \Vert D \bar{u}^{(k)} \big \Vert _2^2\) in (7.52) are strictly positive, which yields:
Combining (7.53) with conditions (7.35), we obtain:
From the condition on \(\beta _3\) in (7.54), the following constraint on \(\beta \) is derived:
We notice that condition (7.55) can be more stringent than (5.17), depending on \(\tau _c\), hence it has been taken as a hypothesis of this theorem and will be considered, together with (5.17), in the rest of the proof. From the condition on \(\beta _3\) in (7.54) it also follows that the coefficient \(\beta - \beta _3\) of the scalar product in (7.52) is positive.
Then, we have to impose that the coefficient of the term \(-\big \langle \, \bar{t}^{(k)} , D \bar{u}^{(k)} \, \big \rangle \) in (7.52) is twice the product of the square roots of the (positive) coefficients of \(\big \Vert \bar{t}^{(k)} \big \Vert _2^2\) and \(\big \Vert D \bar{u}^{(k)} \big \Vert _2^2\), that is:
By imposing the condition on \(\beta _3\) in (7.54), namely \(\beta -\beta _3 > 2a\), it is easy to verify that (7.56) admits acceptable solutions only if \(\beta _1 > \beta _2\). By setting in (7.56) \(\beta _1 = \tau _c \frac{9}{8} \, a\) and \(\beta _2 = a\), which are acceptable values according to this last result (since \(\tau _c > 1\), clearly \(\beta _1 > \beta _2\)) and also to conditions (7.54), we obtain:
We now check whether there exist acceptable values for the two remaining free parameters, namely \(\beta \) and \(\beta _3\), such that (7.57) holds. We impose that \(\beta \) in (7.57) satisfies its constraint in (5.17), which guarantees convexity of the augmented Lagrangian functional, and the derived condition in (7.55):
Since \(\tau _c > 1\) (and \(a > 0\)), both conditions in (7.58) are satisfied for any \(\beta _3 > 0\). Hence, for \(\beta _1 = \tau _c \frac{9}{8} \, a\), \(\,\beta _2 = a\) and any \(0< \beta _3 < \beta - 2a\), with \(\beta > 2a\), the last term in (7.52) can be written in the form
where \(c_1,c_2 > 0\), \(c_1 \ne c_2\). Replacing the expression in (7.59) for the last term in (7.52), we have:
where in (7.60) we multiplied both sides by the positive coefficient \(2\beta \). We notice that the left-hand side of (7.60) coincides with the right-hand side of (7.45), hence it follows that:
Computation of a lower bound for the term \(\varvec{T}\) in (7.61).
We can write:
First, we notice that:
Then, from the construction of \(t^{(k-1)}\) (from \(u^{(k-1)}\)), we have:
Taking \(t = t^{(k-1)}\) in (7.49) and \(t = t^{(k)}\) in (7.64), we obtain:
By addition of (7.65) and (7.66), we have that
Recalling that
substituting (7.68) into (7.67) and then dividing by \(\beta \), we obtain:
From (7.62), (7.63) and (7.69), we have:
Convergence results for sequences \(\varvec{t^{(k)},Du^{(k)},\rho ^{(k)}}\).
From (7.61) and (7.70), we obtain:
that is:
where we have introduced the scalar sequence \(\{s^{(k)}\}\), which is clearly bounded from below by zero. We notice that the coefficient \(\beta -2\beta _2\) in (7.72) is positive due to the constraint \(\beta > 2a\). Since the right-hand side of the first inequality in (7.72) is nonnegative, \(\{s^{(k)}\}\) is monotonically non-increasing, hence convergent. This implies that the right-hand side of (7.72) tends to zero as \(k \rightarrow \infty \). From these considerations and (7.72) it follows that:
Since the two coefficients \(c_1\), \(c_2\) in (7.76) satisfy \(c_1, c_2 \ne 0\), \(c_1 \ne c_2\), it follows from (7.75)–(7.76) that both the sequences \(\{\bar{t}^{(k)}\}\) and \(\{D \bar{u}^{(k)}\}\) tend to zero as \(k \rightarrow \infty \). The results in (7.73)–(7.76) can thus be rewritten in the following more concise and informative form:
where the last equality in (7.78) comes from the saddle-point properties stated in Theorem 5.7. Since it will be useful later on, we note that it follows from (7.78) that
Convergence results for sequence \(\varvec{u^{(k)}}\).
We now prove that \(\lim _{{k \rightarrow \infty } } u^{(k)} =u^*\). Since \((u^*,t^*;\rho ^*)\) is a saddle point of the augmented Lagrangian functional \(\mathcal {L}(u,t;\rho )\), we have
By taking \(u = u^{(k)}\), \(t = t^{(k)}\) in (7.81) and recalling the definition of \(\mathcal {L}(u,t;\rho )\) in (5.2), we have:
Taking \(u = u^*\) in (7.48) and \(t = t^*\) in (7.49), we obtain:
By summing up (7.83) and (7.84), we have:
Taking \(\lim \inf \) of (7.82) and \(\lim \sup \) of (7.85), and using the results in (7.77)–(7.80), we have
It follows from (7.86) that
We now manipulate \(F(u^{(k)})\) as follows:
On the other hand, we have that
where in (7.89) we have used the (optimality) condition (7.30). From (7.88) and (7.89) it follows that
that is
Taking the limit for \(k \rightarrow \infty \) of both sides of (7.91) and recalling (7.79) and (7.87), we obtain:
thus completing the proof. \(\square \)
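To conclude, the sketch below illustrates the generic structure of the ADMM iteration whose convergence is analysed above. It is a schematic stand-in under explicit assumptions, not the paper's Algorithm 1: the augmented Lagrangian is taken in the common form \(\frac{\lambda }{2}\Vert u-b\Vert _2^2 + R(t) - \langle \rho , t - Du\rangle + \frac{\beta }{2}\Vert t - Du\Vert _2^2\), the penalty inside R is the same hypothetical profile used in the earlier sketch (not the \(\phi \) of (2.1)), D is a simple 1-D forward-difference matrix, and the t-update uses a per-entry grid search in place of the closed-form shrinkage of Proposition 4.1.

```python
import numpy as np

def phi(s, T=1.0):
    # Hypothetical concave-type penalty (NOT the phi of (2.1))
    return T * (1.0 - np.exp(-s / T))

def admm_sketch(b, lam=10.0, beta=5.0, iters=200):
    n = b.size
    D = np.eye(n, k=1) - np.eye(n)            # 1-D forward differences
    D[-1, :] = 0.0                            # crude boundary handling
    u, t, rho = b.copy(), D @ b, np.zeros(n)
    A = lam * np.eye(n) + beta * D.T @ D      # u-update system matrix
    xi = np.linspace(0.0, 1.0, 201)           # grid for the 1-D reduction
    for _ in range(iters):
        # u-update: minimize the quadratic terms of the Lagrangian in u
        u = np.linalg.solve(A, lam * b + D.T @ (beta * t - rho))
        # t-update: per-entry minimization of phi(|t_i|) + beta/2 (t_i - r_i)^2
        r = D @ u + rho / beta
        for i in range(n):
            f = 0.5 * beta * (xi - 1.0)**2 * r[i]**2 + phi(xi * abs(r[i]))
            t[i] = xi[np.argmin(f)] * r[i]
        # multiplier update
        rho = rho + beta * (D @ u - t)
    return u, np.linalg.norm(D @ u - t)

b = np.concatenate([np.zeros(30), np.ones(30)]) + 0.05 * np.random.randn(60)
u, res = admm_sketch(b)
print("constraint residual ||Du - t||_2 =", res)
```

In agreement with the convergence results above, the constraint residual \(\Vert Du^{(k)} - t^{(k)}\Vert _2\) is driven towards zero along the iterations, while \(u^{(k)}\) approaches a minimizer of the (here hypothetical) objective.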