
Applied Mathematics and Computation

Volume 343, 15 February 2019, Pages 67-89

A two-phase-like proximal point algorithm in domains of positivity

https://doi.org/10.1016/j.amc.2018.09.054

Abstract

This paper improves a decomposition-like proximal point algorithm, developed for computing minima of nonsmooth convex functions within a framework of symmetric positive semidefinite matrices, and extends it to domains of positivity of reducible type, in a nonlinear sense and in a Riemannian setting. Several computational experiments with weighted Lp (p=1,2) centers of mass are performed to demonstrate the practical feasibility of the method.

Introduction

In several situations, researchers need to estimate solutions of optimization problems on manifolds of non-Euclidean type. This practice brings with it other relevant activities, such as the development of new optimization techniques and the extension of classical methods. This work introduces an optimization method that both theoretically improves the proximal point algorithm presented by Gregório and Oliveira [12] and extends it to domains of positivity.

The methodology in [12] is based on the Schur theorem for symmetric matrices and the spectral theorem for positive definite matrices. See [13] for further details on the Schur and spectral theorems for Hermitian (symmetric) positive definite matrices.

Moreover, the method belongs to the class of decomposition algorithms. Points on the proximal trajectory generated by the method are computed in two iterative steps: an auxiliary Riemannian subproblem, defined on the set of diagonal positive definite matrices, is solved iteratively to update the spectral step, after the Schur decomposition is applied to update the orthogonal step.

In the current work, the orthogonal step is also taken to be any (local) solution of an auxiliary Riemannian subproblem, now defined on the orthogonal group, taking into account its Riemannian structure as discussed in [8]. The stopping criteria for the inner iterations, which were not stated in [12], are made explicit for this way of defining the orthogonal step; this also enables the extension of the technique to homogeneous domains of positivity of nonlinearly reducible type, in a sense introduced at the end of Section 2.

In summary, the aims of the current work are: to improve theoretically the proximal point method in [12] through the analysis and synthesis of stopping criteria for the inner iterations, owing to the introduction of the orthogonal step described above and discussed in Section 3; to extend the technique to compute minimizers of convex functions defined on homogeneous domains of positivity of nonlinearly reducible type; and to demonstrate the practical feasibility of its application to real problems, with a focus on (weighted) median and mean computation for symmetric positive definite matrices.

Notice that the computation of (weighted) Lp centers of mass, especially the (weighted) median and mean cases (p=1 and p=2, respectively), on manifolds of symmetric positive definite matrices has been the aim of several researchers, including [9], [10], [18], and [28]. In almost all work on this subject, researchers have applied classical smooth optimization techniques to compute the (weighted) median and mean, generally faster than our method, as shown by the CPU times reported in Section 5.2.5.

However, there has been little research on techniques suited to the nonsmooth case on Riemannian manifolds. See [6], [7], [12], and [20] for further details on nonsmooth techniques in the particular case of Hadamard manifolds. Despite the absence of such computational experiments, the Riemannian median (optimization) problem is naturally nonsmooth at the points of the data distribution for which one wishes to estimate it. Specifically, in this case, the (weighted) median cannot be computed in a smooth way, particularly when the solution is one of these points. Therefore, the advantage of our technique, like that of the method proposed in [12], resides in avoiding the assumption of smoothness. It should be noted that the term nonsmooth encompasses the smooth case; hence our method can also be applied to smooth problems without restriction, as in the Riemannian mean problem (for instance), although it is naturally characterized as a slow method, because an optimization problem with the same complexity as the original problem is solved at each iteration.
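To make the nonsmoothness concrete, recall the standard observation (stated here for orientation; the notation is ours, not the paper's): the Riemannian distance term $x \mapsto d(x, a_i)$ fails to be differentiable at $x = a_i$, so the weighted median objective
$$F(x) = \sum_{i=1}^{m} w_i \, d(x, a_i), \qquad w_i > 0,$$
is nonsmooth precisely at the data points $a_1, \ldots, a_m$, which is exactly where its minimizer may lie.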

In the remainder of Section 1, we survey different algorithms for solving optimization problems and applications related to this field. Section 2 introduces a set of preliminary concepts, highlighting the two main examples of domains of positivity. Section 3 establishes the problem and the algorithm in its exact version. The global convergence criteria are fulfilled in this case, taking into account the general convergence result of Ferreira and Oliveira [7]; this is also discussed at the end of the section. Section 4 presents the inexact version. Although the convergence results in [12] remain valid in this case, we revisit them here. Another result preserved from [12] is the weak convergence of the method when solutions belong to the boundary of reducible domains of positivity. Section 5 presents theoretical aspects that enable the application of the method to the computation of weighted Lp centers of mass on manifolds of symmetric positive definite matrices, and Section 5.2.5 reports the results of computational experiments with our method. Section 6 ends this work with conclusions and suggestions for future investigations.

Synthesizing extensions of optimization methods to Riemannian manifolds can be as complicated as establishing new algorithms. Generally, an extension is not easily obtained, because its convergence analysis does not follow from straightforward arguments. The same happens with classical optimization methods in Euclidean spaces, but there this demands only concepts of Euclidean geometry. In the Riemannian case, aspects of the topology of the differentiable manifolds under study and concepts of Riemannian geometry come together in a relevant field of knowledge, as seen in Ferreira and Oliveira [6] and Smith [24], for instance.

Despite having been introduced relatively recently in the history of optimization, some techniques have already been extended to more general settings. Indeed, the replacement of the Riemannian distance by the Bregman distance in the definition of the proximal point iteration in [7] enabled Quiroz and Oliveira [20] to develop a proximal point method for generalized convex optimization problems on noncompact Hadamard manifolds.

On the other hand, such extensions also enable the development of optimization methods specific to particular manifolds. In fact, the employment of the same algorithm from [7] to estimate minima of convex functions on the manifold of n-by-n symmetric positive definite matrices, denoted here by SP(n), enabled [12] to synthesize a proximal point method based on the spectral theorem and the Schur decomposition for symmetric positive definite matrices.

A key concept underlying the development of algorithms in optimization is convexity in its Euclidean or Riemannian form. In this paper, we restrict the applicability of our work to Riemannian convexity only. Let $\mathbb{R}^n$ be the real Euclidean space of dimension $n$. A subset $C$ of a complete Riemannian manifold $M \subset \mathbb{R}^n$ is called convex if it contains all minimal geodesic segments connecting any pair of points belonging to $C$. Besides, a function $f: C \to \mathbb{R}$ is said to be convex if
$$(f \circ \gamma_{xy})\big((1-t)\,t_1 + t\,t_2\big) \le (1-t)\,f(x) + t\,f(y),$$
for every $x, y \in C$ and $t \in [0, 1]$, where $\gamma_{xy}$ is any minimal geodesic segment in $C$ satisfying $\gamma_{xy}(t_1) = x$ and $\gamma_{xy}(t_2) = y$. Moreover, $f$ is said to be strictly convex if the inequality above is strict for every $t \in (0, 1)$. See Udriste [26] for further details on convexity in Riemannian manifolds.
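For orientation, a classical instance of this definition (our illustration, under the affine-invariant Riemannian metric on SP(n) discussed later via [3], [19], [22]): the unique minimal geodesic segment joining $A, B \in SP(n)$ is
$$\gamma_{AB}(t) = A^{1/2}\big(A^{-1/2} B A^{-1/2}\big)^{t} A^{1/2}, \qquad t \in [0, 1],$$
and, SP(n) being a Hadamard manifold under this metric, functions such as $X \mapsto d^2(X, B)$ are convex along these geodesics.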

Subdifferentials of functions also play an important role in nonsmooth convex optimization on Riemannian manifolds. According to [26], the subdifferential of a function $f: C \to \mathbb{R}$ at $x \in C$, denoted by $\partial f(x)$, is the set defined by
$$\partial f(x) := \{ s \in T_xM : f(y) \ge f(x) + \langle s, \exp_x^{-1} y \rangle_x, \ \forall y \in C \},$$
where $\exp_x^{-1} y$ is the tangent vector at $x$ to any minimal geodesic segment that connects $x$ to $y$ in $C$. Here, $\langle \cdot, \cdot \rangle_x$ denotes the Riemannian metric on the tangent space of $M$ at $x$, denoted by $T_xM$.

The subdifferential of a convex function is nonempty at any point of its domain, as also seen in [26]. Moreover, any minimizer $x^*$ of a convex function $f: C \to \mathbb{R}$ satisfies
$$0 \in \partial f(x^*).$$
Any element of $\partial f(x)$ is called a subgradient of $f$ at $x$. If $f$ is differentiable, then $\partial f(x) = \{\mathrm{grad}_x f\}$, where $\mathrm{grad}_x f$ denotes the Riemannian gradient of $f$ at $x$. Namely, $\mathrm{grad}_x f$ is the vector field metrically equivalent to $\nabla f(x)$, i.e., $\langle \mathrm{grad}_x f, s \rangle_x = \langle \nabla f(x), s \rangle$, for every $s \in T_xM$. In addition, if $G$ is the symmetric positive definite bilinear form for which $\langle s_1, s_2 \rangle_x = \langle G(x) s_1, s_2 \rangle$, for every $x \in M$ and $s_1, s_2 \in T_xM$, then
$$\mathrm{grad}_x f = [G(x)]^{-1} \nabla f(x).$$
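As a concrete case of the relation $\mathrm{grad}_x f = [G(x)]^{-1} \nabla f(x)$ (our illustration, using the classical affine-invariant metric on SP(n)): with
$$\langle S_1, S_2 \rangle_X = \mathrm{Tr}\big(X^{-1} S_1 X^{-1} S_2\big), \qquad X \in SP(n), \ S_1, S_2 \in S(n),$$
one obtains $\mathrm{grad}_X f = X \, \nabla f(X) \, X$, since $\langle X \nabla f(X) X, S \rangle_X = \mathrm{Tr}(\nabla f(X)\, S) = \langle \nabla f(X), S \rangle$ for every $S \in S(n)$.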

The technique discussed in this paper belongs to the class of proximal point methods. That class was introduced by Martinet [17] and extended to monotone operators by Rockafellar [21]. The technique presented in [7] is also an extension of that class of methods to Hadamard manifolds. The algorithm presented in [12] is a particular case of the method in [7], where $M = SP(n)$. Based on the Schur factorization for symmetric positive definite matrices, it computes points on the proximal trajectory generated by the method in [7] by taking descent steps in the spectrum of the iterated matrix for the function to be minimized, without losing positive definiteness, and afterwards applying the Schur factorization to update the matrix with respect to the orthogonal step.
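The exact two-phase iteration is specified in Section 3; purely as a schematic illustration of the decomposition idea, the following Python sketch alternates an orthogonal phase (a search over the orthogonal factor, parameterized by skew-symmetric generators) and a spectral phase (a search over the positive eigenvalues) on the proximal objective $f(X) + (\beta_k/2)\,d^2(X, X^k)$. The generic inner solver (derivative-free Nelder-Mead on chart parameterizations) and all names are our own assumptions, not the authors' method.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def spd_distance(A, B):
    """Affine-invariant Riemannian distance on SP(n)."""
    L = np.linalg.cholesky(A)
    Linv = np.linalg.inv(L)
    w = np.linalg.eigvalsh(Linv @ B @ Linv.T)   # eigenvalues of A^{-1} B
    return np.sqrt(np.sum(np.log(w) ** 2))

def proximal_step(f, Xk, beta, sweeps=3):
    """One schematic two-phase step: approximately minimize the proximal
    objective f(X) + (beta/2) d(X, Xk)^2 by alternating an orthogonal
    phase (Q <- Q expm(S), S skew-symmetric) and a spectral phase
    (positive eigenvalues parameterized as exp(u))."""
    lam, Q = np.linalg.eigh(Xk)
    n = Xk.shape[0]
    idx = np.triu_indices(n, 1)

    def prox_obj(X):
        return f(X) + 0.5 * beta * spd_distance(X, Xk) ** 2

    def skew(v):
        S = np.zeros((n, n))
        S[idx] = v
        return S - S.T

    for _ in range(sweeps):
        # orthogonal phase: derivative-free search over skew generators
        def orth_obj(v):
            Qv = Q @ expm(skew(v))
            return prox_obj(Qv @ np.diag(lam) @ Qv.T)
        v = minimize(orth_obj, np.zeros(len(idx[0])), method="Nelder-Mead").x
        Q = Q @ expm(skew(v))

        # spectral phase: search over log-eigenvalues (keeps positivity)
        def spec_obj(u):
            return prox_obj(Q @ np.diag(np.exp(u)) @ Q.T)
        lam = np.exp(minimize(spec_obj, np.log(lam), method="Nelder-Mead").x)

    return Q @ np.diag(lam) @ Q.T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = [(lambda M: M @ M.T + np.eye(3))(rng.standard_normal((3, 3)))
            for _ in range(4)]
    # equal-weight Riemannian median (L1) objective
    f = lambda X: sum(spd_distance(X, A) for A in data) / len(data)
    X = np.eye(3)
    for k in range(5):
        X = proximal_step(f, X, beta=1.0)
        print(k, round(f(X), 6))
```

In the paper, each phase is instead an auxiliary Riemannian subproblem with explicit stopping criteria (Sections 3 and 4); the sketch above only mirrors the outer two-phase structure.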

However, the orthogonal step only updates the iterated matrix; it is not a descent step. In other words, the value of the function to be minimized remains unchanged under this step. Moreover, no information on that function is used with respect to the natural geometry of the orthogonal group, which was extensively explored by Fiori [8], for instance. Besides, the Schur factorization has a high computational cost when n is large, making the method slow.

The growth in the number of methods related to optimization on Riemannian manifolds is justified by the increase of mathematical models for real problems based on natural geometries. In particular, studies show that domains of positivity are manifolds of Hadamard type for a special choice of Riemannian metric, as presented in [22]. For instance, the Riemannian structure of SP(n) is derived from that metric, although other approaches lead to the same structure, as seen in [3] and [19].

Signal processing and computer vision are examples of applications of aggregate models based on the concepts of symmetry and positive definiteness. Diffusion tensor magnetic resonance imaging (DT-MRI) has been acknowledged as one of the most interesting applications of these concepts. Three-dimensional images in DT-MRI are numerically represented by a matrix with the property that each 3-by-3 submatrix, named a voxel, is symmetric and positive definite. The geometric representation of a voxel is an ellipsoid whose semi-axis directions are given by the eigenvectors of the corresponding 3-by-3 submatrix and whose amplitudes are given by the corresponding eigenvalues, as seen in [27].

According to [28], image filtering is often required in DT-MRI, because noise is known to occur in the synthesis of imaging data by magnetic resonance machines. Noise is generated by external interference signals that produce deficient voxels, namely asymmetric or indefinite 3-by-3 matrices. Filtering improves image visualization by minimizing the variance of the imaging data in the noisy region. Image filtering is done by applying the weighted mean filter, when noisy voxels belong to the interior of the image, or the weighted median filter, when noisy voxels belong to the edge of the image [28].

Filtering can be classified as either directional or tensorial. The former essentially uses imaging data based on numerical quantities related to the image, while the latter uses the Riemannian structure of SP(n). Moreover, according to recent studies, Riemannian filtering has proven more suitable than Euclidean filtering, because it smooths the handling of data and naturally preserves symmetry and positive definiteness [5], [9], [10], [28].

A problem that generalizes the computation of the Riemannian weighted mean and median is the computation of the Riemannian weighted Lp center of mass, for 1 ≤ p < ∞, which is defined as a convex optimization problem. According to [1], the problem of computing the Riemannian weighted Lp center of mass is well-posed in its discrete form on SP(n). The Riemannian weighted median and the Riemannian weighted mean occur as particular cases of the Riemannian weighted Lp center of mass when p=1 and p=2, respectively.
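In its discrete form (the standard formulation, consistent with [1]; normalization conventions vary), given data $A_1, \ldots, A_m \in SP(n)$ and weights $w_i > 0$ with $\sum_{i=1}^m w_i = 1$, the weighted Lp center of mass is a solution of
$$\min_{X \in SP(n)} \ \sum_{i=1}^{m} w_i \, d^{\,p}(X, A_i),$$
where $d$ denotes the Riemannian distance; $p = 1$ yields the weighted median and $p = 2$ the weighted mean.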

There are several other applications involving symmetry and positive definiteness, as seen in [9]. Among these, we note human detection via classification, as discussed in [25], which also involves symmetric positive definite covariance matrices in SP(5).

Another matrix manifold with properties similar to SP(n) is the set of n-by-n Hermitian positive definite matrices, denoted here by HP(n). The Lp centers of mass are also easily introduced in HP(n), taking into account the same general Riemannian metric for homogeneous domains of positivity as in [22].

Section snippets

Basic concepts and examples

Here, several concepts on domains of positivity are presented, all of which are based on [22]. Let $D$ be a nonempty open set in $\mathbb{R}^n$ and $\sigma: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ be a nonsingular symmetric bilinear form.

Definition 1

$D$ is called a domain of positivity with respect to $\sigma$ if the following axioms hold:

  • (A1)

    $\sigma(a_1, a_2) > 0$, for every $a_1, a_2 \in D$;

  • (A2)

    if $a \in \mathbb{R}^n$ is such that $\sigma(a, \bar a) > 0$ for every $\bar a \in \bar D$ ($\bar a \neq 0$), where $\bar D$ is the closure of $D$, then $a \in D$.

Here, $\sigma$ is called the characteristic of $D$.

Example 1

Let $\mathrm{Tr}: \mathbb{R}^{n \times n} \to \mathbb{R}$ denote the trace function

General aspects and algorithm

Let $D$ and $f: \bar D \to \mathbb{R} \cup \{-\infty, +\infty\}$ be a nonlinearly reducible domain of positivity and a proper, convex function on $D$, respectively. We start this section by assuming, as a hypothesis, the extension of the continuity of $f$ on $D$ to the boundary of $D$ ($\partial D$), since $D$ is a totally convex open set and $f$ is convex on $D$. See Theorem 3.6 in [26] for further details. In fact, we admit that

  • (H)

    $\lim_{k \to +\infty} f(a^k) = f(\bar a)$, for every $\bar a \in \bar D$ and every sequence $\{a^k\} \subset D$ such that $a^k \to \bar a$.

Our main goal is to establish an interior proximal point

General aspects and algorithm

In this section, we present the inexact version of the TPP algorithm. Analogous to assumption (16) in [12], we admit that $a^{k+1}$ is not determined exactly. In practice, this means a slight relaxation of (7). Indeed, we assume that
$$\beta_k \exp_{a^{k+1}}^{-1} a^k \in \partial_{\epsilon_k} f(a^{k+1}),$$
for any $\epsilon_k \geq 0$, $k = 0, 1, \ldots$, where $\partial_\epsilon f(\tilde a)$ denotes the $\epsilon$-subdifferential of $f$ at $\tilde a \in D$, for any $\epsilon \geq 0$, i.e.,
$$\partial_\epsilon f(\tilde a) := \{ s \in \mathbb{R}^n : f(a) \geq f(\tilde a) + \langle s, \exp_{\tilde a}^{-1} a \rangle_{\tilde a} - \epsilon, \ \forall a \in D \}.$$
Note that taking $\epsilon_k = 0$ reduces the $\epsilon_k$-subdifferential to the ordinary subdifferential, recovering the exact iteration.

Relation (15) enables us to use finite stopping criteria for each iteration of the proximal

General aspects

In this section, all the statements refer to SP(n) as a nonlinearly reducible domain of positivity. We denote by $S(n)$ the vector space of symmetric matrices. Recall that the Schur decomposition establishes, for every $W \in S(n)$, the existence of an orthogonal matrix $Q$ and a real diagonal matrix $\Lambda$, with $\lambda_{11} \geq \cdots \geq \lambda_{nn}$, such that $W = Q \Lambda Q^T$, where $\lambda_{ii}$ is the $i$th eigenvalue of $W$, denoted by $\lambda_i(W)$. Note that if $W_1, W_2 \in S(n)$ are such that $W_1 = Q \Lambda_1 Q^T$ and $W_2 = Q \Lambda_2 Q^T$, then they commute, with respect to the product of
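A minimal numerical check of the decomposition and the commutation property just stated (a sketch using numpy, whose `eigh` routine orders eigenvalues increasingly rather than decreasingly; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
W = B @ B.T + np.eye(4)            # a symmetric positive definite matrix

# Schur (spectral) decomposition: W = Q diag(lam) Q^T with Q orthogonal;
# numpy returns lam in increasing order, the text orders it decreasingly
lam, Q = np.linalg.eigh(W)
assert np.allclose(Q @ np.diag(lam) @ Q.T, W)

# two symmetric matrices sharing the same eigenbasis Q commute
W1 = Q @ np.diag(lam + 1.0) @ Q.T
W2 = Q @ np.diag(lam ** 2) @ Q.T
assert np.allclose(W1 @ W2, W2 @ W1)
```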

Conclusions

Studies on domains of positivity and their properties have enabled us to accomplish our initial goals, namely the extension and improvement of a decomposition-like proximal point method that was proposed to compute minima of Riemannian convex functions on the manifold of symmetric positive definite matrices. Moreover, these studies have enabled us to take a critical view of the theory of domains of positivity, besides increasing our understanding of several applications. Indeed, despite the

Acknowledgments

This work was derived from the research project Proximal point algorithm with Schur factorizations in domains of positivity. Part of this work appeared as a technical report at http://www.optimization-online.org/DB_HTML/2013/05/3885.html, under the title A nonsmooth optimization technique in domains of positivity, and it encompasses results from another technical report entitled A proximal technique for computing the Karcher mean of symmetric positive definite matrices, available at http://www.optimization-online.org/DB_HTML/2013/05/3885.html.

References (28)

  • S. Fiori, Quasi-geodesic neural learning algorithms over the orthogonal group: a tutorial, J. Mach. Learn. Res. (2005)

  • S. Fiori, Learning the Fréchet mean over the manifold of symmetric positive-definite matrices, J. Cogn. Comput. (2009)

  • G.H. Golub et al., Matrix Computations (1996)

  • R.A. Horn et al., Matrix Analysis (1985)