A two-phase-like proximal point algorithm in domains of positivity
Introduction
In several situations, researchers need to estimate solutions of optimization problems on manifolds of non-Euclidean type. This practice drives other relevant developments, such as the creation of new optimization techniques and the extension of classical methods. This work introduces an optimization method that both theoretically improves, and extends to domains of positivity, the proximal point algorithm presented by Gregório and Oliveira [12].
The methodology in [12] is based on the Schur theorem for symmetric matrices and the spectral theorem for positive definite matrices. See [13] for further details on the Schur and spectral theorems for Hermitian (symmetric) positive definite matrices.
Moreover, the method belongs to the class of decomposition algorithms. Points of the proximal trajectory generated by the method are computed in two iterative steps, namely, iteratively solving an auxiliary Riemannian subproblem defined on the set of diagonal positive definite matrices to update the spectral step, after applying the Schur decomposition to update the orthogonal step.
In the current work, the orthogonal step is also assumed to be any (local) solution of an auxiliary Riemannian subproblem, now defined on the orthogonal group and taking into account its Riemannian structure as discussed in [8]. The stopping criteria for the inner iterations, which were not stated in [12], are made explicit for this way of defining the orthogonal step; this also enables the extension of the technique to homogeneous domains of positivity of nonlinearly reducible type, in a sense introduced at the end of Section 2.
In summary, the aims of the current work are: to improve theoretically the proximal point method in [12] through the analysis and synthesis of stopping criteria for the inner iterations, owing to the orthogonal step described above and discussed in Section 3; to extend the technique to compute minimizers of convex functions defined on homogeneous domains of positivity of nonlinearly reducible type; and to exemplify the practical feasibility of its application to real problems, casting a glance over (weighted) median and mean computation of symmetric positive definite matrices.
Notice that the computation of (weighted) Lp centers of mass, especially in the (weighted) median and mean cases (p = 1 and p = 2, respectively), in manifolds of symmetric positive definite matrices has been the aim of several researchers, including [9], [10], [18], and [28]. In almost all work on this subject, researchers have applied classical smooth optimization techniques to compute the (weighted) median and mean, generally faster than our method, as the CPU times reported in Section 5.2.5 show.
However, there has been little research on techniques suited to the nonsmooth case in Riemannian manifolds. See [6], [7], [12], and [20] for further details on nonsmooth techniques in the particular case of Hadamard manifolds. Despite the absence of such computational experiments, the Riemannian median (optimization) problem is naturally nonsmooth at the points of the data distribution for which one wishes to estimate it. Specifically, in this case, the (weighted) median cannot be computed in a smooth way, especially when the solution is one of these points. Therefore, the advantage of our technique, as well as of the method proposed in [12], resides in avoiding the assumption of smoothness. It should be noted that the term nonsmooth encompasses the smooth case, so our method can also be applied to smooth problems without restrictions, as in the Riemannian mean problem (for instance), although it is naturally a slow method, because an optimization problem with the same complexity as the original is solved at each iteration.
In the remainder of Section 1, we present a narrative on different algorithms for solving optimization problems and applications related to this field. Section 2 introduces a set of preliminary concepts, highlighting the two main examples of domains of positivity. Section 3 establishes the problem and the algorithm in its exact version. The global convergence criteria are fulfilled in this case, taking into account the general result of convergence in Ferreira and Oliveira [7]; this is also discussed at the end of the section. Section 4 presents the inexact version. Although the convergence results in [12] remain valid in this case, we return to them here. Another result preserved from [12] is the weak convergence of the method when solutions belong to the boundary of reducible domains of positivity. Section 5 presents theoretical aspects that enable the application of the method to computation of the weighted Lp centers of mass in the manifolds of symmetric positive definite matrices, and Section 5.2.5 shows the results of computational experiments with our method. Section 6 ends this work with conclusions and suggestions for future investigations.
Synthesizing extensions of optimization methods to Riemannian manifolds can be as complicated as establishing new algorithms. Generally, an extension is not easily obtained, because its convergence analysis does not follow from straightforward arguments. The same happens with classical optimization methods in Euclidean spaces, but there only concepts of Euclidean geometry are needed. In the Riemannian case, aspects of the topology of the differentiable manifolds under study merge with concepts of Riemannian geometry into a rich field of knowledge, as seen in Ferreira and Oliveira [6] and Smith [24], for instance.
Although optimization on manifolds was introduced relatively recently in the history of optimization, some of its techniques have already been extended to more general settings. Indeed, the replacement of the Riemannian distance by the Bregman distance in the definition of the proximal point iteration in [7] enabled Quiroz and Oliveira [20] to develop a proximal point method for generalized convex optimization problems on noncompact Hadamard manifolds.
On the other hand, such developments also enable specific optimization methods on particular manifolds. In fact, the employment of the same algorithm in [7] to estimate minima of convex functions on the manifold of n-by-n symmetric positive definite matrices enabled Gregório and Oliveira [12] to synthesize a proximal point method based on the spectral theorem and the Schur decomposition for symmetric positive definite matrices.
A key concept underlying the development of optimization algorithms is convexity, in its Euclidean or Riemannian form. In this paper, we restrict ourselves to Riemannian convexity only. Let ℝⁿ be the real Euclidean space of dimension n. A subset C of a complete Riemannian manifold is called convex if it contains every minimal geodesic segment connecting any pair of points of C. Besides, a function f is said to be convex on C if

f(γxy(t)) ≤ (1 − t)f(x) + tf(y)

for every x, y ∈ C and t ∈ [0, 1], where γxy is any minimal geodesic segment in C satisfying γxy(0) = x and γxy(1) = y. Moreover, f is said to be strictly convex if the inequality above is strict for every t ∈ (0, 1). See Udriste [26] for further details on convexity in Riemannian manifolds.
Subdifferentials of functions also play an important role in nonsmooth and convex optimization in Riemannian manifolds. According to [26], the subdifferential of a function f at x ∈ C, denoted by ∂f(x), is the set defined by

∂f(x) = {s ∈ TxM : f(y) ≥ f(x) + ⟨s, γ̇xy(0)⟩x, for all y ∈ C},

where γ̇xy(0) is the tangent vector at x to any minimal geodesic segment that connects x to y in C. Namely, ⟨·, ·⟩x represents the Riemannian metric on the tangent space of the manifold M at x, denoted by TxM.
The subdifferential of a convex function is nonempty at any point of its domain, as also seen in [26]. Moreover, any minimizer x* of a convex function f complies with

0 ∈ ∂f(x*).

Any element of ∂f(x) is called a subgradient of f at x. If f is differentiable, then ∂f(x) = {gradxf}, where gradxf represents the Riemannian gradient of f at x. Namely, gradxf is the vector field metrically equal to ∇f(x), i.e., ⟨gradxf, v⟩x = ⟨∇f(x), v⟩ for every v ∈ TxM. In addition, if G is the symmetric positive definite bilinear form for which ⟨u, v⟩x = ⟨G(x)u, v⟩ for every u, v ∈ TxM, then gradxf = G(x)⁻¹∇f(x).
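The relation gradxf = G(x)⁻¹∇f(x) above can be sketched numerically. The following is a minimal illustration, assuming a constant diagonal metric G and the quadratic f(x) = ½‖x‖² (both illustrative choices, not from the paper):

```python
import numpy as np

def riemannian_grad(G, euclidean_grad):
    """Riemannian gradient under the metric <u, v>_x = <G u, v>:
    solve G * grad = euclidean_grad, i.e., grad = G^{-1} * euclidean_grad."""
    return np.linalg.solve(G, euclidean_grad)

# Example: f(x) = 0.5 * ||x||^2, so the Euclidean gradient is ∇f(x) = x.
x = np.array([1.0, 2.0])
G = np.diag([2.0, 4.0])           # an illustrative constant SPD metric
g = riemannian_grad(G, x)

# Metric compatibility: <grad f, v>_x equals the directional derivative ∇f(x)·v.
v = np.array([3.0, -1.0])
assert np.isclose(g @ G @ v, x @ v)
```

The check in the last line is exactly the defining identity ⟨gradxf, v⟩x = ⟨∇f(x), v⟩ from the text, evaluated at one tangent vector.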
The technique discussed in this paper belongs to the class of proximal point methods. That class was introduced by Martinet [17] and extended to monotone operators by Rockafellar [21]. The technique presented in [7] is an extension of that class to Hadamard manifolds, and the algorithm presented in [12] is a particular case of the method in [7] in which the underlying manifold is that of symmetric positive definite matrices. Based on the Schur factorization for symmetric positive definite matrices, it computes points of the proximal trajectory generated by the method in [7] by making descent steps in the spectrum of the iterated matrix for the function to be minimized, without losing positive definiteness, and afterwards by applying the Schur factorization to update the matrix with respect to the orthogonal step.
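To fix ideas, here is a minimal Euclidean sketch of the proximal point class of Martinet [17] (not the authors' Riemannian method): each iterate solves x⁺ = argmin f(x) + (β/2)‖x − xᵏ‖². For the nonsmooth f(x) = |x| this step has the closed-form soft-threshold, illustrating that no smoothness of f is required:

```python
import numpy as np

def prox_abs(x, beta):
    """One proximal point step for f(x) = |x|:
    argmin_y |y| + (beta/2)(y - x)^2 is the soft-threshold of x by 1/beta."""
    t = 1.0 / beta
    return np.sign(x) * max(abs(x) - t, 0.0)

# The proximal trajectory decreases f monotonically toward the minimizer 0.
x = 5.0
for _ in range(10):
    x = prox_abs(x, beta=1.0)

assert x == 0.0
```

Each step here solves a strongly convex subproblem of the same nature as the original problem, which is also why, as noted above, proximal methods are naturally slow.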
However, the orthogonal step only updates the iterated matrix; it is not a descent step. In other words, the value of the function to be minimized remains unchanged under this step. Moreover, no information about that function is used with respect to the natural geometry of the orthogonal group, which was extensively explored by Fiori [8], for instance. Besides, the Schur factorization has a high computational cost when n is large, making the method slow.
The growth in the number of methods for optimization in Riemannian manifolds is justified by the increasing use of mathematical models for real problems based on natural geometries. In particular, studies show that domains of positivity are manifolds of Hadamard type for a special choice of Riemannian metric, as presented in [22]. For instance, the Riemannian structure of the manifold of symmetric positive definite matrices can be derived from it, although other approaches lead to the same structure, as seen in [3] and [19].
Signal processing and computer vision are examples of applications of aggregate models based on the concepts of symmetry and positive definiteness. Diffusion tensor magnetic resonance imaging (DT-MRI) has been acknowledged as one of the most interesting applications of these concepts. Three-dimensional images in DT-MRI are numerically represented by a matrix with the property that each 3-by-3 submatrix, named a voxel, is symmetric and positive definite. The geometric representation of a voxel is an ellipsoid whose semi-axis directions are given by the eigenvectors of the corresponding 3-by-3 submatrix and whose lengths are given by the corresponding eigenvalues, as seen in [27].
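The ellipsoid representation above amounts to an eigendecomposition of each voxel. A minimal sketch, using a synthetic voxel rather than real imaging data:

```python
import numpy as np

# A synthetic 3-by-3 symmetric positive definite voxel (illustrative only).
voxel = np.array([[3.0, 1.0, 0.0],
                  [1.0, 2.0, 0.0],
                  [0.0, 0.0, 1.0]])

# eigh returns eigenvalues in ascending order for symmetric input.
eigvals, eigvecs = np.linalg.eigh(voxel)
assert np.all(eigvals > 0)          # positive definiteness check

# The ith ellipsoid semi-axis points along eigvecs[:, i] with length eigvals[i].
semi_axes = [(eigvals[i], eigvecs[:, i]) for i in range(3)]
```

A deficient voxel, in the sense of the next paragraph, would fail the positivity check (an indefinite matrix has a nonpositive eigenvalue) or fail symmetry.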
According to [28], image filtering is often required in DT-MRI, because noise is known to occur in the synthesis of imaging data by magnetic resonance machines. Noise is generated by external interference signals that produce deficient voxels, namely asymmetric or indefinite 3-by-3 matrices. Filtering improves the visualization by minimizing the variance of the imaging data over the noisy region. It is done by applying the weighted mean filter, when noisy voxels belong to the interior of the image, or the weighted median filter, when noisy voxels belong to the edge of the image [28].
Filtering can be classified as either directional or tensorial. The former essentially uses imaging data based on numerical quantities related to the image, while the latter uses the Riemannian structure of the space of symmetric positive definite matrices. According to recent studies, Riemannian filtering has proven more suitable than Euclidean filtering, because it smooths the handling of the data and naturally preserves symmetry and positive definiteness [5], [9], [10], [28].
A problem that generalizes the computation of the Riemannian weighted mean and median is the computation of the Riemannian weighted Lp center of mass, for 1 ≤ p < ∞, because it is defined as a convex optimization problem. According to [1], the problem of computing the Riemannian weighted Lp center of mass is well-posed in its discrete form. The Riemannian weighted median and the Riemannian weighted mean occur as particular cases of the Riemannian weighted Lp center of mass when p = 1 and p = 2, respectively.
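The objective behind the weighted Lp center of mass can be sketched concretely. The following assumes the standard affine-invariant distance on symmetric positive definite matrices (the function names are illustrative, not from the paper):

```python
import numpy as np

def spd_dist(A, B):
    """Affine-invariant distance d(A, B) = ||log(A^{-1/2} B A^{-1/2})||_F,
    computed through the (real, positive) eigenvalues of A^{-1} B."""
    lam = np.linalg.eigvals(np.linalg.solve(A, B)).real
    return np.sqrt(np.sum(np.log(lam) ** 2))

def lp_cost(X, data, weights, p):
    """Weighted Lp cost F(X) = sum_i w_i d(X, A_i)^p; its minimizer is the
    Riemannian weighted Lp center of mass (p = 1: median, p = 2: mean)."""
    return sum(w * spd_dist(X, A) ** p for w, A in zip(weights, data))

# d(I, e^2 I) has log-eigenvalues (2, 2), hence distance sqrt(8).
assert np.isclose(spd_dist(np.eye(2), np.exp(2) * np.eye(2)), np.sqrt(8.0))
```

For p = 1 the cost is nonsmooth at the data points themselves, which is precisely the case motivating the nonsmooth treatment discussed earlier.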
There are several other applications involving symmetry and positive definiteness, as seen in [9]. Among these, we note human detection via classification, as discussed in [25], which also involves covariance descriptors given by symmetric positive definite matrices.
Another matrix manifold with properties similar to the symmetric case is the set of n-by-n Hermitian positive definite matrices. The Lp centers of mass are also easily introduced there, taking into account the same general Riemannian metric for homogeneous domains of positivity as in [22].
Section snippets
Basic concepts and examples
Here, several concepts on domains of positivity are presented, all of which are based on [22]. Let D be a nonempty open set in ℝⁿ and σ be a nonsingular symmetric bilinear form.

Definition 1

D is called a domain of positivity, with respect to σ, if the following axioms hold: σ(a1, a2) > 0, for every a1, a2 ∈ D; and if a ∈ ℝⁿ is such that σ(a, b) ≥ 0 for every b ∈ D̄, where D̄ is the closure of D, then a ∈ D.
Namely, σ is called the characteristic of D.
Example 1
Let tr denote the trace function
General aspects and algorithm
Let D be a nonlinearly reducible domain of positivity and f a proper, convex function on D. We start this section by assuming, as a hypothesis, that the continuity of f on D extends to the boundary of D (∂D), because D is a totally convex open set and f is convex on D. See Theorem 3.6 in [26] for further details. In fact, we admit that
f(a) = lim_{k→∞} f(ak)

for every a ∈ ∂D and every sequence {ak} ⊂ D such that ak → a.
Our main goal is to establish an interior proximal point
General aspects and algorithm
In this section, we present the inexact version of the TPP algorithm. Analogous to assumption (16) in [12], we admit that the proximal iterate is not determined exactly. In practice, this means a slight relaxation of (7): we assume that the inclusion in (7) holds with the ϵ-subdifferential in place of the subdifferential, for any ϵk ≥ 0, where ∂ϵf denotes the ϵ-subdifferential of f, for any ϵ ≥ 0, i.e.,

∂ϵf(x) = {s ∈ TxM : f(y) ≥ f(x) + ⟨s, γ̇xy(0)⟩x − ϵ, for all y}.
Relation (15) enables us to use finite stopping criteria for each iteration of the proximal
General aspects
In this section, all the statements refer to the cone of symmetric positive definite matrices as a nonlinearly reducible domain of positivity. We denote by Sⁿ the vector space of n-by-n symmetric matrices. Recall that the Schur decomposition establishes, for all W ∈ Sⁿ, the existence of an orthogonal matrix Q and a real diagonal matrix Λ = diag(λ11, …, λnn), with λ11 ≥ ⋅⋅⋅ ≥ λnn, such that W = QΛQᵀ, where λιι is the ιth eigenvalue of W, denoted by λι(W). Note that if two matrices in Sⁿ admit Schur decompositions with the same orthogonal factor, then they commute, with respect to the product of
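The decomposition W = QΛQᵀ with a descending spectrum can be sketched numerically (a minimal illustration; the sorting convention matches the λ11 ≥ ⋅⋅⋅ ≥ λnn requirement above):

```python
import numpy as np

def schur_spd(W):
    """Return (Q, Lam) with W = Q @ Lam @ Q.T, Q orthogonal and the
    diagonal of Lam sorted in descending order."""
    vals, vecs = np.linalg.eigh(W)        # ascending for symmetric W
    order = np.argsort(vals)[::-1]        # enforce λ11 >= ... >= λnn
    return vecs[:, order], np.diag(vals[order])

W = np.array([[2.0, 1.0],
              [1.0, 2.0]])
Q, Lam = schur_spd(W)
assert np.allclose(Q @ Lam @ Q.T, W)      # reconstruction
assert np.allclose(Q.T @ Q, np.eye(2))    # orthogonality of Q
assert Lam[0, 0] >= Lam[1, 1]             # descending spectrum
```

The factors Q and Λ are exactly the quantities updated by the orthogonal and spectral steps, respectively, in the two-phase scheme described in the Introduction.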
Conclusions
Studies on domains of positivity and their properties have enabled us to accomplish our initial goals, namely the extension and improvement of a factorized, proximal-point-like method proposed to compute minima of Riemannian convex functions on the manifold of symmetric positive definite matrices. Moreover, these studies allowed us to take a critical view of the theory of domains of positivity, besides increasing our understanding of several applications. Indeed, despite the
Acknowledgments
This work was derived from the research project Proximal point algorithm with Schur factorizations in domains of positivity. Part of this work appeared as a technical report at http://www.optimization-online.org/DB_HTML/2013/05/3885.html, under the title A nonsmooth optimization technique in domains of positivity, and it encompasses results from another technical report entitled A proximal technique for computing the Karcher mean of symmetric positive definite matrices, available at http://www.optimization-online.org/DB_HTML/2013/05/3885.html.
References (28)
- et al., Conjugate gradient algorithm for optimization under unitary matrix constraint, Signal Process. (2009)
- et al., Riemannian geometry for the statistical analysis on diffusion tensor data, Signal Process. (2007)
- et al., Proximal point algorithm with Schur decomposition on the cone of symmetric semidefinite positive matrices, J. Math. Anal. Appl. (2009)
- et al., New Riemannian techniques for directional and tensorial image data, Pattern Recognit. (2010)
- Riemannian Lp center of mass: existence, uniqueness and convexity, Proc. Am. Math. Soc. (2010)
- Positive Definite Matrices (2007)
- et al., Manifolds of negative curvature, Trans. Amer. Math. Soc. (1969)
- et al., A Riemannian approach to anisotropic filtering of tensor fields, Signal Process. (2007)
- et al., Subgradient algorithm on Riemannian manifolds, J. Optim. Theory Appl. (1998)
- et al., Proximal point algorithm in Riemannian manifolds, Optimization (2002)
- Quasi-geodesic neural learning algorithms over the orthogonal group: a tutorial, J. Mach. Learn. Res.
- Learning the Fréchet mean over the manifold of symmetric positive-definite matrices, Cogn. Comput.
- Matrix Computations
- Matrix Analysis