First-Order Methods for Nonnegative Trigonometric Matrix Polynomials

Published in: Journal of Optimization Theory and Applications.

Abstract

Optimization problems over the cone of nonnegative trigonometric matrix polynomials are common in signal processing. Traditionally, such problems are formulated and solved as semidefinite programs (SDPs). However, the SDP formulation increases the dimension of the problem, resulting in large problems that are challenging to solve. In this paper we propose first-order methods that circumvent the SDP formulation and instead optimize directly over the space of trigonometric matrix polynomials. Our methods are based on a particular Bregman proximal operator. We apply our approach to two fundamental signal processing applications: rectifying a power spectrum that fails to be nonnegative, and graphical modeling of Gaussian time series. Numerical experiments demonstrate that our methods are orders of magnitude faster than an interior-point solver applied to the corresponding SDPs.

Figures 1–6 (not reproduced here).

Data availability

The data used to reproduce our results is available at https://github.com/dance858/BregmanProx.

References

  1. Alkire, B., Vandenberghe, L.: Convex optimization problems involving finite autocorrelation sequences. Math. Program. 93(3), 331–359 (2002)

  2. Auslender, A., Teboulle, M.: Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. 16(3), 697–725 (2006)

  3. Basseville, M.: Distance measures for signal processing and pattern recognition. Signal Process. 18(4), 349–369 (1989)

  4. Beck, A.: First-Order Methods in Optimization. Soc. Ind. Appl. Math, Philadelphia, PA (2017)

  5. Benner, P., Mehrmann, V., Sima, V., Van Huffel, S., Varga, A.: SLICOT—A Subroutine Library in Systems and Control Theory. In: Biswa Nath, D. (ed.) Appl. and Comput. Control, Signals, and Circuits: Volume 1, pages 499–539. Birkhäuser Boston (1999)

  6. Bertsekas, D.: Nonlinear Programming. Athena Scientific (2016)

  7. Blomqvist, A., Lindquist, A., Nagamune, R.: Matrix-valued Nevanlinna-Pick interpolation with complexity constraint: an optimization approach. IEEE Trans. Autom. Control 48(12), 2172–2190 (2003)

  8. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  9. Boyd, S., Vandenberghe, L.: Convex optimization. Cambridge University Press (2004)

  10. Brillinger, D.: Remarks concerning graphical models for time series and point processes. Revista de Econometria 16, 1–23 (1996)

  11. Brillinger, D.: Time Series. Soc. Ind. Appl. Math. (2001)

  12. Burg, J. P.: Maximum entropy spectral analysis. PhD thesis, Stanford University (1975)

  13. Chambolle, A., Contreras, J.: Accelerated Bregman primal-dual methods applied to optimal transport and Wasserstein barycenter problems. SIAM J. Math. Data Sci. 4(4), 1369–1395 (2022)

  14. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imag. Vis. 40(1), 120–145 (2011)

  15. Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal-dual algorithm. Math. Program Ser. A 159(1–2), 253–287 (2016)

  16. Chao, H., Vandenberghe, L.: Entropic proximal operators for nonnegative trigonometric polynomials. IEEE Trans. Signal Process. 66(18), 4826–4838 (2018)

  17. Delsarte, P., Genin, Y., Kamp, Y.: Orthogonal polynomial matrices on the unit circle. IEEE Trans. Circuits Syst. 25, 149–160 (1978)

  18. Dumitrescu, B., Tabus, I., Stoica, P.: On the parameterization of positive real sequences and MA parameter estimation. IEEE Trans. Signal Process. 49(11), 2630–2639 (2001)

  19. Ferrante, A., Masiero, C., Pavon, M.: Time and spectral domain relative entropy: A new approach to multivariate spectral estimation. IEEE Trans. Autom. Control 57(10), 2561–2575 (2012)

  20. Gallivan, K., Thirumalai, S., Van Dooren, P., Vermaut, V.: High performance algorithms for Toeplitz and block Toeplitz matrices. Linear Algebra Appl. 241–243, 343–388 (1996)

  21. Genin, Y., Hachez, Y., Nesterov, Y., Van Dooren, P.: Optimization problems over positive pseudopolynomial matrices. SIAM J. Matrix Anal. Appl. 25(1), 57–79 (2003)

  22. Grant, M., Boyd, S.: CVX: Matlab Software for Disciplined Convex Programming, Version 2.1 (2014). http://cvxr.com/cvx

  23. Horn, R., Johnson, C.: Matrix Analysis. Cambridge University Press (2013)

  24. Jiang, X., Vandenberghe, L.: Bregman primal-dual first-order method and application to sparse semidefinite programming. Comput. Optim. Appl. 81(1), 127–159 (2022)

  25. Jiang, X., Vandenberghe, L.: Bregman three-operator splitting methods. J. Optim. Theory Appl. 196, 936–972 (2023)

  26. Kailath, T., Sayed, A., Hassibi, B.: Linear Estimation. Prentice Hall (2000)

  27. Kressner, D., Van Dooren, P.: Factorizations and linear system solvers for matrices with Toeplitz structure (2001)

  28. Kuceyeski, A., Jamison, K., Owen, J., Raj, A., Mukherjee, P.: Functional rerouting via the structural connectome is associated with better recovery after mild TBI. bioRxiv (2018)

  29. Larsson, C., Annergren, M., Hjalmarsson, H.: On optimal input design in system identification for model predictive control. In: 50th IEEE Conference on Decision and Control, pp. 805–810 (2011)

  30. Larsson, C., Rojas, C., Hjalmarsson, H.: MPC oriented experiment design. IFAC Proc. Volumes 44(1), 9966–9971 (2011)

  31. Lemon, A., So, A., Ye, Y.: Low-rank semidefinite programming: theory and applications. Found. Trends Optim. 2(1–2), 1–156 (2016)

  32. Lindquist, A., Masiero, C., Picci, G.: On the multivariate circulant rational covariance extension problem. In: 52nd IEEE Conference on Decision and Control (2013)

  33. Lindquist, A., Picci, G.: Linear Stochastic Systems: A Geometric Approach to Modeling, Estimation and Identification. Springer (2015)

  34. McLean, J., Woerdeman, H.: Spectral factorizations and sums of squares representations via semidefinite programming. SIAM J. Matrix Anal. Appl. 23(3), 646–655 (2002)

  35. Nesterov, Y.: Squared functional systems and optimization problems. In: Hans, F., Kees, R., Tamás, T., Shuzhong, Z. (eds.) High Performance Optimization, pp. 405–440. Springer US, Boston, MA (2000)

  36. O’Donoghue, B., Chu, E., Parikh, N., Boyd, S.: Conic optimization via operator splitting and homogeneous self-dual embedding. J. Optim. Theory Appl. 169(3), 1042–1068 (2016)

  37. Rissanen, J.: Algorithms for triangular decomposition of block Hankel and Toeplitz matrices with application to factoring positive matrix polynomials. Math. Comput. 27(121), 147–154 (1973)

  38. Songsiri, J., Dahl, J., Vandenberghe, L.: Graphical models of autoregressive processes. In: Palomar, D.P., Eldar, Y.C. (eds.) Convex Optimization in Signal Processing and Communications, pp. 89–116. Cambridge University Press (2009)

  39. Songsiri, J., Vandenberghe, L.: Topology selection in graphical models of autoregressive processes. J. Mach. Learn. Res. 11(91), 2671–2705 (2010)

  40. Steffens, C., Pesavento, M.: Block- and rank-sparse recovery for direction finding in partly calibrated arrays. IEEE Trans. Signal Process. 66(2), 384–399 (2018)

  41. Stoica, A., Moses, R., Stoica, P.: Enforcing positiveness on estimated spectral densities. Electron. Lett. 29, 2009–2011 (1993)

  42. Stoica, P., McKelvey, T., Mari, J.: MA estimation in polynomial time. IEEE Trans. Signal Process. 48(7), 1999–2012 (2000)

  43. Sun, Y., Li, Y., Kuceyeski, A., Basu, S.: Large spectral density matrix estimation by thresholding. arXiv preprint (2018). arXiv:1812.00532

  44. Tseng, P.: On Accelerated Proximal Gradient Methods for Convex-Concave Optimization (2008)

  45. Tütüncü, R., Toh, K., Todd, M.: Solving semidefinite-quadratic-linear programs using SDPT3. Math. Program. 95, 189–217 (2003)

  46. Vandenberghe, L., Balakrishnan, V.R., Wallin, R., Hansson, A., Roh, T.: Interior-point algorithms for semidefinite programming problems derived from the KYP lemma. In: Henrion, D., Garulli, A. (eds.) Positive Polynomials in Control, pp. 195–238. Springer, Berlin Heidelberg, Berlin, Heidelberg (2005)

  47. von Sachs, R.: Nonparametric spectral analysis of multivariate time series. Annu. Rev. Statist. and Appl. 7, 361–386 (2020)

  48. He, Y., Xia, M., Wang, J.: BrainNet Viewer: a network visualization tool for human brain connectomics. PLoS ONE 8, e68910 (2013)

  49. Zorzi, M., Sepulchre, R.: AR identification of latent-variable graphical models. IEEE Trans. Autom. Control 61(9), 2327–2340 (2016)

Acknowledgements

We thank Lieven Vandenberghe for pointing us to the application in graphical models, and Alan Yang and Tomas Lundberg for feedback on a draft of the paper. We also thank Amy Kuceyeski for sharing the MRI data, and Yige Li for exceptional help when setting up the MRI data. Finally, we thank two reviewers for excellent suggestions that improved the manuscript.

Author information

Correspondence to Daniel Cederberg.


Communicated by Alper Yildirim.

Appendix

In this appendix we present proofs that have been omitted from the main body of the paper.

1.1 Proof of Theorem 1

To prove Theorem 1 we need the following result [17, Theorem 6].

Lemma A.1

Let \(\mathcal {W} \in \mathbf{int \ }K^{*}\). Consider the matrix polynomial \(A:\mathbf{C}\rightarrow \mathbf{C}^{m \times m}\) given by \(A(z) = \Pi _p(z)^T \mathcal {T}(\mathcal {W})^{-1} E\), where \(E = \Pi _p(0)\). Then \(\det A(z) \ne 0\) for \(|z| \le 1.\)

Proof (Theorem 1) Problem (7) is equivalent to

$$\begin{aligned} \begin{array}{ll} \text{ maximize } & \text{ log } \text{ det } V \\ \text{ subject } \text{ to } & E^T Y E = V \\ & D(Y) = \mathcal {X}^T, \hspace{0.5cm} Y \succeq 0, \end{array} \end{aligned}$$
(28)

with variables \(Y \in {\text{ S }}^{(p+1)m}\) and \(V \in {\text{ S }}^m.\) Introduce Lagrange multipliers \(U \in {\text{ S }}^m, \, \mathcal {W} \in \textbf{V}\) and \(Z \in {\text{ S }}^{(p+1)m}.\) A dual is

$$\begin{aligned} \begin{array}{ll} \text{ minimize } & \langle \mathcal {X}^T, \mathcal {W} \rangle - \text{ log } \text{ det } U - m \\ \text{ subject } \text{ to } & Z + EUE^T - \mathcal {T}(\mathcal {W}) = 0 \\ & Z \succeq 0, \end{array} \end{aligned}$$
(29)

with variables \(U \in {\text{ S }}^m, \, \mathcal {W} \in \textbf{V}\) and \(Z \in {\text{ S }}^{(p+1)m}.\) Eliminating Z shows that a suitable dual is

$$\begin{aligned} \begin{array}{ll} \text{ minimize } & \langle \mathcal {X}^T, \mathcal {W} \rangle - \text{ log } \text{ det } U - m \\ \text{ subject } \text{ to } & \mathcal {T}(\mathcal {W}) \succeq E U E^T. \end{array} \end{aligned}$$
(30)

Since \(\mathcal {X} \in \mathbf{int \ }\mathcal {K}\) there exists \(Y \succ 0\) such that \(D(Y) = \mathcal {X}^T.\) Hence, the primal (28) is strictly feasible. From Slater’s constraint qualification for generalized inequalities [9, page 265] it follows that strong duality holds and the optimum of problem (30) is attained.

Problem (29) is strictly feasible (choose, e.g., \(U = I\), \(\mathcal {W} = (2I, 0, \dots , 0)\) and \(Z = \textbf{blkdiag}(I, 2I, \dots , 2I)\)). Therefore, by Slater’s constraint qualification the dual of (29) admits a solution. The dual of (29) is equivalent to (28) (this can be verified by explicitly deriving a dual of (29); it also follows from the observation that the objective function in (28) is proper, closed and convex, so the dual of the dual is the original problem). Hence, we conclude that problem (28) admits a solution.

Since strong duality holds, if \(\hat{Y}\) is a solution of (28) and \((\hat{U}, \hat{\mathcal {W}})\) is a solution of (30), then there is no duality gap and complementary slackness is satisfied:

$$\begin{aligned} \hat{Y}( \mathcal {T}(\hat{\mathcal {W}}) - E \hat{U} E^T) = 0. \end{aligned}$$
(31)

Since \(\hat{U} \succ 0\) it follows from the block Toeplitz structure that \(\mathcal {T}(\hat{\mathcal {W}}) \succ 0\) (see, for example, [38, eq. (1.34)]). Hence, (31) is equivalent to

$$\begin{aligned} \hat{Y} = \hat{Y}E \hat{U} E^T \mathcal {T}(\hat{\mathcal {W}})^{-1}. \end{aligned}$$
(32)

Complementary slackness can also be formulated as \((\mathcal {T}(\hat{\mathcal {W}}) - E \hat{U}E^T) \hat{Y} = 0,\) which is equivalent to

$$\begin{aligned} \hat{Y} = \mathcal {T}(\hat{\mathcal {W}})^{-1} E \hat{U}E^T \hat{Y}. \end{aligned}$$
(33)

Putting together (32) and (33) shows that any primal solution \(\hat{Y}\) must satisfy

$$\begin{aligned} \hat{Y}&= \mathcal {T}(\hat{\mathcal {W}})^{-1} E \hat{U} E^T \hat{Y}E \hat{U} E^T \mathcal {T}(\hat{\mathcal {W}})^{-1} \\ &= \mathcal {T}(\hat{\mathcal {W}})^{-1} \begin{bmatrix} \hat{U} \hat{Y}_{00} \hat{U} & 0 & \cdots & 0\\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} \mathcal {T}(\hat{\mathcal {W}})^{-1} \end{aligned}$$

where \((\hat{U}, \hat{\mathcal {W}})\) is any dual solution. Hence, any primal solution \(\hat{Y}\) satisfies \(\textbf{rank}\hat{Y} = \textbf{rank}(\hat{U} \hat{Y}_{00} \hat{U}) = \textbf{rank}\hat{Y}_{00}\). We now argue that \(\textbf{rank}\hat{Y}_{00} = m\). Assume \(\textbf{rank}\hat{Y}_{00} < m\). Let \(\varvec{B} = \mathcal {T}(\hat{\mathcal {W}})^{-1} E \hat{U} (\hat{Y}_{00})^{1/2}.\) Then \(\hat{Y} = \varvec{B} \varvec{B}^T\) so \(B(z) = \varvec{B}^T \Pi _p(z)\) is a spectral factor of \(F_\mathcal {X}(z).\) From the assumption \(\textbf{rank}\hat{Y}_{00} < m\) it follows that \(\det B(z) = 0\). This implies \(\det F_\mathcal {X}(z) = |\det B(z)|^2 = 0\) which contradicts the assumption that \(\mathcal {X} \in \mathbf{int \ }\mathcal {K}\).

We have now shown that any solution \(\hat{Y}\) has rank m. Also recall that we have shown that at least one solution exists. By applying results from the literature on low-rank semidefinite programming (see, for example, [31, Corollary 2.5]) we conclude that the solution is unique. (Corollary 2.5 in [31] applies to SDPs with a linear objective function. However, we can still apply the result by noting that the objective function in (7) can be replaced by the linear objective function \(\textbf{Tr}(Y_{00})\) without changing the set of optimal solutions.)

Denote the unique solution by \(Y^\star \). Note that \(Y^\star \) satisfies

$$\begin{aligned} Y^{\star } = \mathcal {T}(\hat{\mathcal {W}})^{-1} E \hat{U} Y^\star _{00} \hat{U} E^T \mathcal {T}(\hat{\mathcal {W}})^{-1}. \end{aligned}$$
(34)

Let \(\varvec{B} \in {\text{ R }}^{(p+1)m \times m}\) be any matrix satisfying \(\det (B_0) \ge 0\) and \(Y^\star = \varvec{B} \varvec{B}^T\). Then \(B(z) = \varvec{B}^T \Pi _p(z)\) is a spectral factor of \(F_\mathcal {X}\), since \(D(\varvec{B} \varvec{B}^T) = \mathcal {X}^T\). It must hold that \(\det (B_0) > 0\), since \(Y_{00}^\star = B_0^T B_0\) and \(Y_{00}^\star \succ 0\). Multiplying both sides of (34) with \(\Pi _p(z)^H\) from the left and \(\Pi _p(z)\) from the right shows that

$$\begin{aligned} \begin{aligned}&\Pi _p(z)^H \varvec{B} \varvec{B}^T \Pi _p(z) = \Pi _p(z)^H \mathcal {T}(\hat{\mathcal {W}})^{-1} E \hat{U} B_0^T B_0 \hat{U} E^T \mathcal {T}(\hat{\mathcal {W}})^{-1} \Pi _p(z), \end{aligned} \end{aligned}$$

implying the identity \(| \det B(z) | = | \det (B_0 \hat{U} E^T \mathcal {T}(\hat{\mathcal {W}})^{-1} \Pi _p(z)) |.\) From Lemma A.1 it follows that \(\det (B_0 \hat{U} E^T \mathcal {T}(\hat{\mathcal {W}})^{-1} \Pi _p(z)) \ne 0\) for \(|z| \le 1\), implying that B(z) is a stable spectral factor. \(\square \)
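
For scalar polynomials (m = 1), a stable spectral factor of the kind the theorem guarantees can also be computed by the classical root-splitting (Fejér–Riesz) construction. The sketch below is purely illustrative and is not the method used in the paper; `scalar_spectral_factor` is our own helper name.

```python
import numpy as np

def scalar_spectral_factor(x):
    """Fejer-Riesz factorization for m = 1: given x = (x_0, ..., x_p) with
    F(w) = x_0 + 2*sum_k x_k*cos(k*w) > 0 on [0, 2*pi], return b such that
    F(w) = |b_0 + b_1 e^{jw} + ... + b_p e^{jpw}|^2 and the polynomial
    B(z) = sum_k b_k z^k has all roots outside the closed unit disk."""
    x = np.asarray(x, dtype=float)
    # Ascending coefficients of z^p * F(z): [x_p, ..., x_1, x_0, x_1, ..., x_p].
    laurent = np.concatenate([x[::-1], x[1:]])
    roots = np.roots(laurent[::-1])        # roots come in pairs (r, 1/r)
    outside = roots[np.abs(roots) > 1.0]   # keep the p roots outside the disk
    b = np.real(np.poly(outside))[::-1]    # ascending coefficients, monic in z^p
    # Rescale so the factor reproduces F (compare the spectra at w = 0).
    F0 = x[0] + 2.0 * x[1:].sum()
    b *= np.sqrt(F0) / abs(b.sum())
    return b if b[0] > 0 else -b
```

For example, the autocorrelation coefficients of b = (2, 0.8, 0.3) are x = (4.73, 1.84, 0.6), and the routine recovers b from x.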

1.2 Proof of Theorem 3

The minimization problem in (9) is equivalent to

$$\begin{aligned} \begin{array}{ll} \text{ minimize } & - \text{ log } \text{ det } V \\ \text{ subject } \text{ to } & \Pi _p(0)^T Y \Pi _p(0) = V, \\ & D(Y) = \mathcal {X}^T, \quad Y \succeq 0, \end{array} \end{aligned}$$

with variables \(Y \in {\text{ S }}^{(p+1)m}\) and \(V \in {\text{ S }}^m.\) A dual is

$$\begin{aligned} \begin{array}{ll} \text{ maximize } & \text{ log } \text{ det } U -\langle \mathcal {X}^T, \mathcal {W} \rangle + m \\ \text{ subject } \text{ to } & \mathcal {T}(\mathcal {W}) \succeq \Pi _p(0)U\Pi _p(0)^T, \end{array} \end{aligned}$$

with variables \(U \in {\text{ S }}^m\) and \(\mathcal {W} \in \textbf{V}\). As argued in the proof of Theorem 1, strong duality holds, a dual solution \((U^\star , \mathcal {W}^\star )\) exists and \((\mathcal {T}(\mathcal {W}^\star ) - \Pi _p(0) U^\star \Pi _p(0)^T) Y^\star = 0.\) Furthermore, primal feasibility states that \(Y_{00}^\star = V^\star ,\) and from stationarity of the Lagrangian we also have \(U^\star = (V^\star )^{-1} = (Y_{00}^\star )^{-1}.\) Now let \(Y^\star = \varvec{B}\varvec{B}^T.\) Then \(U^\star = (B_0^T B_0)^{-1}\) and

$$\begin{aligned} (\mathcal {T}(\mathcal {W}^\star ) - \Pi _p(0) (B_0^T B_0)^{-1} \Pi _p(0)^T)\varvec{B}\varvec{B}^T = 0. \end{aligned}$$

Since \(\varvec{B}\) has full column rank it must hold that \( \mathcal {T}(\mathcal {W}^\star ) \varvec{B} - \Pi _p(0)(B_0^T B_0)^{-1} B_0^T = 0.\) Equivalently,

$$\begin{aligned} \mathcal {T}(\mathcal {W}^\star ) \varvec{B}B_0(B_0^T B_0)^{-1} = \Pi _p(0)(B_0^T B_0)^{-1}. \end{aligned}$$

These are the Yule-Walker equations for the autoregressive process \(x(t) = - \sum _{k=1}^p A_k x(t-k) + w(t)\), where \(x(t) \in {\text{ R }}^m\) and \(w(t) \sim N(0, (B_0^T B_0)^{-1})\) is Gaussian and \(A_k = B_k^T B_0 (B_0^T B_0)^{-1}\) (see, for example, [38, eq. (1.9)]). In this interpretation, \(\mathcal {W}^\star \) contains the first \(p + 1\) elements of the covariance sequence of the autoregressive process. Since the covariance sequence is unique it follows that \(\mathcal {W}^\star \) is unique. With the uniqueness of \(\mathcal {W}^\star \) established, it follows from duality theory (see e.g. [6, Proposition 4.3.3]) that \(\nabla \phi (\mathcal {X}^T) = - \mathcal {W}^\star \). Expression (12) is a consequence of symmetry properties of the gradient.
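
In the scalar case (m = 1) the Yule-Walker system above reduces to an ordinary Toeplitz linear system in the AR coefficients. A minimal sketch in plain numpy (illustrative only; the function name `yule_walker` is ours, not from the paper):

```python
import numpy as np

def yule_walker(r):
    """Given covariances r = (R_0, ..., R_p) of the scalar AR(p) process
    x(t) = -sum_k a_k x(t-k) + w(t) with w(t) ~ N(0, s2), recover the
    coefficients a = (a_1, ..., a_p) and the innovation variance s2."""
    r = np.asarray(r, dtype=float)
    p = len(r) - 1
    # Toeplitz matrix with entries R_{|i-l|}, i, l = 0, ..., p-1.
    T = r[np.abs(np.subtract.outer(np.arange(p), np.arange(p)))]
    a = np.linalg.solve(T, -r[1:])   # R_k + sum_i a_i R_{k-i} = 0, k = 1..p
    s2 = r[0] + a @ r[1:]            # innovation variance
    return a, s2
```

For an AR(1) process with pole 0.5 and unit innovation variance, R_0 = 4/3 and R_1 = 2/3, and the routine returns a_1 = -0.5 and s2 = 1.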

1.3 Proof of Lemma 4

Since \(\mathcal {T}(\mathcal {W})\) is a positive definite block Toeplitz matrix of size \(m(p + 1) \times m(p+1)\) with blocks of size \(m \times m\), it can be decomposed as \(V \mathcal {T}(\mathcal {W}) V^T = D\), where V is an upper triangular matrix with identity matrices on the diagonal and \(D = \textbf{blkdiag}(D_p, \dots , D_0)\) with \(0 \prec D_p \preceq \dots \preceq D_0\) [37]. Since V is nonsingular, \(\mathcal {T}(\mathcal {W}) \succeq \Pi _p(0)U\Pi _p(0)^T\) if and only if \(V \mathcal {T}(\mathcal {W}) V^T \succeq V\Pi _p(0)U\Pi _p(0)^T V^T\) [23, Theorem 7.7.2], i.e. if and only if

$$\begin{aligned} \textbf{blkdiag}(D_p, D_{p-1}, \dots , D_0) \succeq \textbf{blkdiag}(U, 0, \dots , 0). \end{aligned}$$

Since log-determinant is a monotone function in the sense that \(A \succeq B\) implies \( \text{ log } \text{ det } A \ge \text{ log } \text{ det } B,\) one optimal choice of U given \(\mathcal {W}\) is \(U = D_p.\) The statement that \(D_p = (\Pi _p(0)^T \mathcal {T}(\mathcal {W})^{-1} \Pi _p(0))^{-1}\) follows from studying the first block-column of the equation \(V \mathcal {T}(\mathcal {W}) V^T = \textbf{blkdiag}(D_p, \dots , D_0)\).
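
The feasibility of the choice \(U = D_p\) can be checked numerically: with \(U = (\Pi _p(0)^T \mathcal {T}(\mathcal {W})^{-1} \Pi _p(0))^{-1}\), the matrix \(\mathcal {T}(\mathcal {W}) - \Pi _p(0) U \Pi _p(0)^T\) is positive semidefinite (a Schur-complement fact). A self-contained sketch with a randomly generated positive definite block Toeplitz matrix, assuming the convention that \(\Pi _p(0)\) selects the leading m-by-m block:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 2, 3
B = rng.standard_normal((p + 1, m, m))
# Autocorrelation blocks W_k = sum_j B_{j+k} B_j^T: the block Toeplitz
# matrix built from them is the covariance matrix of a stacked vector
# MA process, hence positive semidefinite; adding 0.1*I to W_0 makes it
# positive definite.
W = [sum(B[j + k] @ B[j].T for j in range(p + 1 - k)) for k in range(p + 1)]
W[0] = W[0] + 0.1 * np.eye(m)
T = np.block([[W[i - l] if i >= l else W[l - i].T
               for l in range(p + 1)] for i in range(p + 1)])
E = np.zeros(((p + 1) * m, m))
E[:m] = np.eye(m)                                # E = Pi_p(0), leading block
U = np.linalg.inv(E.T @ np.linalg.solve(T, E))   # U = D_p
# The semidefinite constraint T(W) >= E U E^T holds for this choice of U:
eigs = np.linalg.eigvalsh(T - E @ U @ E.T)
assert eigs.min() > -1e-8
```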

1.4 Derivation of the Multivariate Itakura-Saito Distance

Here the Bregman divergence associated with the negative entropy defined in (8) is derived. We start by evaluating \(\nabla \phi (\mathcal {Y}) \in \textbf{V}\) for \(\mathcal {Y} \in \mathbf{int \ }K\). We have

$$\begin{aligned} \nabla \phi (\mathcal {Y}) = \bigg ( \frac{\partial \phi (\mathcal {Y})}{\partial Y_0}, \dots , \frac{\partial \phi (\mathcal {Y})}{\partial Y_p} \bigg ), \end{aligned}$$

where \(\frac{\partial \phi (\mathcal {Y})}{\partial Y_0}\) is the gradient of the function

$$\begin{aligned} Y_0 \mapsto - \frac{1}{2\pi } \int _0^{2 \pi } \text{ log } \text{ det } \big (Y_0 + \sum _{k=1}^p (Y_k e^{j \omega k} + Y_k^T e^{-j \omega k}) \big ) \, d\omega , \end{aligned}$$

where we view \(Y_1, \dots , Y_p\) as fixed. Here the gradient is taken in \({\text{ S }}^m\) with respect to the inner product \(\langle A, B \rangle = \textbf{Tr}(AB), \, A, B \in {\text{ S }}^m\). Furthermore, for \(k = 1, \dots , p\), \(\frac{\partial \phi (\mathcal {Y})}{\partial Y_k}\) is the gradient of the function

$$\begin{aligned} Y_k \mapsto - \frac{1}{2\pi } \int _0^{2 \pi } \text{ log } \text{ det } \big (Y_0 + \sum _{i=1}^p (Y_i e^{j \omega i} + Y_i^T e^{-j \omega i}) \big ) \, d\omega , \end{aligned}$$

where we view \(Y_i\) as fixed for \(i \ne k\). Here the gradient is taken in \({\text{ R }}^{m \times m}\) with respect to the inner product \(\langle A, B \rangle = 2 \textbf{Tr}(A^T B), \, A, B \in {\text{ R }}^{m \times m}.\) Using standard chain rules one can show that

$$\begin{aligned} \frac{\partial \phi (\mathcal {Y})}{\partial Y_0} = -\frac{1}{2\pi } \int _0^{2\pi } F_\mathcal {Y}(e^{j\omega })^{-1} d \omega , \end{aligned}$$

and for \(k = 1, \dots , p\):

$$\begin{aligned} \frac{\partial \phi (\mathcal {Y})}{\partial Y_k} = - \frac{1}{4\pi } \int _0^{2\pi } \big \{ F_\mathcal {Y}(e^{j\omega })^{-T} e^{jk\omega } + F_\mathcal {Y}(e^{j\omega })^{-1} e^{-jk\omega } \big \} d \omega . \end{aligned}$$
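
These closed-form gradients can be sanity-checked by finite differences. A quick numerical aside (not from the paper) verifying the \(\partial \phi /\partial Y_0\) formula in the scalar case, with the integrals approximated by Riemann sums:

```python
import numpy as np

w = np.linspace(0.0, 2.0 * np.pi, 8192, endpoint=False)

def phi(y):
    """Scalar negative entropy -(1/2pi) int log F_Y(w) dw, where
    F_Y(w) = y_0 + 2*sum_k y_k cos(k w) (Riemann-sum discretization)."""
    F = y[0] + 2.0 * np.cos(np.outer(w, np.arange(1, len(y)))) @ y[1:]
    return -np.mean(np.log(F))

y = np.array([2.0, 0.5, -0.2])     # F_Y > 0 on the whole circle
F = y[0] + 2.0 * np.cos(np.outer(w, np.arange(1, 3))) @ y[1:]
grad0 = -np.mean(1.0 / F)          # closed form: -(1/2pi) int F_Y^{-1} dw
h = 1e-6
e0 = np.array([h, 0.0, 0.0])
fd = (phi(y + e0) - phi(y - e0)) / (2.0 * h)   # central finite difference
assert abs(grad0 - fd) < 1e-6
```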

By using these expressions, a few lines of algebra show that \(\langle \nabla \phi (\mathcal {Y}), \mathcal {X} - \mathcal {Y} \rangle \) is equal to

$$\begin{aligned} -\frac{1}{2\pi } \int _0^{2\pi } \textbf{Tr}\bigg \{ F_\mathcal {Y}(e^{j\omega })^{-1} [F_\mathcal {X}(e^{j \omega }) - F_\mathcal {Y}(e^{j\omega })] \bigg \} d \omega . \end{aligned}$$

Plugging this expression into the definition of the Bregman divergence yields, after some algebraic manipulation, (15).
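
Carrying out that manipulation gives the multivariate Itakura-Saito distance \(D(\mathcal {X}, \mathcal {Y}) = \frac{1}{2\pi } \int _0^{2\pi } \big [ \textbf{Tr}(F_\mathcal {Y}^{-1} F_\mathcal {X}) - \text{ log } \text{ det } (F_\mathcal {Y}^{-1} F_\mathcal {X}) - m \big ] \, d\omega \). For m = 1 this is the classical scalar Itakura-Saito distance; a numerical sketch (our own helper, integral approximated by a Riemann sum):

```python
import numpy as np

def itakura_saito(x, y, n=4096):
    """Itakura-Saito distance between two positive scalar trigonometric
    polynomials with coefficient sequences x and y, where
    F(w) = c_0 + 2*sum_k c_k cos(k w)."""
    w = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    def F(c):
        c = np.asarray(c, dtype=float)
        return c[0] + 2.0 * np.cos(np.outer(w, np.arange(1, len(c)))) @ c[1:]
    r = F(x) / F(y)
    return np.mean(r - np.log(r) - 1.0)   # (1/2pi) * integral over [0, 2pi]
```

The divergence vanishes exactly when the two spectra coincide and is otherwise positive, since \(t - \log t - 1 \ge 0\) for \(t > 0\).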

1.5 Suboptimality Certificate

Here we justify the statement that a lower bound on the optimal value of (20) is given by (22). In this derivation, let \(E = \Pi _p(0)\). It can be shown that a Lagrange dual of (20) is

$$\begin{aligned} \begin{array}{ll} \text{ maximize } & h(\lambda , \mathcal {Z}) \\ \text{ subject } \text{ to } & \mathcal {Z} \in K^*, \end{array} \end{aligned}$$
(35)

with variables \(\lambda \in {\text{ R }}\) and \(\mathcal {Z} \in \textbf{V},\) where

$$\begin{aligned} h(\lambda , \mathcal {Z}) = - \textbf{Tr}(\hat{R}_0) \lambda - \frac{1}{2} \Vert \lambda E - \mathcal {Z} \Vert ^2 + \langle \hat{\mathcal {R}}^T, \lambda E - \mathcal {Z} \rangle . \end{aligned}$$

Furthermore, strong duality and the existence of a dual solution \((\lambda ^\star , \mathcal {Z}^\star )\) follow from Slater’s constraint qualification. The optimality conditions for the primal-dual pair (20), (35) are

$$\begin{aligned} \begin{aligned} \lambda ^\star E - \mathcal {Z}^\star&= \hat{\mathcal {R}}^T - \mathcal {X}^\star , \qquad \textbf{Tr}(X^\star _0) = \textbf{Tr}(\hat{R}_0), \\ \mathcal {X}^\star&\in K, \qquad \qquad \qquad \mathcal {Z}^\star \in K^*. \end{aligned} \end{aligned}$$

To any primal feasible iterate \(\mathcal {X}_k\), associate a dual iterate \((\lambda _k, \mathcal {Z}_k)\) through the relation \(\lambda _k E - \mathcal {Z}_k = \hat{\mathcal {R}}^T - \mathcal {X}_k.\) The dual objective value is

$$\begin{aligned} h(\lambda _k, \mathcal {Z}_k) = - \textbf{Tr}(\hat{R}_0) \lambda _k - \frac{1}{2} \Vert \mathcal {X}_k - \hat{\mathcal {R}}^T \Vert ^2 + \langle \hat{\mathcal {R}}^T, \hat{\mathcal {R}}^T - \mathcal {X}_k \rangle . \end{aligned}$$

If \((\lambda _k, \mathcal {Z}_k)\) is dual feasible it follows from weak duality that \(h(\lambda _k, \mathcal {Z}_k)\) is a lower bound on the optimal value of (20). We will ensure dual feasibility by choosing \(\lambda _k\) large enough to guarantee that \(\mathcal {Z}_k = \lambda _k E - \hat{\mathcal {R}}^T + \mathcal {X}_k \in K^*\). For dual feasibility it is required that \(\lambda _k \ge - \lambda _{\min }(\mathcal {T}(\mathcal {X}_k - \hat{\mathcal {R}}^T)).\) Since \(\textbf{Tr}(\hat{R}_0) > 0\), the choice of \(\lambda _k\) resulting in the largest lower bound is \(\lambda _k = - \lambda _{\min }(\mathcal {T}(\mathcal {X}_k - \hat{\mathcal {R}}^T))\). Inserting this choice of \(\lambda _k\) in the dual objective function proves that (22) is a lower bound on the optimal value of (20).
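
In the scalar case (m = 1), \(\mathcal {T}(\mathcal {X}_k - \hat{\mathcal {R}}^T)\) is an ordinary symmetric Toeplitz matrix, so the feasible shift \(\lambda _k\) amounts to one smallest-eigenvalue computation. A minimal sketch with made-up data (illustrative; `dual_shift` is our own name):

```python
import numpy as np

def dual_shift(delta):
    """For a scalar coefficient sequence delta = (d_0, ..., d_p), return the
    smallest lambda with T(delta) + lambda*I >= 0, where T(delta) is the
    symmetric Toeplitz matrix with entries d_{|i-l|}, together with T."""
    d = np.asarray(delta, dtype=float)
    n = len(d)
    T = d[np.abs(np.subtract.outer(np.arange(n), np.arange(n)))]
    return -np.linalg.eigvalsh(T).min(), T

lam, T = dual_shift([0.5, 1.0, -0.2])   # illustrative X_k - R_hat^T, m = 1
# T + lam * I is positive semidefinite, i.e. the shifted Z_k is dual feasible:
assert np.linalg.eigvalsh(T + lam * np.eye(len(T))).min() > -1e-10
```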

Since \(\textbf{Tr}(\hat{R}_0) > 0 \), the optimal dual solution must be unique and given by \(\lambda ^\star = - \lambda _{\min }(\mathcal {T}(\mathcal {X}^\star - \hat{\mathcal {R}}^T))\), \(\mathcal {Z}^\star = \lambda ^\star E - \hat{\mathcal {R}}^T + \mathcal {X}^\star \). Hence, as \(\mathcal {X}_k \rightarrow \mathcal {X}^\star \) it holds that \((\lambda _k, \mathcal {Z}_k) \rightarrow (\lambda ^\star , \mathcal {Z}^\star )\) and \(h(\lambda _k, \mathcal {Z}_k) \rightarrow h(\lambda ^\star , \mathcal {Z}^\star ).\) Since the optimal primal and dual values are equal, it follows that the lower bound (22) converges to the optimal value of (20) as \(\mathcal {X}_k \rightarrow \mathcal {X}^\star \).

1.6 Inverse Spectral Density Matrix Estimation

In his dissertation [12, pages 69–71], John Parker Burg motivates why it is reasonable to measure the information content of a spectral density matrix f using the quantity \((1/(2\pi )) \int _0^{2\pi } \text{ log } \text{ det } f(\omega ) d \omega \). The principle of maximum entropy states that to estimate f, one should choose the spectrum that maximizes the information content among all spectra consistent with prior information. Assume that the estimate is of the form (3) and that the covariance function \(R_k\) is known for \(|k| \le p\), where p is a positive integer. Then the estimate \(\hat{f}\) is consistent with the known values of the covariance function if

$$\begin{aligned} \int _0^{2\pi } \hat{f}(\omega ) e^{j k \omega } d \omega = R_k, \qquad 0 \le k \le p. \end{aligned}$$

Hence, the estimate maximizing the entropy among all spectra that are consistent with prior information can be found by solving the problem

$$\begin{aligned} \begin{array}{ll} \text{ maximize } & (1/(2\pi )) \int _0^{2\pi } \text{ log } \text{ det } \hat{f}(\omega ) d \omega \\ \text{ subject } \text{ to } & \int _0^{2\pi } \hat{f}(\omega ) e^{j k \omega } d \omega = R_k, \hspace{0.5cm} 0 \le k \le p, \end{array} \end{aligned}$$
(36)

with problem data \(R_k, \, 0 \le k \le p\) and variable \(\hat{f}(\omega ) = \frac{1}{2\pi } \sum _{k=-\infty }^\infty R_k e^{-jk\omega }\) where \(R_{-k} = R_k^T.\) Problem (36) is sometimes called a maximum entropy covariance extension problem.
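
For scalar data, the solution of (36) is the AR(p) spectrum obtained from the Yule-Walker equations (Burg's classical result). The self-contained sketch below verifies numerically that the resulting spectrum reproduces the given covariances; the AR(1) covariance data is illustrative:

```python
import numpy as np

# Maximum entropy extension, scalar case: given r = (R_0, ..., R_p), the
# entropy-maximizing spectrum is the AR(p) spectrum from the Yule-Walker
# equations. Covariances below come from an AR(1) example.
r = np.array([4.0 / 3.0, 2.0 / 3.0])
p = len(r) - 1
T = r[np.abs(np.subtract.outer(np.arange(p), np.arange(p)))]
a = np.linalg.solve(T, -r[1:])                # AR coefficients a_1, ..., a_p
s2 = r[0] + a @ r[1:]                         # innovation variance
w = np.linspace(0.0, 2.0 * np.pi, 8192, endpoint=False)
A = 1.0 + np.exp(-1j * np.outer(w, np.arange(1, p + 1))) @ a
f_hat = s2 / (2.0 * np.pi * np.abs(A) ** 2)   # max-entropy spectral density
# The moment constraints of (36) hold: int f_hat(w) e^{jkw} dw = R_k.
for k in range(p + 1):
    Rk = np.real(np.mean(f_hat * np.exp(1j * k * w))) * 2.0 * np.pi
    assert abs(Rk - r[k]) < 1e-6
```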

In [38, Section 1.2.3] it was pointed out that a dual problem of (36) is

$$\begin{aligned} \text {minimize } - \frac{1}{2\pi } \int _{0}^{2\pi } \text{ log } \text{ det } Y(\omega ) \, d\omega + \langle \mathcal {Y}, \mathcal {R} \rangle \end{aligned}$$
(37)

with variable \(\mathcal {Y} = (Y_0, Y_1, \dots , Y_p)\) and problem data \(\mathcal {R} = (R_0, R_1, \dots , R_p)\), where \(Y(\omega ) = Y_0 + \sum _{k=1}^p (Y_k e^{-jk \omega } + Y_k^T e^{jk \omega })\). Furthermore, there is a natural interpretation of the dual variables in terms of the inverse spectral density matrix: if \(\hat{f}\) is optimal in (36) and \((Y_0, Y_1, \dots , Y_p)\) is optimal in (37), then \(\hat{f}(\omega )^{-1} = 2 \pi Y(\omega )\). In other words, from the solution of (37) an estimate of the inverse spectral density matrix is available. Problem (37) has an analytical solution [38, Section 1.2.3].


Cite this article

Cederberg, D. First-Order Methods for Nonnegative Trigonometric Matrix Polynomials. J Optim Theory Appl 204, 32 (2025). https://doi.org/10.1007/s10957-024-02581-5
