First-Order Methods for Nonnegative Trigonometric Matrix Polynomials

Published in: Journal of Optimization Theory and Applications.

Abstract

Optimization problems over the cone of nonnegative trigonometric matrix polynomials are common in signal processing. Traditionally, such problems are formulated and solved as semidefinite programs (SDPs). However, the SDP formulation increases the dimension of the problem, resulting in large problems that are challenging to solve. In this paper we propose first-order methods that circumvent the SDP formulation and instead optimize directly over the space of trigonometric matrix polynomials. Our methods are based on a particular Bregman proximal operator. We apply our approach to two fundamental signal processing applications: rectifying a power spectrum that fails to be nonnegative, and graphical modeling of Gaussian time series. Numerical experiments demonstrate that our methods are orders of magnitude faster than an interior-point solver applied to the corresponding SDPs.

Figures 1–6 (not reproduced here).

Data availability

The data used to reproduce our results is available at https://github.com/dance858/BregmanProx.

References

  1. Alkire, B., Vandenberghe, L.: Convex optimization problems involving finite autocorrelation sequences. Math. Program. 93(3), 331–359 (2002)

  2. Auslender, A., Teboulle, M.: Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. 16(3), 697–725 (2006)

  3. Basseville, M.: Distance measures for signal processing and pattern recognition. Signal Process. 18(4), 349–369 (1989)

  4. Beck, A.: First-Order Methods in Optimization. Soc. Ind. Appl. Math, Philadelphia, PA (2017)

  5. Benner, P., Mehrmann, V., Sima, V., Van Huffel, S., Varga, A.: SLICOT—A Subroutine Library in Systems and Control Theory. In: Biswa Nath, D. (ed.) Appl. and Comput. Control, Signals, and Circuits: Volume 1, pages 499–539. Birkhäuser Boston (1999)

  6. Bertsekas, D.: Nonlinear Programming. Athena Scientific (2016)

  7. Blomqvist, A., Lindquist, A., Nagamune, R.: Matrix-valued Nevanlinna-Pick interpolation with complexity constraint: an optimization approach. IEEE Trans. Autom. Control 48(12), 2172–2190 (2003)

  8. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  9. Boyd, S., Vandenberghe, L.: Convex optimization. Cambridge University Press (2004)

  10. Brillinger, D.: Remarks concerning graphical models for time series and point processes. Revista de Econometria 16, 1–23 (1996)

  11. Brillinger, D.: Time Series. Soc. Ind. Appl. Math. (2001)

  12. Burg, J. P.: Maximum entropy spectral analysis. PhD thesis, Stanford University (1975)

  13. Chambolle, A., Contreras, J.: Accelerated Bregman primal-dual methods applied to optimal transport and Wasserstein barycenter problems. SIAM J. Math. Data Sci. 4(4), 1369–1395 (2022)

  14. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imag. Vis. 40(1), 120–145 (2011)

  15. Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal-dual algorithm. Math. Program Ser. A 159(1–2), 253–287 (2016)

  16. Chao, H., Vandenberghe, L.: Entropic proximal operators for nonnegative trigonometric polynomials. IEEE Trans. Signal Process. 66(18), 4826–4838 (2018)

  17. Delsarte, P., Genin, Y., Kamp, Y.: Orthogonal polynomial matrices on the unit circle. IEEE Trans. Circuits Syst. 25, 149–160 (1978)

  18. Dumitrescu, B., Tabus, I., Stoica, P.: On the parameterization of positive real sequences and MA parameter estimation. IEEE Trans. Signal Process. 49(11), 2630–2639 (2001)

  19. Ferrante, A., Masiero, C., Pavon, M.: Time and spectral domain relative entropy: A new approach to multivariate spectral estimation. IEEE Trans. Autom. Control 57(10), 2561–2575 (2012)

  20. Gallivan, K., Thirumalai, S., Van Dooren, P., Vermaut, V.: High performance algorithms for Toeplitz and block Toeplitz matrices. Linear Algebra Appl. 241–243, 343–388 (1996)

  21. Genin, Y., Hachez, Y., Nesterov, Y., Van Dooren, P.: Optimization problems over positive pseudopolynomial matrices. SIAM J. Matrix Anal. Appl. 25(1), 57–79 (2003)

  22. Grant, M., Boyd, S.: CVX: Matlab Software for Disciplined Convex Programming, Version 2.1 (2014). http://cvxr.com/cvx

  23. Horn, R., Johnson, C.: Matrix Analysis. Cambridge University Press (2013)

  24. Jiang, X., Vandenberghe, L.: Bregman primal-dual first-order method and application to sparse semidefinite programming. Comput. Optim. Appl. 81(1), 127–159 (2022)

  25. Jiang, X., Vandenberghe, L.: Bregman three-operator splitting methods. J. Optim. Theory Appl. 196, 936–972 (2023)

  26. Kailath, T., Sayed, A., Hassibi, B.: Linear Estimation. Prentice Hall (2000)

  27. Kressner, D., Van Dooren, P.: Factorizations and linear system solvers for matrices with Toeplitz structure (2001)

  28. Kuceyeski, A., Jamison, K., Owen, J., Raj, A., Mukherjee, P.: Functional rerouting via the structural connectome is associated with better recovery after mild TBI. bioRxiv (2018)

  29. Larsson, C., Annergren, M., Hjalmarsson, H.: On optimal input design in system identification for model predictive control. In: 50th IEEE Conference on Decision and Control, pp. 805–810 (2011)

  30. Larsson, C., Rojas, C., Hjalmarsson, H.: MPC oriented experiment design. IFAC Proc. Volumes 44(1), 9966–9971 (2011)

  31. Lemon, A., So, A., Ye, Y.: Low-rank semidefinite programming: theory and applications. Found. Trends Optim. 2(1–2), 1–156 (2016)

  32. Lindquist, A., Masiero, C., Picci, G.: On the multivariate circulant rational covariance extension problem. In: 52nd IEEE Conference on Decision and Control (2013)

  33. Lindquist, A., Picci, G.: Linear Stochastic Systems: A Geometric Approach to Modeling, Estimation and Identification. Springer (2015)

  34. McLean, J., Woerdeman, H.: Spectral factorizations and sums of squares representations via semidefinite programming. SIAM J. Matrix Anal. Appl. 23(3), 646–655 (2002)

  35. Nesterov, Y.: Squared functional systems and optimization problems. In: Hans, F., Kees, R., Tamás, T., Shuzhong, Z. (eds.) High Performance Optimization, pp. 405–440. Springer US, Boston, MA (2000)

  36. O’Donoghue, B., Chu, E., Parikh, N., Boyd, S.: Conic optimization via operator splitting and homogeneous self-dual embedding. J. Optim. Theory Appl. 169(3), 1042–1068 (2016)

  37. Rissanen, J.: Algorithms for triangular decomposition of block Hankel and Toeplitz matrices with application to factoring positive matrix polynomials. Math. Comput. 27(121), 147–154 (1973)

  38. Songsiri, J., Dahl, J., Vandenberghe, L.: Graphical models of autoregressive processes. In: Palomar, D.P., Eldar, Y.C. (eds.) Convex Optimization in Signal Processing and Communications, pp. 89–116. Cambridge University Press (2009)

  39. Songsiri, J., Vandenberghe, L.: Topology selection in graphical models of autoregressive processes. J. Mach. Learn. Res. 11(91), 2671–2705 (2010)

  40. Steffens, C., Pesavento, M.: Block- and rank-sparse recovery for direction finding in partly calibrated arrays. IEEE Trans. Signal Process. 66(2), 384–399 (2018)

  41. Stoica, A., Moses, R., Stoica, P.: Enforcing positiveness on estimated spectral densities. Electron. Lett. 29, 2009–2011 (1993)

  42. Stoica, P., McKelvey, T., Mari, J.: MA estimation in polynomial time. IEEE Trans. Signal Process. 48(7), 1999–2012 (2000)

  43. Sun, Y., Li, Y., Kuceyeski, A., Basu, S.: Large spectral density matrix estimation by thresholding. arXiv preprint (2018). arXiv:1812.00532

  44. Tseng, P.: On Accelerated Proximal Gradient Methods for Convex-Concave Optimization (2008)

  45. Tütüncü, R., Toh, K., Todd, M.: Solving semidefinite-quadratic-linear programs using SDPT3. Math. Program. 95, 189–217 (2003)

  46. Vandenberghe, L., Balakrishnan, V.R., Wallin, R., Hansson, A., Roh, T.: Interior-point algorithms for semidefinite programming problems derived from the KYP lemma. In: Henrion, D., Garulli, A. (eds.) Positive Polynomials in Control, pp. 195–238. Springer, Berlin Heidelberg, Berlin, Heidelberg (2005)

  47. von Sachs, R.: Nonparametric spectral analysis of multivariate time series. Annu. Rev. Statist. and Appl. 7, 361–386 (2020)

  48. He, Y., Xia, M., Wang, J.: BrainNet Viewer: a network visualization tool for human brain connectomics. PLoS ONE 8, e68910 (2013)

  49. Zorzi, M., Sepulchre, R.: AR identification of latent-variable graphical models. IEEE Trans. Autom. Control 61(9), 2327–2340 (2016)

Acknowledgements

We thank Lieven Vandenberghe for pointing us to the application in graphical models, and Alan Yang and Tomas Lundberg for feedback on a draft of the paper. We also thank Amy Kuceyeski for sharing the MRI data, and Yige Li for exceptional help when setting up the MRI data. Finally, we thank two reviewers for excellent suggestions that improved the manuscript.

Author information

Correspondence to Daniel Cederberg.


Communicated by Alper Yildirim.

Appendix

In this appendix we present proofs that have been omitted from the main body of the paper.

1.1 Proof of Theorem 1

To prove Theorem 1 we need the following result [17, Theorem 6].

Lemma A.1

Let \(\mathcal {W} \in \mathbf{int \ }K^{*}\). Consider the matrix polynomial \(A:\mathbf{C}\rightarrow \mathbf{C}^{m \times m}\) given by \(A(z) = \Pi _p(z)^T \mathcal {T}(\mathcal {W})^{-1} E\), where \(E = \Pi _p(0)\). Then \(\det A(z) \ne 0\) for \(|z| \le 1.\)

Proof (Theorem 1) Problem (7) is equivalent to

$$\begin{aligned} \begin{array}{ll} \text{ maximize } & \text{ log } \text{ det } V \\ \text{ subject } \text{ to } & E^T Y E = V \\ & D(Y) = \mathcal {X}^T, \hspace{0.5cm} Y \succeq 0, \end{array} \end{aligned}$$
(28)

with variables \(Y \in {\text{ S }}^{(p+1)m}\) and \(V \in {\text{ S }}^m.\) Introduce Lagrange multipliers \(U \in {\text{ S }}^m, \, \mathcal {W} \in \textbf{V}\) and \(Z \in {\text{ S }}^{(p+1)m}.\) A dual is

$$\begin{aligned} \begin{array}{ll} \text{ minimize } & \langle \mathcal {X}^T, \mathcal {W} \rangle - \text{ log } \text{ det } U - m \\ \text{ subject } \text{ to } & Z + EUE^T - \mathcal {T}(\mathcal {W}) = 0 \\ & Z \succeq 0, \end{array} \end{aligned}$$
(29)

with variables \(U \in {\text{ S }}^m, \, \mathcal {W} \in \textbf{V}\) and \(Z \in {\text{ S }}^{(p+1)m}.\) Eliminating Z shows that a suitable dual is

$$\begin{aligned} \begin{array}{ll} \text{ minimize } & \langle \mathcal {X}^T, \mathcal {W} \rangle - \text{ log } \text{ det } U - m \\ \text{ subject } \text{ to } & \mathcal {T}(\mathcal {W}) \succeq E U E^T. \end{array} \end{aligned}$$
(30)

Since \(\mathcal {X} \in \mathbf{int \ }\mathcal {K}\) there exists \(Y \succ 0\) such that \(D(Y) = \mathcal {X}^T.\) Hence, the primal (28) is strictly feasible. From Slater’s constraint qualification for generalized inequalities [9, page 265] it follows that strong duality holds and the optimum of problem (30) is attained.

Problem (29) is strictly feasible (choose, e.g., \(U = I\), \(\mathcal {W} = (2I, 0, \dots , 0)\) and \(Z = \textbf{blkdiag}(I, 2I, \dots , 2I)\)). Therefore, by Slater’s constraint qualification the dual of (29) admits a solution. The dual of (29) is equivalent to (28) (this can be verified by explicitly deriving a dual of (29); it also follows from the observation that the objective function in (28) is proper, closed and convex, so the dual of the dual is the original problem). Hence, we conclude that problem (28) admits a solution.

Since strong duality holds, if \(\hat{Y}\) is a solution of (28) and \((\hat{U}, \hat{\mathcal {W}})\) is a solution of (30), then there is no duality gap and complementary slackness is satisfied:

$$\begin{aligned} \hat{Y}( \mathcal {T}(\hat{\mathcal {W}}) - E \hat{U} E^T) = 0. \end{aligned}$$
(31)

Since \(\hat{U} \succ 0\) it follows from the block Toeplitz structure that \(\mathcal {T}(\hat{\mathcal {W}}) \succ 0\) (see, for example, [38, eq. (1.34)]). Hence, (31) is equivalent to

$$\begin{aligned} \hat{Y} = \hat{Y}E \hat{U} E^T \mathcal {T}(\hat{\mathcal {W}})^{-1}. \end{aligned}$$
(32)

Complementary slackness can also be formulated as \((\mathcal {T}(\hat{\mathcal {W}}) - E \hat{U}E^T) \hat{Y} = 0,\) which is equivalent to

$$\begin{aligned} \hat{Y} = \mathcal {T}(\hat{\mathcal {W}})^{-1} E \hat{U}E^T \hat{Y}. \end{aligned}$$
(33)

Putting together (32) and (33) shows that any primal solution \(\hat{Y}\) must satisfy

$$\begin{aligned} \hat{Y}&= \mathcal {T}(\hat{\mathcal {W}})^{-1} E \hat{U} E^T \hat{Y}E \hat{U} E^T \mathcal {T}(\hat{\mathcal {W}})^{-1} \\ &= \mathcal {T}(\hat{\mathcal {W}})^{-1} \begin{bmatrix} \hat{U} \hat{Y}_{00} \hat{U} & 0 & \cdots & 0\\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} \mathcal {T}(\hat{\mathcal {W}})^{-1} \end{aligned}$$

where \((\hat{U}, \hat{\mathcal {W}})\) is any dual solution. Hence, any primal solution \(\hat{Y}\) satisfies \(\textbf{rank}\hat{Y} = \textbf{rank}(\hat{U} \hat{Y}_{00} \hat{U}) = \textbf{rank}\hat{Y}_{00}\). We now argue that \(\textbf{rank}\hat{Y}_{00} = m\). Assume \(\textbf{rank}\hat{Y}_{00} < m\). Let \(\varvec{B} = \mathcal {T}(\hat{\mathcal {W}})^{-1} E \hat{U} (\hat{Y}_{00})^{1/2}.\) Then \(\hat{Y} = \varvec{B} \varvec{B}^T\) so \(B(z) = \varvec{B}^T \Pi _p(z)\) is a spectral factor of \(F_\mathcal {X}(z).\) From the assumption \(\textbf{rank}\hat{Y}_{00} < m\) it follows that \(\det B(z) = 0\). This implies \(\det F_\mathcal {X}(z) = |\det B(z)|^2 = 0\) which contradicts the assumption that \(\mathcal {X} \in \mathbf{int \ }\mathcal {K}\).

We have now shown that any solution \(\hat{Y}\) has rank m. Also recall that we have shown that at least one solution exists. By applying results from the literature on low-rank semidefinite programming (see, for example, [31, Corollary 2.5]) we conclude that the solution is unique. (Corollary 2.5 in [31] applies to SDPs with a linear objective function. However, we can still apply the result by noting that the objective function in (7) can be replaced by the linear objective function \(\textbf{Tr}(Y_{00})\) without changing the set of optimal solutions.)

Denote the unique solution by \(Y^\star \). Note that \(Y^\star \) satisfies

$$\begin{aligned} Y^{\star } = \mathcal {T}(\hat{\mathcal {W}})^{-1} E \hat{U} Y^\star _{00} \hat{U} E^T \mathcal {T}(\hat{\mathcal {W}})^{-1}. \end{aligned}$$
(34)

Let \(\varvec{B} \in {\text{ R }}^{(p+1)m \times m}\) be any matrix satisfying \(\det (B_0) \ge 0\) and \(Y^\star = \varvec{B} \varvec{B}^T\). Then \(B(z) = \varvec{B}^T \Pi _p(z)\) is a spectral factor of \(F_\mathcal {X}\), since \(D(\varvec{B} \varvec{B}^T) = \mathcal {X}^T\). It must hold that \(\det (B_0) > 0\), since \(Y_{00}^\star = B_0^T B_0\) and \(Y_{00}^\star \succ 0\). Multiplying both sides of (34) with \(\Pi _p(z)^H\) from the left and \(\Pi _p(z)\) from the right shows that

$$\begin{aligned} \begin{aligned}&\Pi _p(z)^H \varvec{B} \varvec{B}^T \Pi _p(z) = \Pi _p(z)^H \mathcal {T}(\hat{\mathcal {W}})^{-1} E \hat{U} B_0^T B_0 \hat{U} E^T \mathcal {T}(\hat{\mathcal {W}})^{-1} \Pi _p(z), \end{aligned} \end{aligned}$$

implying the identity \(| \det B(z) | = | \det (B_0 \hat{U} E^T \mathcal {T}(\hat{\mathcal {W}})^{-1} \Pi _p(z)) |.\) From Lemma A.1 it follows that \(\det (B_0 \hat{U} E^T \mathcal {T}(\hat{\mathcal {W}})^{-1} \Pi _p(z)) \ne 0\) for \(|z| \le 1\), implying that B(z) is a stable spectral factor. \(\square \)
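
For scalar polynomials (m = 1), a stable spectral factor of the kind the theorem guarantees can also be computed by the classical root-splitting (Fejér–Riesz) construction. The sketch below is purely illustrative and is not the method used in the paper; `scalar_spectral_factor` is our own helper name.

```python
import numpy as np

def scalar_spectral_factor(x):
    """Fejer-Riesz factorization for m = 1: given x = (x_0, ..., x_p) with
    F(w) = x_0 + 2*sum_k x_k*cos(k*w) > 0 on [0, 2*pi], return b such that
    F(w) = |b_0 + b_1 e^{jw} + ... + b_p e^{jpw}|^2 and the polynomial
    B(z) = sum_k b_k z^k has all roots outside the closed unit disk."""
    x = np.asarray(x, dtype=float)
    # Ascending coefficients of z^p * F(z): [x_p, ..., x_1, x_0, x_1, ..., x_p].
    laurent = np.concatenate([x[::-1], x[1:]])
    roots = np.roots(laurent[::-1])        # roots come in pairs (r, 1/r)
    outside = roots[np.abs(roots) > 1.0]   # keep the p roots outside the disk
    b = np.real(np.poly(outside))[::-1]    # ascending coefficients, monic in z^p
    # Rescale so the factor reproduces F (compare the spectra at w = 0).
    F0 = x[0] + 2.0 * x[1:].sum()
    b *= np.sqrt(F0) / abs(b.sum())
    return b if b[0] > 0 else -b
```

For example, the autocorrelation coefficients of b = (2, 0.8, 0.3) are x = (4.73, 1.84, 0.6), and the routine recovers b from x.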

1.2 Proof of Theorem 3

The minimization problem in (9) is equivalent to

$$\begin{aligned} \begin{array}{ll} \text{ minimize } & - \text{ log } \text{ det } V \\ \text{ subject } \text{ to } & \Pi _p(0)^T Y \Pi _p(0) = V, \\ & D(Y) = \mathcal {X}^T, \quad Y \succeq 0, \end{array} \end{aligned}$$

with variables \(Y \in {\text{ S }}^{(p+1)m}\) and \(V \in {\text{ S }}^m.\) A dual is

$$\begin{aligned} \begin{array}{ll} \text{ maximize } & \text{ log } \text{ det } U -\langle \mathcal {X}^T, \mathcal {W} \rangle + m \\ \text{ subject } \text{ to } & \mathcal {T}(\mathcal {W}) \succeq \Pi _p(0)U\Pi _p(0)^T, \end{array} \end{aligned}$$

with variables \(U \in {\text{ S }}^m\) and \(\mathcal {W} \in \textbf{V}\). As argued in the proof of Theorem 1, strong duality holds, a dual solution \((U^\star , \mathcal {W}^\star )\) exists and \((\mathcal {T}(\mathcal {W}^\star ) - \Pi _p(0) U^\star \Pi _p(0)^T) Y^\star = 0.\) Furthermore, primal feasibility states that \(Y_{00}^\star = V^\star ,\) and from stationarity of the Lagrangian we also have \(U^\star = (V^\star )^{-1} = (Y_{00}^\star )^{-1}.\) Now let \(Y^\star = \varvec{B}\varvec{B}^T.\) Then \(U^\star = (B_0^T B_0)^{-1}\) and

$$\begin{aligned} (\mathcal {T}(\mathcal {W}^\star ) - \Pi _p(0) (B_0^T B_0)^{-1} \Pi _p(0)^T)\varvec{B}\varvec{B}^T = 0. \end{aligned}$$

Since \(\varvec{B}\) has full column rank it must hold that \( \mathcal {T}(\mathcal {W}^\star ) \varvec{B} - \Pi _p(0)(B_0^T B_0)^{-1} B_0^T = 0.\) Equivalently,

$$\begin{aligned} \mathcal {T}(\mathcal {W}^\star ) \varvec{B}B_0(B_0^T B_0)^{-1} = \Pi _p(0)(B_0^T B_0)^{-1}. \end{aligned}$$

These are the Yule-Walker equations for the autoregressive process \(x(t) = - \sum _{k=1}^p A_k x(t-k) + w(t)\), where \(x(t) \in {\text{ R }}^m\) and \(w(t) \sim N(0, (B_0^T B_0)^{-1})\) is Gaussian and \(A_k = B_k^T B_0 (B_0^T B_0)^{-1}\) (see, for example, [38, eq. (1.9)]). In this interpretation, \(\mathcal {W}^\star \) contains the first \(p + 1\) elements of the covariance sequence of the autoregressive process. Since the covariance sequence is unique it follows that \(\mathcal {W}^\star \) is unique. With the uniqueness of \(\mathcal {W}^\star \) established, it follows from duality theory (see e.g. [6, Proposition 4.3.3]) that \(\nabla \phi (\mathcal {X}^T) = - \mathcal {W}^\star \). Expression (12) is a consequence of symmetry properties of the gradient.
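
In the scalar case (m = 1) the Yule-Walker system above reduces to an ordinary Toeplitz linear system in the AR coefficients. A minimal sketch in plain numpy (illustrative only; the function name `yule_walker` is ours, not from the paper):

```python
import numpy as np

def yule_walker(r):
    """Given covariances r = (R_0, ..., R_p) of the scalar AR(p) process
    x(t) = -sum_k a_k x(t-k) + w(t) with w(t) ~ N(0, s2), recover the
    coefficients a = (a_1, ..., a_p) and the innovation variance s2."""
    r = np.asarray(r, dtype=float)
    p = len(r) - 1
    # Toeplitz matrix with entries R_{|i-l|}, i, l = 0, ..., p-1.
    T = r[np.abs(np.subtract.outer(np.arange(p), np.arange(p)))]
    a = np.linalg.solve(T, -r[1:])   # R_k + sum_i a_i R_{k-i} = 0, k = 1..p
    s2 = r[0] + a @ r[1:]            # innovation variance
    return a, s2
```

For an AR(1) process with pole 0.5 and unit innovation variance, R_0 = 4/3 and R_1 = 2/3, and the routine returns a_1 = -0.5 and s2 = 1.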

1.3 Proof of Lemma 4

Since \(\mathcal {T}(\mathcal {W})\) is a positive definite block Toeplitz matrix of size \(m(p + 1) \times m(p+1)\) with blocks of size \(m \times m\), it can be decomposed as \(V \mathcal {T}(\mathcal {W}) V^T = D\), where V is an upper triangular matrix with identity matrices on the diagonal and \(D = \textbf{blkdiag}(D_p, \dots , D_0)\) with \(0 \prec D_p \preceq \dots \preceq D_0\) [37]. Since V is nonsingular, \(\mathcal {T}(\mathcal {W}) \succeq \Pi _p(0)U\Pi _p(0)^T\) if and only if \(V \mathcal {T}(\mathcal {W}) V^T \succeq V\Pi _p(0)U\Pi _p(0)^T V^T\) [23, Theorem 7.7.2], i.e. if and only if

$$\begin{aligned} \textbf{blkdiag}(D_p, D_{p-1}, \dots , D_0) \succeq \textbf{blkdiag}(U, 0, \dots , 0). \end{aligned}$$

Since log-determinant is a monotone function in the sense that \(A \succeq B\) implies \( \text{ log } \text{ det } A \ge \text{ log } \text{ det } B,\) one optimal choice of U given \(\mathcal {W}\) is \(U = D_p.\) The statement that \(D_p = (\Pi _p(0)^T \mathcal {T}(\mathcal {W})^{-1} \Pi _p(0))^{-1}\) follows from studying the first block-column of the equation \(V \mathcal {T}(\mathcal {W}) V^T = \textbf{blkdiag}(D_p, \dots , D_0)\).
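
The feasibility of the choice \(U = D_p\) can be checked numerically: with \(U = (\Pi _p(0)^T \mathcal {T}(\mathcal {W})^{-1} \Pi _p(0))^{-1}\), the matrix \(\mathcal {T}(\mathcal {W}) - \Pi _p(0) U \Pi _p(0)^T\) is positive semidefinite (a Schur-complement fact). A self-contained sketch with a randomly generated positive definite block Toeplitz matrix, assuming the convention that \(\Pi _p(0)\) selects the leading m-by-m block:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 2, 3
B = rng.standard_normal((p + 1, m, m))
# Autocorrelation blocks W_k = sum_j B_{j+k} B_j^T: the block Toeplitz
# matrix built from them is the covariance matrix of a stacked vector
# MA process, hence positive semidefinite; adding 0.1*I to W_0 makes it
# positive definite.
W = [sum(B[j + k] @ B[j].T for j in range(p + 1 - k)) for k in range(p + 1)]
W[0] = W[0] + 0.1 * np.eye(m)
T = np.block([[W[i - l] if i >= l else W[l - i].T
               for l in range(p + 1)] for i in range(p + 1)])
E = np.zeros(((p + 1) * m, m))
E[:m] = np.eye(m)                                # E = Pi_p(0), leading block
U = np.linalg.inv(E.T @ np.linalg.solve(T, E))   # U = D_p
# The semidefinite constraint T(W) >= E U E^T holds for this choice of U:
eigs = np.linalg.eigvalsh(T - E @ U @ E.T)
assert eigs.min() > -1e-8
```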

1.4 Derivation of the Multivariate Itakura-Saito Distance

Here the Bregman divergence associated with the negative entropy defined in (8) is derived. We start by evaluating \(\nabla \phi (\mathcal {Y}) \in \textbf{V}\) for \(\mathcal {Y} \in \mathbf{int \ }K\). We have

$$\begin{aligned} \nabla \phi (\mathcal {Y}) = \bigg ( \frac{\partial \phi (\mathcal {Y})}{\partial Y_0}, \dots , \frac{\partial \phi (\mathcal {Y})}{\partial Y_p} \bigg ), \end{aligned}$$

where \(\frac{\partial \phi (\mathcal {Y})}{\partial Y_0}\) is the gradient of the function

$$\begin{aligned} Y_0 \mapsto - \frac{1}{2\pi } \int _0^{2 \pi } \text{ log } \text{ det } \big (Y_0 + \sum _{k=1}^p (Y_k e^{j \omega k} + Y_k^T e^{-j \omega k}) \big ) \, d\omega , \end{aligned}$$

where we view \(Y_1, \dots , Y_p\) as fixed. Here the gradient is taken in \({\text{ S }}^m\) with respect to the inner product \(\langle A, B \rangle = \textbf{Tr}(AB), \, A, B \in {\text{ S }}^m\). Furthermore, for \(k = 1, \dots , p\), \(\frac{\partial \phi (\mathcal {Y})}{\partial Y_k}\) is the gradient of the function

$$\begin{aligned} Y_k \mapsto - \frac{1}{2\pi } \int _0^{2 \pi } \text{ log } \text{ det } \big (Y_0 + \sum _{i=1}^p (Y_i e^{j \omega i} + Y_i^T e^{-j \omega i}) \big ) \, d\omega , \end{aligned}$$

where we view \(Y_i\) as fixed for \(i \ne k\). Here the gradient is taken in \({\text{ R }}^{m \times m}\) with respect to the inner product \(\langle A, B \rangle = 2 \textbf{Tr}(A^T B), \, A, B \in {\text{ R }}^{m \times m}.\) Using standard chain rules one can show that

$$\begin{aligned} \frac{\partial \phi (\mathcal {Y})}{\partial Y_0} = -\frac{1}{2\pi } \int _0^{2\pi } F_\mathcal {Y}(e^{j\omega })^{-1} d \omega , \end{aligned}$$

and for \(k = 1, \dots , p\):

$$\begin{aligned} \frac{\partial \phi (\mathcal {Y})}{\partial Y_k} = - \frac{1}{4\pi } \int _0^{2\pi } \big \{ F_\mathcal {Y}(e^{j\omega })^{-T} e^{jk\omega } + F_\mathcal {Y}(e^{j\omega })^{-1} e^{-jk\omega } \big \} d \omega . \end{aligned}$$
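
These closed-form gradients can be sanity-checked by finite differences. A quick numerical aside (not from the paper) verifying the \(\partial \phi /\partial Y_0\) formula in the scalar case, with the integrals approximated by Riemann sums:

```python
import numpy as np

w = np.linspace(0.0, 2.0 * np.pi, 8192, endpoint=False)

def phi(y):
    """Scalar negative entropy -(1/2pi) int log F_Y(w) dw, where
    F_Y(w) = y_0 + 2*sum_k y_k cos(k w) (Riemann-sum discretization)."""
    F = y[0] + 2.0 * np.cos(np.outer(w, np.arange(1, len(y)))) @ y[1:]
    return -np.mean(np.log(F))

y = np.array([2.0, 0.5, -0.2])     # F_Y > 0 on the whole circle
F = y[0] + 2.0 * np.cos(np.outer(w, np.arange(1, 3))) @ y[1:]
grad0 = -np.mean(1.0 / F)          # closed form: -(1/2pi) int F_Y^{-1} dw
h = 1e-6
e0 = np.array([h, 0.0, 0.0])
fd = (phi(y + e0) - phi(y - e0)) / (2.0 * h)   # central finite difference
assert abs(grad0 - fd) < 1e-6
```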

By using these expressions, a few lines of algebra show that \(\langle \nabla \phi (\mathcal {Y}), \mathcal {X} - \mathcal {Y} \rangle \) is equal to

$$\begin{aligned} -\frac{1}{2\pi } \int _0^{2\pi } \textbf{Tr}\bigg \{ F_\mathcal {Y}(e^{j\omega })^{-1} [F_\mathcal {X}(e^{j \omega }) - F_\mathcal {Y}(e^{j\omega })] \bigg \} d \omega . \end{aligned}$$

Plugging this expression into the definition of the Bregman divergence yields, after some algebraic manipulation, (15).
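
Carrying out that manipulation gives the multivariate Itakura-Saito distance \(D(\mathcal {X}, \mathcal {Y}) = \frac{1}{2\pi } \int _0^{2\pi } \big [ \textbf{Tr}(F_\mathcal {Y}^{-1} F_\mathcal {X}) - \text{ log } \text{ det } (F_\mathcal {Y}^{-1} F_\mathcal {X}) - m \big ] \, d\omega \). For m = 1 this is the classical scalar Itakura-Saito distance; a numerical sketch (our own helper, integral approximated by a Riemann sum):

```python
import numpy as np

def itakura_saito(x, y, n=4096):
    """Itakura-Saito distance between two positive scalar trigonometric
    polynomials with coefficient sequences x and y, where
    F(w) = c_0 + 2*sum_k c_k cos(k w)."""
    w = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    def F(c):
        c = np.asarray(c, dtype=float)
        return c[0] + 2.0 * np.cos(np.outer(w, np.arange(1, len(c)))) @ c[1:]
    r = F(x) / F(y)
    return np.mean(r - np.log(r) - 1.0)   # (1/2pi) * integral over [0, 2pi]
```

The divergence vanishes exactly when the two spectra coincide and is otherwise positive, since \(t - \log t - 1 \ge 0\) for \(t > 0\).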

1.5 Suboptimality Certificate

Here we justify the statement that a lower bound on the optimal value of (20) is given by (22). In this derivation, let \(E = \Pi _p(0)\). It can be shown that a Lagrange dual of (20) is

$$\begin{aligned} \begin{array}{ll} \text{ maximize } & h(\lambda , \mathcal {Z}) \\ \text{ subject } \text{ to } & \mathcal {Z} \in K^*, \end{array} \end{aligned}$$
(35)

with variables \(\lambda \in {\text{ R }}\) and \(\mathcal {Z} \in \textbf{V},\) where

$$\begin{aligned} h(\lambda , \mathcal {Z}) = - \textbf{Tr}(\hat{R}_0) \lambda - \frac{1}{2} \Vert \lambda E - \mathcal {Z} \Vert ^2 + \langle \hat{\mathcal {R}}^T, \lambda E - \mathcal {Z} \rangle . \end{aligned}$$

Furthermore, strong duality and the existence of a dual solution \((\lambda ^\star , \mathcal {Z}^\star )\) follow from Slater’s constraint qualification. The optimality conditions for the primal-dual pair (20), (35) are

$$\begin{aligned} \begin{aligned} \lambda ^\star E - \mathcal {Z}^\star&= \hat{\mathcal {R}}^T - \mathcal {X}^\star , \qquad \textbf{Tr}(X^\star _0) = \textbf{Tr}(\hat{R}_0), \\ \mathcal {X}^\star&\in K, \qquad \qquad \qquad \mathcal {Z}^\star \in K^*. \end{aligned} \end{aligned}$$

To any primal feasible iterate \(\mathcal {X}_k\), associate a dual iterate \((\lambda _k, \mathcal {Z}_k)\) through the relation \(\lambda _k E - \mathcal {Z}_k = \hat{\mathcal {R}}^T - \mathcal {X}_k.\) The dual objective value is

$$\begin{aligned} h(\lambda _k, \mathcal {Z}_k) = - \textbf{Tr}(\hat{R}_0) \lambda _k - \frac{1}{2} \Vert \mathcal {X}_k - \hat{\mathcal {R}}^T \Vert ^2 + \langle \hat{\mathcal {R}}^T, \hat{\mathcal {R}}^T - \mathcal {X}_k \rangle . \end{aligned}$$

If \((\lambda _k, \mathcal {Z}_k)\) is dual feasible it follows from weak duality that \(h(\lambda _k, \mathcal {Z}_k)\) is a lower bound on the optimal value of (20). We will ensure dual feasibility by choosing \(\lambda _k\) large enough to guarantee that \(\mathcal {Z}_k = \lambda _k E - \hat{\mathcal {R}}^T + \mathcal {X}_k \in K^*\). For dual feasibility it is required that \(\lambda _k \ge - \lambda _{\min }(\mathcal {T}(\mathcal {X}_k - \hat{\mathcal {R}}^T)).\) Since \(\textbf{Tr}(\hat{R}_0) > 0\), the choice of \(\lambda _k\) resulting in the largest lower bound is \(\lambda _k = - \lambda _{\min }(\mathcal {T}(\mathcal {X}_k - \hat{\mathcal {R}}^T))\). Inserting this choice of \(\lambda _k\) in the dual objective function proves that (22) is a lower bound on the optimal value of (20).
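
In the scalar case (m = 1), \(\mathcal {T}(\mathcal {X}_k - \hat{\mathcal {R}}^T)\) is an ordinary symmetric Toeplitz matrix, so the feasible shift \(\lambda _k\) amounts to one smallest-eigenvalue computation. A minimal sketch with made-up data (illustrative; `dual_shift` is our own name):

```python
import numpy as np

def dual_shift(delta):
    """For a scalar coefficient sequence delta = (d_0, ..., d_p), return the
    smallest lambda with T(delta) + lambda*I >= 0, where T(delta) is the
    symmetric Toeplitz matrix with entries d_{|i-l|}, together with T."""
    d = np.asarray(delta, dtype=float)
    n = len(d)
    T = d[np.abs(np.subtract.outer(np.arange(n), np.arange(n)))]
    return -np.linalg.eigvalsh(T).min(), T

lam, T = dual_shift([0.5, 1.0, -0.2])   # illustrative X_k - R_hat^T, m = 1
# T + lam * I is positive semidefinite, i.e. the shifted Z_k is dual feasible:
assert np.linalg.eigvalsh(T + lam * np.eye(len(T))).min() > -1e-10
```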

Since \(\textbf{Tr}(\hat{R}_0) > 0 \), the optimal dual solution must be unique and given by \(\lambda ^\star = - \lambda _{\min }(\mathcal {T}(\mathcal {X}^\star - \hat{\mathcal {R}}^T))\), \(\mathcal {Z}^\star = \lambda ^\star E - \hat{\mathcal {R}}^T + \mathcal {X}^\star \). Hence, as \(\mathcal {X}_k \rightarrow \mathcal {X}^\star \) it holds that \((\lambda _k, \mathcal {Z}_k) \rightarrow (\lambda ^\star , \mathcal {Z}^\star )\) and \(h(\lambda _k, \mathcal {Z}_k) \rightarrow h(\lambda ^\star , \mathcal {Z}^\star ).\) Since the optimal primal and dual values are equal, it follows that the lower bound (22) converges to the optimal value of (20) as \(\mathcal {X}_k \rightarrow \mathcal {X}^\star \).

1.6 Inverse Spectral Density Matrix Estimation

In his dissertation [12, pages 69–71], John Parker Burg motivates why it is reasonable to measure the information content of a spectral density matrix f using the quantity \((1/(2\pi )) \int _0^{2\pi } \text{ log } \text{ det } f(\omega ) d \omega \). The principle of maximum entropy states that to estimate f, one should choose the spectrum that maximizes the information content among all spectra consistent with prior information. Assume that the estimate is of the form (3) and that the covariance function \(R_k\) is known for \(|k| \le p\), where p is a positive integer. Then the estimate \(\hat{f}\) is consistent with the known values of the covariance function if

$$\begin{aligned} \int _0^{2\pi } \hat{f}(\omega ) e^{j k \omega } d \omega = R_k, \qquad 0 \le k \le p. \end{aligned}$$

Hence, the estimate maximizing the entropy among all spectra that are consistent with prior information can be found by solving the problem

$$\begin{aligned} \begin{array}{ll} \text{ maximize } & (1/(2\pi )) \int _0^{2\pi } \text{ log } \text{ det } \hat{f}(\omega ) d \omega \\ \text{ subject } \text{ to } & \int _0^{2\pi } \hat{f}(\omega ) e^{j k \omega } d \omega = R_k, \hspace{0.5cm} 0 \le k \le p, \end{array} \end{aligned}$$
(36)

with problem data \(R_k, \, 0 \le k \le p\) and variable \(\hat{f}(\omega ) = \frac{1}{2\pi } \sum _{k=-\infty }^\infty R_k e^{-jk\omega }\) where \(R_{-k} = R_k^T.\) Problem (36) is sometimes called a maximum entropy covariance extension problem.
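
For scalar data, the solution of (36) is the AR(p) spectrum obtained from the Yule-Walker equations (Burg's classical result). The self-contained sketch below verifies numerically that the resulting spectrum reproduces the given covariances; the AR(1) covariance data is illustrative:

```python
import numpy as np

# Maximum entropy extension, scalar case: given r = (R_0, ..., R_p), the
# entropy-maximizing spectrum is the AR(p) spectrum from the Yule-Walker
# equations. Covariances below come from an AR(1) example.
r = np.array([4.0 / 3.0, 2.0 / 3.0])
p = len(r) - 1
T = r[np.abs(np.subtract.outer(np.arange(p), np.arange(p)))]
a = np.linalg.solve(T, -r[1:])                # AR coefficients a_1, ..., a_p
s2 = r[0] + a @ r[1:]                         # innovation variance
w = np.linspace(0.0, 2.0 * np.pi, 8192, endpoint=False)
A = 1.0 + np.exp(-1j * np.outer(w, np.arange(1, p + 1))) @ a
f_hat = s2 / (2.0 * np.pi * np.abs(A) ** 2)   # max-entropy spectral density
# The moment constraints of (36) hold: int f_hat(w) e^{jkw} dw = R_k.
for k in range(p + 1):
    Rk = np.real(np.mean(f_hat * np.exp(1j * k * w))) * 2.0 * np.pi
    assert abs(Rk - r[k]) < 1e-6
```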

In [38, Section 1.2.3] it was pointed out that a dual problem of (36) is

$$\begin{aligned} \text {minimize } - \frac{1}{2\pi } \int _{0}^{2\pi } \text{ log } \text{ det } Y(\omega ) \, d\omega + \langle \mathcal {Y}, \mathcal {R} \rangle \end{aligned}$$
(37)

with variable \(\mathcal {Y} = (Y_0, Y_1, \dots , Y_p)\) and problem data \(\mathcal {R} = (R_0, R_1, \dots , R_p)\), where \(Y(\omega ) = Y_0 + \sum _{k=1}^p (Y_k e^{-jk \omega } + Y_k^T e^{jk \omega })\). Furthermore, there is a natural interpretation of the dual variables in terms of the inverse spectral density matrix: if \(\hat{f}\) is optimal in (36) and \((Y_0, Y_1, \dots , Y_p)\) is optimal in (37), then \(\hat{f}(\omega )^{-1} = 2 \pi Y(\omega )\). In other words, from the solution of (37) an estimate of the inverse spectral density matrix is available. Problem (37) has an analytical solution [38, Section 1.2.3].


Cite this article

Cederberg, D. First-Order Methods for Nonnegative Trigonometric Matrix Polynomials. J Optim Theory Appl 204, 32 (2025). https://doi.org/10.1007/s10957-024-02581-5
