Misspecified nonconvex statistical optimization for sparse phase retrieval

  • Full Length Paper
  • Series B
  • Mathematical Programming

Abstract

Existing nonconvex statistical optimization theory and methods crucially rely on the correct specification of the underlying “true” statistical models. To address this issue, we take a first step towards taming model misspecification by studying the high-dimensional sparse phase retrieval problem with misspecified link functions. In particular, we propose a simple variant of the thresholded Wirtinger flow algorithm that, given a proper initialization, linearly converges to an estimator with optimal statistical accuracy for a broad family of unknown link functions. We further provide extensive numerical experiments to support our theoretical findings.

Notes

  1. Here we use the shorthand \([n] = \{1,2,\ldots , n\}\).

  2. See, for example, Foucart and Rauhut [13] for the definition of the distributional derivative.

References

  1. Ahmadi, A.A., Parrilo, P.A.: Some recent directions in algebraic methods for optimization and Lyapunov analysis. In: Laumond, J.P., Mansard, N., Lasserre, J.B. (eds.) Geometric and Numerical Foundations of Movements, pp. 89–112. Springer, Cham (2017)

  2. Alquier, P., Biau, G.: Sparse single-index model. J. Mach. Learn. Res. 14, 243–280 (2013)

  3. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)

  4. Bühlmann, P., van de Geer, S.: Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Berlin (2011)

  5. Bunk, O., Diaz, A., Pfeiffer, F., David, C., Schmitt, B., Satapathy, D.K., van der Veen, J.F.: Diffractive imaging for periodic samples: retrieving one-dimensional concentration profiles across microfluidic channels. Acta Crystallogr. Sect. A Found. Crystallogr. 63, 306–314 (2007)

  6. Cai, T.T., Li, X., Ma, Z.: Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow. Ann. Stat. 44, 2221–2251 (2016)

  7. Candès, E.J., Eldar, Y.C., Strohmer, T., Voroninski, V.: Phase retrieval via matrix completion. SIAM Rev. 57, 225–251 (2015)

  8. Candès, E.J., Li, X., Soltanolkotabi, M.: Phase retrieval via Wirtinger flow: theory and algorithms. IEEE Trans. Inf. Theory 61, 1985–2007 (2015)

  9. Candès, E.J., Strohmer, T., Voroninski, V.: Phaselift: exact and stable signal recovery from magnitude measurements via convex programming. Commun. Pure Appl. Math. 66, 1241–1274 (2013)

  10. Chen, Y., Candès, E.: Solving random quadratic systems of equations is nearly as easy as solving linear systems. In: Advances in Neural Information Processing Systems (2015)

  11. Coene, W., Janssen, G., de Beeck, M.O., Van Dyck, D.: Phase retrieval through focus variation for ultra-resolution in field-emission transmission electron microscopy. Phys. Rev. Lett. 69, 3743 (1992)

  12. Cook, R.D., Ni, L.: Sufficient dimension reduction via inverse regression: a minimum discrepancy approach. J. Am. Stat. Assoc. 100, 410–428 (2005)

  13. Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Springer, Berlin (2013)

  14. Genzel, M.: High-dimensional estimation of structured signals from non-linear observations with general convex loss functions. IEEE Trans. Inf. Theory 63, 1601–1619 (2017)

  15. Ghadimi, S., Lan, G.: Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23, 2341–2368 (2013)

  16. Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156, 59–99 (2016)

  17. Goldstein, L., Minsker, S., Wei, X.: Structured signal recovery from non-linear and heavy-tailed measurements (2016). arXiv preprint arXiv:1609.01025

  18. Golub, G.H., Van Loan, C.F.: Matrix Computations. Johns Hopkins University Press, Baltimore (2012)

  19. Gonçalves, M.L., Melo, J.G., Monteiro, R.D.: Convergence rate bounds for a proximal ADMM with over-relaxation stepsize parameter for solving nonconvex linearly constrained problems (2017). arXiv preprint arXiv:1702.01850

  20. Gu, Q., Wang, Z., Liu, H.: Sparse PCA with oracle property. In: Advances in Neural Information Processing Systems (2014)

  21. Han, A.K.: Non-parametric analysis of a generalized regression model: the maximum rank correlation estimator. J. Econom. 35, 303–316 (1987)

  22. Harrison, R.: Phase problem in crystallography. J. Opt. Soc. Am. A Opt. Image Sci. 10, 1046–1055 (1993)

  23. Hong, M., Luo, Z.-Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26, 337–364 (2016)

  24. Horowitz, J.L.: Semiparametric and Nonparametric Methods in Econometrics. Springer, Berlin (2009)

  25. Jaganathan, K., Eldar, Y.C., Hassibi, B.: Phase retrieval: an overview of recent developments (2015). arXiv preprint arXiv:1510.07713

  26. Jiang, B., Liu, J.S.: Variable selection for general index models via sliced inverse regression. Ann. Stat. 42, 1751–1786 (2014)

  27. Johnstone, I.M., Lu, A.Y.: On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc. 104, 682–693 (2009)

  28. Kakade, S.M., Kanade, V., Shamir, O., Kalai, A.: Efficient learning of generalized linear and single index models with isotonic regression. In: Advances in Neural Information Processing Systems (2011)

  29. Kalai, A.T., Sastry, R.: The isotron algorithm: high-dimensional isotonic regression. In: Conference on Learning Theory (2009)

  30. Kim, S., Kojima, M., Toh, K.-C.: A Lagrangian-DNN relaxation: a fast method for computing tight lower bounds for a class of quadratic optimization problems. Math. Program. 156, 161–187 (2016)

  31. Li, K.-C.: Sliced inverse regression for dimension reduction. J. Am. Stat. Assoc. 86, 316–327 (1991)

  32. Li, K.-C.: On principal Hessian directions for data visualization and dimension reduction: another application of Stein’s lemma. J. Am. Stat. Assoc. 87, 1025–1039 (1992)

  33. Li, K.-C., Duan, N.: Regression analysis under link violation. Ann. Stat. 17, 1009–1052 (1989)

  34. Li, X., Yang, L.F., Ge, J., Haupt, J., Zhang, T., Zhao, T.: On quadratic convergence of DC proximal Newton algorithm for nonconvex sparse learning in high dimensions (2017). arXiv preprint arXiv:1706.06066

  35. Lin, Q., Zhao, Z., Liu, J.S.: On consistency and sparsity for sliced inverse regression in high dimensions (2015). arXiv preprint arXiv:1507.03895

  36. Loh, P.-L., Wainwright, M.J.: Regularized \({M}\)-estimators with nonconvexity: statistical and algorithmic theory for local optima. J. Mach. Learn. Res. 16, 559–616 (2015)

  37. Lu, Z., Xiao, L.: On the complexity analysis of randomized block-coordinate descent methods. Math. Program. 152, 615–642 (2015)

  38. Marchesini, S., He, H., Chapman, H.N., Hau-Riege, S.P., Noy, A., Howells, M.R., Weierstall, U., Spence, J.C.: X-ray image reconstruction from a diffraction pattern alone. Phys. Rev. B 68, 140101 (2003)

  39. McCullagh, P., Nelder, J.A.: Generalized Linear Models. Chapman and Hall, Boca Raton (1989)

  40. Miao, J., Ishikawa, T., Shen, Q., Earnest, T.: Extending X-ray crystallography to allow the imaging of noncrystalline materials, cells, and single protein complexes. Ann. Rev. Phys. Chem. 59, 387–410 (2008)

  41. Millane, R.: Phase retrieval in crystallography and optics. J. Opt. Soc. Am. A Opt. Image Sci. 7, 394–411 (1990)

  42. Netrapalli, P., Jain, P., Sanghavi, S.: Phase retrieval using alternating minimization. In: Advances in Neural Information Processing Systems (2013)

  43. Neykov, M., Wang, Z., Liu, H.: Agnostic estimation for misspecified phase retrieval models. In: Advances in Neural Information Processing Systems (2016)

  44. Parrilo, P.A.: Semidefinite programming relaxations for semialgebraic problems. Math. Program. 96, 293–320 (2003)

  45. Plan, Y., Vershynin, R.: The generalized lasso with non-linear observations. IEEE Trans. Inf. Theory 62, 1528–1537 (2016)

  46. Plan, Y., Vershynin, R., Yudovina, E.: High-dimensional estimation with geometric constraints (2014). arXiv preprint arXiv:1404.3749

  47. Radchenko, P.: High dimensional single index models. J. Multivar. Anal. 139, 266–282 (2015)

  48. Sahinoglou, H., Cabrera, S.D.: On phase retrieval of finite-length sequences using the initial time sample. IEEE Trans. Circuits Syst. 38, 954–958 (1991)

  49. Shechtman, Y., Eldar, Y.C., Cohen, O., Chapman, H.N., Miao, J., Segev, M.: Phase retrieval with application to optical imaging: a contemporary overview. IEEE Signal Process. Mag. 32, 87–109 (2015)

  50. Stein, C.M.: Estimation of the mean of a multivariate normal distribution. Ann. Stat. 9, 1135–1151 (1981)

  51. Sun, J., Qu, Q., Wright, J.: A geometric analysis of phase retrieval. In: IEEE International Symposium on Information Theory (2016)

  52. Sun, W., Wang, Z., Liu, H., Cheng, G.: Non-convex statistical optimization for sparse tensor graphical model. In: Advances in Neural Information Processing Systems (2015)

  53. Tan, K.M., Wang, Z., Liu, H., Zhang, T.: Sparse generalized eigenvalue problem: optimal statistical rates via truncated Rayleigh flow (2016). arXiv preprint arXiv:1604.08697

  54. Thrampoulidis, C., Abbasi, E., Hassibi, B.: Lasso with non-linear measurements is equivalent to one with linear measurements. In: Advances in Neural Information Processing Systems (2015)

  55. Waldspurger, I.: Phase retrieval with random Gaussian sensing vectors by alternating projections (2016). arXiv preprint arXiv:1609.03088

  56. Waldspurger, I., d'Aspremont, A., Mallat, S.: Phase recovery, MaxCut and complex semidefinite programming. Math. Program. 149, 47–81 (2015)

  57. Wang, Z., Liu, H., Zhang, T.: Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. Ann. Stat. 42, 2164–2201 (2014)

  58. Wang, Z., Lu, H., Liu, H.: Tighten after relax: Minimax-optimal sparse PCA in polynomial time. In: Advances in Neural Information Processing Systems (2014)

  59. Weisser, T., Lasserre, J.-B., Toh, K.-C.: Sparse-BSOS: a bounded degree SOS hierarchy for large scale polynomial optimization with sparsity. Math. Program. Comput. 10, 1–32 (2017)

  60. Xu, Y., Yin, W.: A globally convergent algorithm for nonconvex optimization based on block coordinate update. J. Sci. Comput. 72(2), 700–734 (2017)

  61. Yang, Z., Balasubramanian, K., Liu, H.: High-dimensional non-Gaussian single index models via thresholded score function estimation (2017)

  62. Yang, Z., Wang, Z., Liu, H.: Estimating high-dimensional non-Gaussian multiple index models via Stein's lemma. In: Advances in Neural Information Processing Systems (2017)

  63. Yang, Z., Wang, Z., Liu, H., Eldar, Y.C., Zhang, T.: Sparse nonlinear regression: Parameter estimation and asymptotic inference. In: International Conference on Machine Learning (2015)

  64. Yi, X., Wang, Z., Caramanis, C., Liu, H.: Optimal linear estimation under unknown nonlinear transform. In: Advances in Neural Information Processing Systems (2015)

  65. Zhang, H., Chi, Y., Liang, Y.: Provable nonconvex phase retrieval with outliers: median truncated Wirtinger flow. In: International Conference on Machine Learning (2016)

  66. Zhao, T., Liu, H., Zhang, T.: Pathwise coordinate optimization for sparse learning: algorithm and theory. Ann. Stat. 46(1), 180–218 (2018). https://doi.org/10.1214/17-AOS1547

  67. Zhu, L., Miao, B., Peng, H.: On sliced inverse regression with high-dimensional covariates. J. Am. Stat. Assoc. 101, 630–643 (2006)

Author information

Corresponding author

Correspondence to Ethan X. Fang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 494 KB)

Appendix

A. Further simulations

We further investigate empirically how the sparsity level of the estimator changes over the iterations. For the three link functions \(h_1\), \(h_2\), and \(h_3\) defined in Eq. (4.1), we plot the sparsity of the TWF iterates in Fig. 5 for up to \(T = 1000\) iterations. As in our experimental study of the optimization error, we set \(p = 1000\), \(s = 5\), and \(n = 864 \approx 5 s^2 \log p\), and the tuning parameters are set to \(\gamma = 2\), \(\kappa = 15\), and \(\eta = 0.005\). In this case, as we show in Sect. 4.1, our TWF algorithm converges geometrically to an estimator with high statistical accuracy. As shown in Fig. 5, our algorithm tends to over-estimate the sparsity, but in all of these figures the estimated sparsity gradually decreases as the algorithm proceeds. Combined with the experiments in Sect. 4.1, this shows that although the estimator has more nonzero entries than the true parameter, the statistical error remains satisfactory.

Fig. 5

Plots of the sparsity of the TWF updates \(\beta^{(t)}\). Here the link function is one of \(h_1\), \(h_2\), and \(h_3\). In addition, we set \(p = 1000\), \(s = 5\), \(n = 864 \approx 5 s^2 \log p\), and run the TWF algorithm for \(T = 1000\) iterations. These figures are generated from 50 independent trials for each link function.

Fig. 6

Plots of the statistical errors for the cubic model \(Y = (X^\top \beta^*)^3 + \epsilon\) and the phase retrieval model \(Y = (X^\top \beta^*)^2 + \epsilon\). We set \(p = 1000\), \(s = 5\), and let \(n\) vary. The plots are generated from 100 independent trials for each \((n, p, s)\). In a, we plot the two error curves together, showing that the TWF algorithm incurs a much larger error on the cubic model, whereas the error for the phase retrieval model is comparatively negligible. In b and c, we plot the two curves from a separately for clearer presentation.

In addition, we present a failure case that violates Assumption 3.1, to help the reader gauge the extent to which the proposed algorithm is robust to model misspecification. In particular, consider the cubic link function \(f(u, v) = u^3 + v\), which violates Assumption 3.1 since in this case we have \(\text{Cov}[Y, (X^\top \beta^*)^2] = 0\) (a short derivation is given below). We compare this cubic model \(Y = (X^\top \beta^*)^3 + \epsilon\) with the phase retrieval model \(Y = (X^\top \beta^*)^2 + \epsilon\), whose link function is quadratic. We set \(p = 1000\), \(s = 5\), and let \(n\) vary; for each setting, we report the estimation error over 100 independent trials. The statistical errors of the TWF algorithm for the two models are plotted in Fig. 6a, which shows that our algorithm has a nondecreasing estimation error for the cubic model even when \(n\) is very large. This is in sharp contrast with the phase retrieval model, for which the estimation error is negligible. Moreover, we plot the two error curves separately in Fig. 6b and c to show the details more clearly.

About this article

Cite this article

Yang, Z., Yang, L.F., Fang, E.X. et al. Misspecified nonconvex statistical optimization for sparse phase retrieval. Math. Program. 176, 545–571 (2019). https://doi.org/10.1007/s10107-019-01364-5
