
Advancing non-convex and constrained learning: challenges and opportunities

Published: 06 December 2019 in AI Matters, Volume 5, Issue 3 (September 2019)

Abstract

As data become more complex and machine learning (ML) algorithms are applied to ever broader and more diverse decision-making tasks, traditional ML methods that minimize an unconstrained or simply constrained convex objective are becoming increasingly unsatisfactory. To address this challenge, recent ML research has driven a paradigm shift in learning predictive models toward non-convex learning and heavily constrained learning. Non-Convex Learning (NCL) refers to a family of learning methods that optimize non-convex objectives. Heavily Constrained Learning (HCL) refers to a family of learning methods whose constraints are far more complicated than the simple norm constraints of conventional learning (e.g., data-dependent functional constraints, non-convex constraints). This paradigm shift has already produced many promising outcomes: (i) non-convex deep learning has brought breakthroughs in learning representations from large-scale structured data (e.g., images, speech) (LeCun, Bengio, & Hinton, 2015; Krizhevsky, Sutskever, & Hinton, 2012; Amodei et al., 2016; Deng & Liu, 2018); (ii) non-convex regularizers (e.g., for enforcing sparsity or low-rank structure) can be more effective than their convex counterparts for learning high-dimensional structured models (C.-H. Zhang & Zhang, 2012; J. Fan & Li, 2001; C.-H. Zhang, 2010; T. Zhang, 2010); (iii) constrained learning is being used to learn predictive models that satisfy various constraints in order to respect social norms such as fairness (B. E. Woodworth, Gunasekar, Ohannessian, & Srebro, 2017; Hardt, Price, Srebro, et al., 2016; Zafar, Valera, Gomez Rodriguez, & Gummadi, 2017; A. Agarwal, Beygelzimer, Dudík, Langford, & Wallach, 2018), to improve interpretability (Gupta et al., 2016; Canini, Cotter, Gupta, Fard, & Pfeifer, 2016; You, Ding, Canini, Pfeifer, & Gupta, 2017), and to enhance robustness (Globerson & Roweis, 2006a; Sra, Nowozin, & Wright, 2011; T. Yang, Mahdavi, Jin, Zhang, & Zhou, 2012). Despite the great promise of these new learning paradigms, they also pose emerging challenges for the design of computationally efficient algorithms for big data and for the analysis of their statistical properties.
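
To make the distinction concrete, the following is a minimal sketch (not taken from the article) of the three problem templates the abstract contrasts, written in LaTeX. The loss f_i, predictor h_w, penalty parameters (lambda, gamma), and fairness tolerance epsilon are illustrative symbols; the non-convex penalty shown is the minimax concave penalty of C.-H. Zhang (2010), and the constraint follows the equal-opportunity criterion of Hardt et al. (2016) -- these are representative choices only.

% Conventional learning: a convex empirical risk with a simple norm constraint.
\min_{\|w\|_2 \le r} \; \frac{1}{n}\sum_{i=1}^{n} f_i(w), \qquad f_i \ \text{convex}.

% Non-Convex Learning (NCL): a non-convex sparsity-inducing regularizer,
% here the minimax concave penalty (MCP) applied coordinate-wise.
\min_{w \in \mathbb{R}^d} \; \frac{1}{n}\sum_{i=1}^{n} f_i(w) + \sum_{j=1}^{d} p_{\lambda,\gamma}(|w_j|),
\qquad
p_{\lambda,\gamma}(t) =
\begin{cases}
\lambda t - \dfrac{t^2}{2\gamma}, & 0 \le t \le \gamma\lambda,\\[4pt]
\dfrac{\gamma\lambda^2}{2}, & t > \gamma\lambda.
\end{cases}

% Heavily Constrained Learning (HCL): a data-dependent functional constraint,
% here an equal-opportunity fairness constraint between groups a and b.
\min_{w} \; \frac{1}{n}\sum_{i=1}^{n} f_i(w)
\quad \text{s.t.} \quad
\big|\,\Pr[h_w(x)=1 \mid \text{group}=a,\, y=1] - \Pr[h_w(x)=1 \mid \text{group}=b,\, y=1]\,\big| \le \epsilon.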

References

  1. Agarwal, A., Beygelzimer, A., Dudík, M., Langford, J., & Wallach, H. (2018). A reductions approach to fair classification. In Proceedings of the 35th International Conference on Machine Learning (ICML).
  2. Agarwal, N., Allen-Zhu, Z., Bullins, B., Hazan, E., & Ma, T. (2017). Finding approximate local minima faster than gradient descent. In ACM Symposium on Theory of Computing (STOC) (pp. 1195--1199).
  3. Allen-Zhu, Z., Li, Y., & Song, Z. (2018). A convergence theory for deep learning via over-parameterization. CoRR, abs/1811.03962.
  4. Allen-Zhu, Z. (2017). Natasha 2: Faster non-convex optimization than SGD. CoRR, abs/1708.08694.
  5. Amodei, D., Ananthanarayanan, S., Anubhai, R., Bai, J., Battenberg, E., Case, C., ... Zhu, Z. (2016). Deep Speech 2: End-to-end speech recognition in English and Mandarin. In Proceedings of the 33rd International Conference on Machine Learning (ICML) (pp. 173--182).
  6. An, N. T., & Nam, N. M. (2017). Convergence analysis of a proximal point algorithm for minimizing differences of functions. Optimization, 66(1), 129--147.
  7. Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein generative adversarial networks. In International Conference on Machine Learning (pp. 214--223).
  8. Arora, S., Cohen, N., & Hazan, E. (2018). On the optimization of deep networks: Implicit acceleration by overparameterization. arXiv preprint arXiv:1802.06509.
  9. Attouch, H., Bolte, J., & Svaiter, B. F. (2013). Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Mathematical Programming, 137(1), 91--129.
  10. Belagiannis, V., Rupprecht, C., Carneiro, G., & Navab, N. (2015). Robust optimization for deep regression. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2830--2838).
  11. Bolte, J., Sabach, S., & Teboulle, M. (2014). Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146, 459--494.
  12. Boţ, R. I., Csetnek, E. R., & László, S. C. (2016). An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions. EURO Journal on Computational Optimization, 4(1), 3--25.
  13. Candès, E. J., Wakin, M. B., & Boyd, S. P. (2008). Enhancing sparsity by reweighted ℓ1 minimization. Journal of Fourier Analysis and Applications, 14(5), 877--905.
  14. Canini, K., Cotter, A., Gupta, M. R., Fard, M. M., & Pfeifer, J. (2016). Fast and flexible monotonic functions with ensembles of lattices. In Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS) (pp. 2927--2935).
  15. Carlini, N., & Wagner, D. (2017). Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP) (pp. 39--57).
  16. Carmon, Y., Duchi, J. C., Hinder, O., & Sidford, A. (2016). Accelerated methods for non-convex optimization. CoRR, abs/1611.00756.
  17. Cartis, C., Gould, N. I. M., & Toint, P. L. (2011a). Adaptive cubic regularisation methods for unconstrained optimization. Part II: Worst-case function- and derivative-evaluation complexity. Mathematical Programming, 130(2), 295--319.
  18. Cartis, C., Gould, N. I. M., & Toint, P. L. (2011b). Adaptive cubic regularisation methods for unconstrained optimization. Part I: Motivation, convergence and numerical results. Mathematical Programming, 127(2), 245--295.
  19. Chartrand, R. (2012). Nonconvex splitting for regularized low-rank + sparse decomposition. IEEE Transactions on Signal Processing, 60(11), 5810--5819.
  20. Chartrand, R., & Yin, W. (2016). Nonconvex sparse regularization and splitting algorithms. In Splitting methods in communication, imaging, science, and engineering (pp. 237--249). Springer.
  21. Chen, J., & Gu, Q. (2018). Closing the generalization gap of adaptive gradient methods in training deep neural networks. arXiv preprint arXiv:1806.06763.
  22. Chen, Z., Yuan, Z., Yi, J., Zhou, B., Chen, E., & Yang, T. (2019). Universal stage-wise learning for non-convex problems with convergence on averaged solutions. In 7th International Conference on Learning Representations (ICLR).
  23. Cherukuri, A., Gharesifard, B., & Cortes, J. (2017). Saddle-point dynamics: Conditions for asymptotic stability of saddle points. SIAM Journal on Control and Optimization, 55(1), 486--511.
  24. Cisse, M., Bojanowski, P., Grave, E., Dauphin, Y., & Usunier, N. (2017). Parseval networks: Improving robustness to adversarial examples. In Proceedings of the 34th International Conference on Machine Learning (pp. 854--863).
  25. Daskalakis, C., Ilyas, A., Syrgkanis, V., & Zeng, H. (2017). Training GANs with optimism. CoRR, abs/1711.00141.
  26. Davis, D., & Drusvyatskiy, D. (2018). Stochastic subgradient method converges at the rate O(k^{-1/4}) on weakly convex functions. arXiv preprint arXiv:1802.02988.
  27. Deng, L., & Liu, Y. (2018). Deep learning in natural language processing. Springer.
  28. Du, S. S., Zhai, X., Poczos, B., & Singh, A. (2018). Gradient descent provably optimizes over-parameterized neural networks. arXiv preprint arXiv:1810.02054.
  29. Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121--2159.
  30. Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348--1360.
  31. Fan, Y., Lyu, S., Ying, Y., & Hu, B. (2017). Learning with average top-k loss. In Advances in Neural Information Processing Systems 30 (pp. 497--505).
  32. Ghadimi, S., & Lan, G. (2013). Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4), 2341--2368.
  33. Globerson, A., & Roweis, S. (2006a). Nightmare at test time: Robust learning by feature deletion. In Proceedings of the 23rd International Conference on Machine Learning (pp. 353--360).
  34. Globerson, A., & Roweis, S. (2006b). Nightmare at test time: Robust learning by feature deletion. In Proceedings of the 23rd International Conference on Machine Learning (pp. 353--360).
  35. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems (pp. 2672--2680).
  36. Gouk, H., Frank, E., Pfahringer, B., & Cree, M. (2018). Regularisation of neural networks by enforcing Lipschitz continuity. arXiv preprint arXiv:1804.04368.
  37. Grnarova, P., Levy, K. Y., Lucchi, A., Hofmann, T., & Krause, A. (2017). An online learning approach to generative adversarial networks. CoRR, abs/1706.03269.
  38. Gupta, M. R., Cotter, A., Pfeifer, J., Voevodski, K., Canini, K. R., Mangylov, A., ... Esbroeck, A. V. (2016). Monotonic calibrated interpolated look-up tables. Journal of Machine Learning Research (JMLR), 17, 109:1--109:47.
  39. Hardt, M., Price, E., Srebro, N., et al. (2016). Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems (pp. 3315--3323).
  40. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770--778).
  41. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems 30 (NIPS) (pp. 6629--6640).
  42. Hillar, C. J., & Lim, L.-H. (2013). Most tensor problems are NP-hard. Journal of the ACM, 60(6), 45:1--45:39.
  43. Khalaf, W., Astorino, A., d'Alessandro, P., & Gaudioso, M. (2017). A DC optimization-based clustering technique for edge detection. Optimization Letters, 11(3), 627--640.
  44. Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations (ICLR). Retrieved from http://arxiv.org/abs/1412.6980
  45. Kiryo, R., Niu, G., du Plessis, M. C., & Sugiyama, M. (2017). Positive-unlabeled learning with non-negative risk estimator. In Advances in Neural Information Processing Systems 30 (pp. 1675--1685).
  46. Kohler, J. M., & Lucchi, A. (2017). Sub-sampled cubic regularization for non-convex optimization. In Proceedings of the International Conference on Machine Learning (ICML) (pp. 1895--1904).
  47. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS) (pp. 1106--1114).
  48. LeCun, Y., Bengio, Y., & Hinton, G. E. (2015). Deep learning. Nature, 521(7553), 436--444.
  49. Le Thi, H. A., & Dinh, T. P. (2014). DC programming in communication systems: Challenging problems and methods. Vietnam Journal of Computer Science, 1(1), 15--28.
  50. Le Thi, H. A., Dinh, T. P., & Belghiti, M. (2014). DCA based algorithms for multiple sequence alignment (MSA). Central European Journal of Operations Research, 22(3), 501--524.
  51. Li, H., & Lin, Z. (2015). Accelerated proximal gradient methods for nonconvex programming. In Proceedings of the 28th International Conference on Neural Information Processing Systems (pp. 379--387).
  52. Li, X., & Orabona, F. (2018). On the convergence of stochastic gradient descent with adaptive stepsizes. arXiv preprint arXiv:1805.08114.
  53. Li, Y., & Liang, Y. (2018). Learning overparameterized neural networks via stochastic gradient descent on structured data. In Advances in Neural Information Processing Systems (NeurIPS) (pp. 8157--8166).
  54. Lin, Q., Liu, M., Rafique, H., & Yang, T. (2018). Solving weakly-convex-weakly-concave saddle-point problems as weakly-monotone variational inequality. arXiv preprint arXiv:1810.10207.
  55. Lin, Q., Nadarajah, S., Soheili, N., & Yang, T. (2019). A data efficient and feasible level set method for stochastic convex optimization with expectation constraints. CoRR, abs/1908.03077.
  56. Liu, M., & Yang, T. (2017a). On noisy negative curvature descent: Competing with gradient descent for faster non-convex optimization. CoRR, abs/1709.08571.
  57. Liu, M., & Yang, T. (2017b). Stochastic non-convex optimization with strong high probability second-order convergence. CoRR, abs/1710.09447.
  58. Liu, T., Pong, T. K., & Takeda, A. (2018). A successive difference-of-convex approximation method for a class of nonconvex nonsmooth optimization problems. Mathematical Programming.
  59. Luo, L., Xiong, Y., Liu, Y., & Sun, X. (2019). Adaptive gradient methods with dynamic bound of learning rate. arXiv preprint arXiv:1902.09843.
  60. Ma, R., Lin, Q., & Yang, T. (2019). Proximally constrained methods for weakly convex optimization with weakly convex constraints. arXiv preprint arXiv:1908.01871.
  61. Mahdavi, M., Yang, T., Jin, R., & Zhu, S. (2012). Stochastic gradient descent with only one projection. In Advances in Neural Information Processing Systems (NIPS) (pp. 503--511).
  62. Nagarajan, V., & Kolter, J. Z. (2017). Gradient descent GAN optimization is locally stable. In Advances in Neural Information Processing Systems 30 (NIPS) (pp. 5591--5600).
  63. Namkoong, H., & Duchi, J. C. (2016). Stochastic gradient methods for distributionally robust optimization with f-divergences. In Advances in Neural Information Processing Systems (pp. 2208--2216).
  64. Namkoong, H., & Duchi, J. C. (2017). Variance-based regularization with convex objectives. In Advances in Neural Information Processing Systems (pp. 2971--2980).
  65. Nesterov, Y., & Polyak, B. T. (2006). Cubic regularization of Newton method and its global performance. Mathematical Programming, 108(1), 177--205.
  66. Nitanda, A., & Suzuki, T. (2017). Stochastic difference of convex algorithm and its application to training deep Boltzmann machines. In Artificial Intelligence and Statistics (pp. 470--478).
  67. Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
  68. Rafique, H., Liu, M., Lin, Q., & Yang, T. (2018). Non-convex min-max optimization: Provable algorithms and applications in machine learning. CoRR, abs/1810.02060.
  69. Ravi, S. N., Dinh, T., Lokhande, V. S. R., & Singh, V. (2018). Constrained deep learning using conditional gradient and applications in computer vision. arXiv preprint arXiv:1803.06453.
  70. Real, E., Aggarwal, A., Huang, Y., & Le, Q. V. (2019). Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, pp. 4780--4789).
  71. Reddi, S. J., Zaheer, M., Sra, S., Poczos, B., Bach, F., Salakhutdinov, R., & Smola, A. J. (2017). A generic approach for escaping saddle points. arXiv preprint arXiv:1709.01434.
  72. Rigollet, P., & Tong, X. (2011). Neyman-Pearson classification, convexity and stochastic constraints. Journal of Machine Learning Research, 12, 2831--2855.
  73. Royer, C. W., & Wright, S. J. (2017). Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization. CoRR, abs/1706.03131.
  74. Sra, S., Nowozin, S., & Wright, S. J. (2011). Optimization for machine learning. The MIT Press.
  75. Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946.
  76. Thi, H. A. L., Le, H. M., Phan, D. N., & Tran, B. (2017). Stochastic DCA for the large-sum of non-convex functions problem and its application to group variable selection in classification. In Proceedings of the 34th International Conference on Machine Learning (pp. 3394--3403).
  77. Tian, Y., Pei, K., Jana, S., & Ray, B. (2018). DeepTest: Automated testing of deep-neural-network-driven autonomous cars. In Proceedings of the 40th International Conference on Software Engineering (pp. 303--314).
  78. Wen, F., Chu, L., Liu, P., & Qiu, R. C. (2018). A survey on nonconvex regularization-based sparse and low-rank recovery in signal processing, statistics, and machine learning. IEEE Access, 6, 69883--69906.
  79. Woodworth, B., Gunasekar, S., Ohannessian, M. I., & Srebro, N. (2017). Learning non-discriminatory predictors. arXiv preprint arXiv:1702.06081.
  80. Woodworth, B. E., Gunasekar, S., Ohannessian, M. I., & Srebro, N. (2017). Learning non-discriminatory predictors. In Proceedings of the 30th Conference on Learning Theory (COLT) (pp. 1920--1953).
  81. Wu, Y., & Liu, Y. (2007). Robust truncated hinge loss support vector machines. Journal of the American Statistical Association, 102(479), 974--983.
  82. Xu, P., Roosta-Khorasani, F., & Mahoney, M. W. (2017). Newton-type methods for non-convex optimization under inexact Hessian information. CoRR, abs/1708.07164.
  83. Xu, Y., Jin, R., & Yang, T. (2019). Stochastic proximal gradient methods for non-smooth non-convex regularized problems. arXiv preprint arXiv:1902.07672.
  84. Xu, Y., Lin, Q., & Yang, T. (2017). Stochastic convex optimization: Faster local growth implies faster global convergence. In Proceedings of the 34th International Conference on Machine Learning (pp. 3821--3830).
  85. Xu, Y., Qi, Q., Lin, Q., Jin, R., & Yang, T. (2019). Stochastic optimization for DC functions and non-smooth non-convex regularizers with non-asymptotic convergence. In Proceedings of the 36th International Conference on Machine Learning (ICML) (pp. 6942--6951).
  86. Xu, Y., Rong, J., & Yang, T. (2018). First-order stochastic algorithms for escaping from saddle points in almost linear time. In Advances in Neural Information Processing Systems (NeurIPS) (pp. 5530--5540).
  87. Xu, Y., Zhu, S., Yang, S., Zhang, C., Jin, R., & Yang, T. (2019). Learning with non-convex truncated losses by SGD. In Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI) (p. 244).
  88. Yan, Y., Yang, T., Li, Z., Lin, Q., & Yang, Y. (2018). A unified analysis of stochastic momentum methods for deep learning. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI) (pp. 2955--2961).
  89. Yang, L. (2018). Proximal gradient method with extrapolation and line search for a class of nonconvex and nonsmooth problems. CoRR, abs/1711.06831.
  90. Yang, T., Lin, Q., & Zhang, L. (2017). A richer theory of convex constrained optimization with reduced projections and improved rates. In Proceedings of the 34th International Conference on Machine Learning (ICML).
  91. Yang, T., Mahdavi, M., Jin, R., Zhang, L., & Zhou, Y. (2012). Multiple kernel learning from noisy labels by stochastic programming. In Proceedings of the International Conference on Machine Learning (ICML) (pp. 233--240).
  92. You, S., Ding, D., Canini, K. R., Pfeifer, J., & Gupta, M. R. (2017). Deep lattice networks and partial monotonic functions. In Advances in Neural Information Processing Systems 30 (NIPS) (pp. 2985--2993).
  93. Yu, Y., Zheng, X., Marchetti-Bowick, M., & Xing, E. P. (2015). Minimizing nonconvex non-separable functions. In The 17th International Conference on Artificial Intelligence and Statistics (AISTATS).
  94. Zafar, M. B., Valera, I., Gomez Rodriguez, M., & Gummadi, K. P. (2017). Fairness beyond disparate treatment and disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web (pp. 1171--1180).
  95. Zaheer, M., Reddi, S., Sachan, D., Kale, S., & Kumar, S. (2018). Adaptive methods for nonconvex optimization. In Advances in Neural Information Processing Systems 31 (pp. 9793--9803). Curran Associates, Inc. Retrieved from http://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization.pdf
  96. Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38, 894--942.
  97. Zhang, C.-H., & Zhang, T. (2012). A general theory of concave regularization for high-dimensional sparse estimation problems. Statistical Science, 27(4), 576--593.
  98. Zhang, S., & Xin, J. (2014). Minimization of transformed ℓ1 penalty: Theory, difference of convex function algorithm, and robust application in compressed sensing. CoRR, abs/1411.5735.
  99. Zhang, T. (2010). Analysis of multistage convex relaxation for sparse regularization. Journal of Machine Learning Research, 11, 1081--1107.
  100. Zhong, W., & Kwok, J. T. (2014). Gradient descent with proximal average for nonconvex and composite regularization. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (pp. 2206--2212).
  101. Zhou, D., Tang, Y., Yang, Z., Cao, Y., & Gu, Q. (2018). On the convergence of adaptive gradient methods for nonconvex optimization. arXiv preprint arXiv:1808.05671.
  102. Zhu, D., Li, Z., Wang, X., Gong, B., & Yang, T. (2019). A robust zero-sum game framework for pool-based active learning. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 517--526).
  103. Zou, D., Cao, Y., Zhou, D., & Gu, Q. (2018). Stochastic gradient descent optimizes over-parameterized deep ReLU networks. CoRR, abs/1811.08888.
