Abstract
The era of big data in healthcare has arrived, and it will significantly improve medicine, especially oncology. However, traditional machine learning algorithms must be adapted to such large-scale real-world problems, both because of the sheer volume of data to be analyzed and because of the difficulty of solving problems in nonconvex nonlinear settings. We aim to minimize the composite of a smooth nonlinear function and a block-separable nonconvex function over a large number of block variables subject to inequality constraints. We propose a novel parallel first-order optimization method, called asynchronous block coordinate descent with time perturbation (ATP), which adopts a time-perturbation technique to escape from saddle points and sub-optimal local points. We present the method in detail, together with analyses of its convergence and iteration-complexity properties. Experiments on real-world machine learning problems validate the efficacy of the proposed method. The results demonstrate that time perturbation enables ATP to escape from saddle points and sub-optimal points, providing a promising way to handle nonconvex optimization problems with inequality constraints using asynchronous block coordinate descent. An asynchronous parallel implementation on shared-memory multi-core platforms shows that ATP has strong scalability.
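The full ATP algorithm is asynchronous and is developed in the paper itself; as a rough illustration of the underlying idea only, the sketch below shows a minimal *synchronous* cyclic proximal block coordinate descent that injects small random noise whenever the iterate stalls near a stationary point. All function names, parameters, and the toy lasso-style problem are our own illustrative choices, not the authors' implementation.

```python
import numpy as np

def perturbed_bcd(grad_f, prox_g, x0, n_blocks, step=0.1,
                  n_iters=500, stall_tol=1e-6, noise=0.05, seed=0):
    """Cyclic block coordinate descent with random perturbation.

    Each pass takes a proximal gradient step on one block at a time;
    when the iterate stalls (suggesting a stationary point, possibly
    a saddle), a small random perturbation is injected so the method
    can move away from it. Perturbation is disabled in the second
    half of the run so the final iterate can settle."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    blocks = np.array_split(np.arange(x.size), n_blocks)
    for k in range(n_iters):
        x_prev = x.copy()
        for idx in blocks:
            g = grad_f(x)[idx]                        # partial gradient for this block
            x[idx] = prox_g(x[idx] - step * g, step)  # proximal step on the block
        if k < n_iters // 2 and np.linalg.norm(x - x_prev) < stall_tol:
            x = x + noise * rng.standard_normal(x.size)  # escape perturbation
    return x

# Toy composite problem: f(x) = 0.5 * ||x - a||^2 (smooth),
# g(x) = lam * ||x||_1 (block-separable, nonsmooth).
a = np.array([3.0, -2.0, 0.02, 1.0])
lam = 0.1
grad_f = lambda x: x - a
prox_g = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - lam * t, 0.0)
x_hat = perturbed_bcd(grad_f, prox_g, np.zeros(4), n_blocks=2)
# x_hat approaches the soft-thresholded solution [2.9, -1.9, 0.0, 0.9]
```

In an actual asynchronous shared-memory implementation each worker would update its block using a possibly stale copy of `x`; this sketch keeps the updates sequential to isolate the perturbation mechanism.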
Ethics declarations
Rui LIU, Wei-chu SUN, Tao HOU, Chun-hong HU, and Lin-bo QIAO declare that they have no conflict of interest.
Additional information
Project supported by the National Key R&D Program of China (No. 2018YFB2101100) and the National Natural Science Foundation of China (Nos. 61806216 and 61702533)
Cite this article
Liu, R., Sun, WC., Hou, T. et al. Block coordinate descent with time perturbation for nonconvex nonsmooth problems in real-world studies. Frontiers Inf Technol Electronic Eng 20, 1390–1403 (2019). https://doi.org/10.1631/FITEE.1900341
Key words
- Convergence analysis
- Asynchronous block coordinate descent method
- Time perturbation
- Nonconvex nonsmooth optimization
- Real-world study