Abstract
Proximal algorithms are popular in signal processing, image reconstruction, variational inequalities, and convex optimization because of their low per-iteration cost and their applicability to non-smooth optimization problems. Many real-world machine learning problems are formulated as non-smooth convex loss minimization, and a recent trend is the design of new accelerated algorithms that solve these problems efficiently. In this paper, we propose a novel viscosity-based accelerated gradient algorithm (VAGA) that uses the viscosity approximation method from fixed-point theory to solve such learning problems. We establish the boundedness of the sequence generated by the algorithm and prove its strong convergence under a few specific conditions. To test the practical performance of the algorithm on real-world problems, we apply it to the regularized multitask regression problem with sparsity-inducing regularizers. We present a detailed comparative analysis of our algorithm against several traditional proximal algorithms on three real benchmark multitask regression datasets, and we also apply the proposed algorithm to the joint splice-site recognition problem from bioinformatics. The improved results demonstrate the efficacy of our algorithm over state-of-the-art proximal gradient descent algorithms. To the best of our knowledge, this is the first time a viscosity-based iterative algorithm has been applied to real-world regression and recognition problems.
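The abstract does not reproduce the VAGA update rule itself, but the viscosity approximation scheme it builds on combines a contraction f with a nonexpansive map T through the iteration x_{n+1} = α_n f(x_n) + (1 − α_n) T(x_n), with vanishing weights α_n. The following Python sketch is illustrative only: it instantiates T as a standard forward-backward (proximal gradient) operator and applies the scheme to an l1-regularized least-squares problem of the kind the paper targets; the function names, step size, weight schedule, and choice of contraction are our assumptions, not the authors' exact algorithm.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def viscosity_prox_grad(grad_f, prox_g, x0, step, contraction, n_iters=500):
    """Illustrative viscosity-type proximal gradient iteration:
        T(x)    = prox_{step*g}(x - step * grad_f(x))   # forward-backward map
        x_{n+1} = alpha_n * f(x_n) + (1 - alpha_n) * T(x_n)
    where f is a contraction and alpha_n -> 0 with divergent sum,
    as in Moudafi's viscosity approximation method.
    """
    x = x0.copy()
    for n in range(1, n_iters + 1):
        alpha = 1.0 / (n + 1)                    # vanishing viscosity weights
        Tx = prox_g(x - step * grad_f(x), step)  # forward-backward step
        x = alpha * contraction(x) + (1.0 - alpha) * Tx
    return x

# Example: lasso, min_x 0.5*||Ax - b||^2 + lam*||x||_1 (synthetic data)
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
b = rng.standard_normal(50)
lam = 0.1
L = np.linalg.norm(A, 2) ** 2                    # Lipschitz constant of the gradient
x_hat = viscosity_prox_grad(
    grad_f=lambda x: A.T @ (A @ x - b),
    prox_g=lambda v, s: soft_threshold(v, s * lam),
    x0=np.zeros(100),
    step=1.0 / L,
    contraction=lambda x: 0.5 * x,               # a simple contraction f(x) = x/2
)
```

The role of the contraction f is to select among the minimizers: in viscosity approximation methods, the iterates converge strongly to the fixed point of T that satisfies a variational inequality determined by f.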
Cite this article
Verma, M., Sahu, D.R. & Shukla, K.K. VAGA: a novel viscosity-based accelerated gradient algorithm. Appl Intell 48, 2613–2627 (2018). https://doi.org/10.1007/s10489-017-1110-1