Abstract
This paper addresses the supervised machine learning problem as an inverse problem. Recently, many works have focused on establishing a relationship between supervised learning and well-known inverse problems. However, this connection has so far been made only in the particular case where the inverse problem is reformulated as a minimization problem with a quadratic (\(L^2\)) cost functional. Yet it is well known that the cost functional can be \(L^1\), \(L^2\), or any positive function that measures the gap between the predicted data and the observed data. Indeed, using an \(L^1\) loss function for supervised learning yields more consistent results (see Rosasco et al. in Neural Comput 16:1063–1076, 2004). This strengthens the case for reformulating the inverse problem associated with the learning problem as a minimization problem with an \(L^1\) cost functional. However, the \(L^1\) loss function is non-differentiable, which precludes the use of standard optimization tools. To overcome this difficulty, we propose a new approximation technique based on reformulating the associated inverse problem as the minimization of a slanting cost functional (Chen et al., MIS Q Manag Inf Syst 36:1165–1188, 2012), which is solved using Tikhonov regularization and Newton's method. This approach leads to an efficient numerical algorithm for solving the supervised learning problem in a very general framework. We present numerical results, on both academic and real-life data, that demonstrate the efficiency of the proposed approach. A comparison with existing methods and a study of the numerical stability of the algorithm show that our approach performs better in terms of convergence speed and quality of the predicted models.
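The abstract outlines the main computational ingredients: smooth the non-differentiable \(L^1\) loss, add Tikhonov regularization, and apply Newton's method. The following minimal sketch illustrates this recipe on a simple linear model; the particular smoothing \(\sqrt{r^2+\varepsilon^2}\), the parameters `eps` and `lam`, and all function names are our own illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def objective(w, X, y, lam, eps):
    """Tikhonov-regularized, smoothed-L1 ("slanted" absolute value) objective."""
    r = X @ w - y
    return np.mean(np.sqrt(r**2 + eps**2)) + lam * (w @ w)

def newton_smoothed_l1(X, y, lam=1e-2, eps=1e-2, tol=1e-8, max_iter=100):
    """Minimize mean_i sqrt((x_i.w - y_i)^2 + eps^2) + lam*||w||^2 by damped Newton."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_iter):
        r = X @ w - y
        s = np.sqrt(r**2 + eps**2)
        grad = X.T @ (r / s) / n + 2 * lam * w
        # Hessian: (1/n) X' diag(eps^2 / s^3) X + 2*lam*I, positive definite for lam > 0.
        H = X.T @ (X * (eps**2 / s**3)[:, None]) / n + 2 * lam * np.eye(d)
        step = np.linalg.solve(H, grad)
        # Backtracking line search keeps the Newton iteration stable far from the optimum.
        t, f0 = 1.0, objective(w, X, y, lam, eps)
        while objective(w - t * step, X, y, lam, eps) > f0 - 1e-4 * t * (grad @ step):
            t *= 0.5
            if t < 1e-10:
                break
        w = w - t * step
        if np.linalg.norm(t * step) < tol:
            break
    return w

# Toy usage: robust fit of a line under heavy-tailed noise.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.uniform(-1.0, 1.0, 200)])
y = X @ np.array([0.5, 2.0]) + 0.1 * rng.standard_t(df=1.5, size=200)
print(newton_smoothed_l1(X, y))
```

As the smoothing parameter `eps` tends to zero, the surrogate approaches the true \(L^1\) loss while remaining twice differentiable, which is what makes a Newton-type iteration applicable at all.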
References
Rosasco L, De Vito E, Caponnetto A, Piana M, Verri A (2004) Are loss functions all the same? Neural Comput 16:1063–1076
Chen H, Chiang R, Storey V (2012) Business intelligence and analytics: from big data to big impact. MIS Q Manag Inf Syst 36:1165–1188
Vapnik VN (1995) The nature of statistical learning theory. Springer, Berlin, Heidelberg
Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Applications of mathematics, vol 31. Springer, New York
Rozas J, Sánchez-DelBarrio JC, Messeguer X, Rozas R (2003) DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19:2496–2497
Rani P, Liu C, Sarkar N, Vanman E (2006) An empirical study of machine learning techniques for affect recognition in human robot interaction. Pattern Anal Appl 9:58–69
Kononenko I (2001) Machine learning for medical diagnosis: history, state of the art and perspective. Artif Intell Med 23:89–109
Osuna E, Freund R, Girosi F (1997) Training support vector machines: an application to face detection. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 130–136
Farivar F, Ahmadabadi MN (2015) Continuous reinforcement learning to robust fault tolerant control for a class of unknown nonlinear systems. Appl Soft Comput 37:702–714
Peng H-W, Lee S-J, Lee C-H (2017) An oblique elliptical basis function network approach for supervised learning applications. Appl Soft Comput 60:552–563
Kumar YJ, Salim N, Raza B (2012) Cross-document structural relationship identification using supervised machine learning. Appl Soft Comput 12(10):3124–3131
Yang Y, Zhang H, Yuan D, Sun D, Li G, Ranjan R, Sun M (2019) Hierarchical extreme learning machine based image denoising network for visual internet of things. Appl Soft Comput 74:747–759
Maimon O, Rokach L (2005) Introduction to supervised methods. Data mining and knowledge discovery handbook. Springer, Boston, MA, pp 149–164
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer series in statistics. Springer, New York
Hilas C, Mastorocostas P (2008) An application of supervised and unsupervised learning approaches to telecommunications fraud detection. Knowl Based Syst 21:721–726
Cucker F, Smale S (2002) On the mathematical foundations of learning. Bull Am Math Soc 39:1–49
Slavakis K, Giannakis G, Mateos G (2014) Modeling and optimization for big data analytics: (statistical) learning tools for our era of data deluge. IEEE Signal Process 31:18–31
Emrouznejad A (2016) Big data optimization: recent developments and challenges, vol 18. Springer, Berlin
Bertero M, De Mol C, Pike ER (1988) Linear inverse problems with discrete data. II. Stability and regularisation. Inverse Probl 4(3):573–594
Kirsch A (1996) An introduction to the mathematical theory of inverse problems. Springer, Berlin, Heidelberg
Kurkova V (2004) Supervised learning as an inverse problem. Technical report 960. Institute of Computer Science, Academy of Sciences of the Czech Republic
Mukherjee S, Niyogi P, Poggio T, Rifkin R (2006) Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv Comput Math 25:161–193
De Vito E, Rosasco L, Caponnetto A, Piana M, Verri A (2004) Some properties of regularized kernel methods. J Mach Learn Res 5:1363–1390
Tikhonov AN, Arsenin VY (1977) Solutions of ill-posed problems. W.H. Winston, New York
Aronszajn N (1950) Theory of reproducing kernels. Trans Am Math Soc 68:337–404
Hadamard J (1923) Lectures on Cauchy’s problem in linear partial differential equations. Yale University Press, New Haven
Schmidt M, Fung G, Rosales R (2007) Fast optimization methods for L1 regularization: a comparative study and two new approaches. In: European conference on machine learning. Springer, Berlin, Heidelberg, pp 286–297
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, New York
da Silva HP, Carreiras C, Lourenço A, das Neves RC, Ferreira R (2015) Off-the-person electrocardiography: performance assessment and clinical correlation. Health Technol 4:309–318
Moody GB, Mark RG (2001) The impact of the MIT-BIH arrhythmia database. IEEE Eng Med Biol 20:45–50
Kasai H (2017) A Matlab library for stochastic optimization algorithms. J Mach Learn Res 19:7942–7946
Bottou L (2012) Stochastic gradient descent tricks. In: Neural networks: tricks of the trade. Springer, Berlin, Heidelberg, pp 421–436
Mokhtari A, Eisen M, Ribeiro A (2018) IQN: an incremental quasi-Newton method with local superlinear convergence rate. SIAM J Optim 28(2):1670–1698
Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in neural information processing systems, pp 315–323
Acknowledgements
The authors are grateful to the anonymous referees for their careful review of the manuscript and for their constructive comments.