Abstract
This paper addresses the supervised machine learning problem as an inverse problem. Recently, many works have focused on establishing a relationship between supervised learning and well-known inverse problems. However, this connection has so far been made only in the particular case where the inverse problem is reformulated as a minimization problem with a quadratic (\(L^2\)) cost functional. Yet it is well known that the cost functional can be \(L^1\), \(L^2\), or any positive function that measures the gap between the predicted data and the observed data. Indeed, using an \(L^1\) loss function for supervised learning yields more consistent results (see Rosasco et al. in Neural Comput 16:1063–1076, 2004). This strengthens the case for reformulating the inverse problem associated with the learning problem as a minimization problem with an \(L^1\) cost functional. However, the \(L^1\) loss function is non-differentiable, which precludes the use of standard optimization tools. To overcome this difficulty, we propose a new approximation technique based on reformulating the associated inverse problem as the minimization of a slanting cost functional (Chen et al., MIS Q Manag Inf Syst 36:1165–1188, 2012), which is solved using Tikhonov regularization and Newton's method. This approach leads to an efficient numerical algorithm for solving the supervised learning problem in a very general framework. We present numerical results, on both academic and real-life data, that demonstrate the efficiency of the proposed approach. A comparison with existing methods and a study of the numerical stability of the algorithm show that our approach performs better in terms of convergence speed and quality of the predicted models.
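The abstract outlines the main computational ingredients: smooth the non-differentiable \(L^1\) loss, add Tikhonov regularization, and apply Newton's method. The following minimal sketch illustrates this recipe on a simple linear model; the particular smoothing \(\sqrt{r^2+\varepsilon^2}\), the parameters `eps` and `lam`, and all function names are our own illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def objective(w, X, y, lam, eps):
    """Tikhonov-regularized, smoothed-L1 ("slanted" absolute value) objective."""
    r = X @ w - y
    return np.mean(np.sqrt(r**2 + eps**2)) + lam * (w @ w)

def newton_smoothed_l1(X, y, lam=1e-2, eps=1e-2, tol=1e-8, max_iter=100):
    """Minimize mean_i sqrt((x_i.w - y_i)^2 + eps^2) + lam*||w||^2 by damped Newton."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_iter):
        r = X @ w - y
        s = np.sqrt(r**2 + eps**2)
        grad = X.T @ (r / s) / n + 2 * lam * w
        # Hessian: (1/n) X' diag(eps^2 / s^3) X + 2*lam*I, positive definite for lam > 0.
        H = X.T @ (X * (eps**2 / s**3)[:, None]) / n + 2 * lam * np.eye(d)
        step = np.linalg.solve(H, grad)
        # Backtracking line search keeps the Newton iteration stable far from the optimum.
        t, f0 = 1.0, objective(w, X, y, lam, eps)
        while objective(w - t * step, X, y, lam, eps) > f0 - 1e-4 * t * (grad @ step):
            t *= 0.5
            if t < 1e-10:
                break
        w = w - t * step
        if np.linalg.norm(t * step) < tol:
            break
    return w

# Toy usage: robust fit of a line under heavy-tailed noise.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.uniform(-1.0, 1.0, 200)])
y = X @ np.array([0.5, 2.0]) + 0.1 * rng.standard_t(df=1.5, size=200)
print(newton_smoothed_l1(X, y))
```

As the smoothing parameter `eps` tends to zero, the surrogate approaches the true \(L^1\) loss while remaining twice differentiable, which is what makes a Newton-type iteration applicable at all.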
References
Rosasco L, De Vito E, Caponnetto A, Piana M, Verri A (2004) Are loss functions all the same? Neural Comput 16:1063–1076
Chen H, Chiang R, Storey V (2012) Business intelligence and analytics: from big data to big impact. MIS Q Manag Inf Syst 36:1165–1188
Vapnik VN (1995) The nature of statistical learning theory. Springer, Berlin, Heidelberg
Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Applications of mathematics, vol 31. Springer, New York
Rozas J, Sánchez-DelBarrio JC, Messeguer X, Rozas R (2003) DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19:2496–2497
Rani P, Liu C, Sarkar N, Vanman E (2006) An empirical study of machine learning techniques for affect recognition in human robot interaction. Pattern Anal Appl 9:58–69
Kononenko I (2001) Machine learning for medical diagnosis: history, state of the art and perspective. Artif Intell Med 23:89–109
Osuna E, Freund R, Girosi F (1997) Training support vector machines: an application to face detection. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 130–136
Farivar F, Ahmadabadi MN (2015) Continuous reinforcement learning to robust fault tolerant control for a class of unknown nonlinear systems. Appl Soft Comput 37:702–714
Peng H-W, Lee S-J, Lee C-H (2017) An oblique elliptical basis function network approach for supervised learning applications. Appl Soft Comput 60:552–563
Kumar YJ, Salim N, Raza B (2012) Cross-document structural relationship identification using supervised machine learning. Appl Soft Comput 12(10):3124–3131
Yang Y, Zhang H, Yuan D, Sun D, Li G, Ranjan R, Sun M (2019) Hierarchical extreme learning machine based image denoising network for visual internet of things. Appl Soft Comput 74:747–759
Maimon O, Rokach L (2005) Introduction to supervised methods. Data mining and knowledge discovery handbook. Springer, Boston, MA, pp 149–164
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer series in statistics. Springer, New York
Hilas C, Mastorocostas P (2008) An application of supervised and unsupervised learning approaches to telecommunications fraud detection. Knowl Based Syst 21:721–726
Cucker F, Smale S (2002) On the mathematical foundations of learning. Bull Am Math Soc 39:1–49
Slavakis K, Giannakis G, Mateos G (2014) Modeling and optimization for big data analytics: (statistical) learning tools for our era of data deluge. IEEE Signal Process 31:18–31
Emrouznejad A (2016) Big data optimization: recent developments and challenges, vol 18. Springer, Berlin
Bertero M, De Mol C, Pike ER (1988) Linear inverse problems with discrete data. II. Stability and regularisation. Inverse Probl 4(3):573–594
Kirsch A (1996) An introduction to the mathematical theory of inverse problems. Springer, Berlin, Heidelberg
Kurkova V (2004) Supervised learning as an inverse problem. Technical report 960. Institute of Computer Science, Academy of Sciences of the Czech Republic
Mukherjee S, Niyogi P, Poggio T, Rifkin R (2006) Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv Comput Math 25:161–193
De Vito E, Rosasco L, Caponnetto A, Piana M, Verri A (2004) Some properties of regularized kernel methods. J Mach Learn Res 5:1363–1390
Tikhonov AN, Arsenin VY (1977) Solutions of ill-posed problems. W.H. Winston, New York
Aronszajn N (1950) Theory of reproducing kernels. Trans Am Math Soc 68:337–404
Hadamard J (1923) Lectures on Cauchy’s problem in linear partial differential equations. Yale University Press, New Haven
Schmidt M, Fung G, Rosales R (2007) Fast optimization methods for L1 regularization: a comparative study and two new approaches. In: European conference on machine learning. Springer, Berlin, Heidelberg, pp 286–297
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, New York
da Silva HP, Carreiras C, Lourenço A, das Neves RC, Ferreira R (2015) Off-the-person electrocardiography: performance assessment and clinical correlation. Health Technol 4:309–318
Moody GB, Mark RG (2001) The impact of the MIT-BIH arrhythmia database. IEEE Eng Med Biol 20:45–50
Kasai H (2017) A Matlab library for stochastic optimization algorithms. J Mach Learn Res 19:7942–7946
Bottou L (2012) Stochastic gradient descent tricks. In: Neural networks: tricks of the trade. Springer, Berlin, Heidelberg, pp 421–436
Mokhtari A, Eisen M, Ribeiro A (2018) IQN: an incremental quasi-Newton method with local superlinear convergence rate. SIAM J Optim 28(2):1670–1698
Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in neural information processing systems, pp 315–323
Acknowledgements
The authors are grateful to the anonymous referees for their careful review of the manuscript and for their constructive comments.