
Supervised learning as an inverse problem based on non-smooth loss function

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

This paper is concerned with solving the supervised machine learning problem as an inverse problem. Recently, many works have focused on establishing a relationship between supervised learning and well-known inverse problems. However, this connection between the learning problem and the inverse one has been established only in the particular case where the inverse problem is reformulated as a minimization problem with a quadratic cost functional (\(L^2\) cost functional). Yet it is well known that the cost functional can be \(L^1\), \(L^2\), or any positive function that measures the gap between the predicted data and the observations. Indeed, the use of an \(L^1\) loss function for the supervised learning problem gives more consistent results (see Rosasco et al. in Neural Comput 16:1063–1076, 2004). This strengthens the idea of reformulating the inverse problem associated with the machine learning problem as a minimization problem with an \(L^{1}\) functional. However, the \(L^{1}\) loss function is non-differentiable, which precludes the use of standard optimization tools. To overcome this difficulty, we propose a new approximation technique that reformulates the associated inverse problem as the minimization of a slanting cost functional (Chen et al. in MIS Q Manag Inf Syst 36:1165–1188, 2012), which is solved using Tikhonov regularization and Newton's method. This approach leads to an efficient numerical algorithm that allows us to solve the supervised learning problem in a very general framework. We present numerical results showing the efficiency of the proposed approach, with experiments on both academic and real-life data. A comparison with existing methods and a study of the numerical stability of the algorithm show that our approach performs better in terms of convergence speed and quality of the predicted models.
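
To make the approach concrete, the following is a minimal, illustrative sketch (not the authors' exact formulation) of the ingredients named above: an \(L^1\)-type data-fit term replaced by a smooth "slanted" surrogate \(\sqrt{t^2+\varepsilon^2}\), combined with Tikhonov regularization and minimized by Newton's method. The linear model, the surrogate, and the parameter names (eps, lam) are assumptions made for this example only.

```python
# Illustrative sketch only (assumed setup, not the paper's algorithm):
# minimize  (1/n) * sum_i sqrt((x_i.w - y_i)^2 + eps^2)  +  lam * ||w||^2
# i.e. a smoothed L1 data-fit term plus Tikhonov regularization, solved by Newton's method.
import numpy as np

def fit_smoothed_l1(X, y, lam=1e-2, eps=1e-3, n_iter=50, tol=1e-10):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        r = X @ w - y                         # residuals
        s = np.sqrt(r**2 + eps**2)            # smooth surrogate for |r|
        grad = X.T @ (r / s) / n + 2.0 * lam * w
        # Hessian of the smoothed term: X^T diag(eps^2 / s^3) X / n
        H = X.T @ (X * (eps**2 / s**3)[:, None]) / n + 2.0 * lam * np.eye(d)
        step = np.linalg.solve(H, grad)       # Newton direction
        w -= step
        if np.linalg.norm(step) < tol:
            break
    return w

# Toy usage: data with a few gross outliers, where an L1-type fit is more robust than L2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=200)
y[::25] += 5.0                                # inject outliers
print(fit_smoothed_l1(X, y))
```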


References

  1. Rosasco L, De Vito E, Caponnetto A, Piana M, Verri A (2004) Are loss functions all the same? Neural Comput 16:1063–1076

  2. Chen H, Chiang R, Storey V (2012) Business intelligence and analytics: from big data to big impact. MIS Q Manag Inf Syst 36:1165–1188

  3. Vapnik VN (1995) The nature of statistical learning theory. Springer, Berlin, Heidelberg

  4. Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Applications of mathematics (New York), vol 31. Springer, New York

  5. Rozas J, Sánchez-DelBarrio JC, Messeguer X, Rozas R (2004) DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19:2496–2497

  6. Rani P, Liu C, Sarkar N, Vanman E (2006) An empirical study of machine learning techniques for affect recognition in human robot interaction. Pattern Anal Appl 9:58–69

  7. Kononenko I (2001) Machine learning for medical diagnosis: history, state of the art and perspective. Artif Intell Med 23:89–109

  8. Osuna E, Freund R, Girosi F (1997) Training support vector machines: an application to face detection. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 130–136

  9. Farivar F, Ahmadabadi MN (2015) Continuous reinforcement learning to robust fault tolerant control for a class of unknown nonlinear systems. Appl Soft Comput 37:702–714

  10. Peng H-W, Lee S-J, Lee C-H (2017) An oblique elliptical basis function network approach for supervised learning applications. Appl Soft Comput 60:552–563

  11. Kumar YJ, Salim N, Raza B (2012) Cross-document structural relationship identification using supervised machine learning. Appl Soft Comput 12(10):3124–3131

  12. Yang Y, Zhang H, Yuan D, Sun D, Li G, Ranjan R, Sun M (2019) Hierarchical extreme learning machine based image denoising network for visual internet of things. Appl Soft Comput 74:747–759

  13. Maimon O, Rokach L (2005) Introduction to supervised methods. In: Data mining and knowledge discovery handbook. Springer, Boston, MA, pp 149–164

  14. Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer series in statistics. Springer, New York

  15. Hilas C, Mastorocostas P (2008) An application of supervised and unsupervised learning approaches to telecommunications fraud detection. Knowl Based Syst 21:721–726

  16. Cucker F, Smale S (2002) On the mathematical foundations of learning. Bull Am Math Soc 39:1–49

  17. Slavakis K, Giannakis G, Mateos G (2014) Modeling and optimization for big data analytics: (statistical) learning tools for our era of data deluge. IEEE Signal Process Mag 31:18–31

  18. Emrouznejad A (2016) Big data optimization: recent developments and challenges, vol 18. Springer, Berlin

  19. Bertero M, De Mol C, Pike ER (1988) Linear inverse problems with discrete data. II. Stability and regularisation. Inverse Probl 4(3):573–594

  20. Kirsch A (1996) An introduction to the mathematical theory of inverse problems. Springer, Berlin, Heidelberg

  21. Kurkova V (2004) Supervised learning as an inverse problem. Technical report 960, Institute of Computer Science, Academy of Sciences of the Czech Republic

  22. Mukherjee S, Niyogi P, Poggio T, Rifkin R (2006) Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv Comput Math 25:161–193

  23. De Vito E, Rosasco L, Caponnetto A, Piana M, Verri A (2004) Some properties of regularized kernel methods. J Mach Learn Res 5:1363–1390

  24. Tikhonov AN, Arsenin VY (1977) Solutions of ill-posed problems. W.H. Winston, New York

  25. Aronszajn N (1950) Theory of reproducing kernels. Trans Am Math Soc 68:337–404

  26. Hadamard J (1923) Lectures on Cauchy's problem in linear partial differential equations. Yale University Press, New Haven

  27. Schmidt M, Fung G, Rosales R (2007) Fast optimization methods for L1 regularization: a comparative study and two new approaches. In: European conference on machine learning. Springer, Berlin, Heidelberg, pp 286–297

  28. Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, New York

  29. UCI Machine Learning Repository: Airfoil Self-Noise Data Set. https://archive.ics.uci.edu/ml/datasets/airfoil+self-noise

  30. da Silva HP, Carreiras C, Lourenço A, das Neves RC, Ferreira R (2015) Off-the-person electrocardiography: performance assessment and clinical correlation. Health Technol 4:309–318

  31. Moody GB, Mark RG (2001) The impact of the MIT-BIH Arrhythmia Database. IEEE Eng Med Biol Mag 20:45–50

  32. Kasai H (2017) A Matlab library for stochastic optimization algorithms. J Mach Learn Res 19:7942–7946

  33. Bottou L (2012) Stochastic gradient descent tricks. In: Neural networks: tricks of the trade. Springer, Berlin, Heidelberg, pp 421–436

  34. Mokhtari A, Eisen M, Ribeiro A (2018) IQN: an incremental quasi-Newton method with local superlinear convergence rate. SIAM J Optim 28(2):1670–1698

  35. Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in neural information processing systems, pp 315–323


Acknowledgements

The authors are grateful to anonymous referees for their careful review of our manuscript and for their constructive comments.

Author information

Corresponding author

Correspondence to Mourad Nachaoui.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lyaqini, S., Quafafou, M., Nachaoui, M. et al. Supervised learning as an inverse problem based on non-smooth loss function. Knowl Inf Syst 62, 3039–3058 (2020). https://doi.org/10.1007/s10115-020-01439-2

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-020-01439-2
