Local learning regularization networks for localized regression

  • Engineering Applications of Neural Networks
  • Published in: Neural Computing and Applications

Abstract

Local learning algorithms use a neighborhood of training data close to a given testing query point to learn the local parameters and to create, on the fly, a local model designed specifically for this query point. The local approach delivers breakthrough performance in many application domains. This paper considers local learning versions of regularization networks (RN) and investigates several options for improving their online prediction performance in both accuracy and speed. First, we exploit the interplay between locally optimized and globally optimized hyper-parameters (the regularization parameter and the kernel width) that each new predictor needs to optimize online. The operational cost is substantially reduced when we use two globally optimized hyper-parameters that are common to all local models. We also demonstrate that this global optimization of the two hyper-parameters produces more accurate models than the alternatives that locally optimize online either the regularization parameter, the kernel width, or both. Then, by comparing eigenvalue decomposition (EVD) with Cholesky decomposition for the local learning training and testing phases, we show that the Cholesky-based implementations are faster than their EVD counterparts in all the training cases. While EVD is suitable for cost-effectively validating several regularization parameters, Cholesky should be preferred when validating several neighborhood sizes (the number of k-nearest neighbors) as well as when the local network operates online. We then exploit parallelism in a multi-core system for these local computations, demonstrating that the execution times are further reduced. Finally, although using pre-computed stored local models instead of online learned local models is even faster, this option degrades accuracy. Evidently, there is a substantial gain in waiting for a testing point to arrive before building a local model, and hence the online local learning RNs are more accurate than their pre-computed stored counterparts. To support these findings, we present extensive experimental results and comparisons on several benchmark datasets.
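
A minimal sketch of the basic computation described above, assuming a Gaussian kernel and globally fixed hyper-parameters (the function name, defaults, and library calls below are our illustration, not the authors' code): for each query point, the k nearest training points are collected, the regularized local system is solved with a Cholesky factorization, and the resulting local model is evaluated at the query.

    # Illustrative sketch of an online local regularization network prediction
    # (not the authors' implementation; names and defaults are assumptions).
    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def local_rn_predict(X_train, y_train, x_query, k=50, sigma=1.0, lam=1e-3):
        # k nearest neighbors of the query point
        d2 = np.sum((X_train - x_query) ** 2, axis=1)
        idx = np.argsort(d2)[:k]
        Xk, yk = X_train[idx], y_train[idx]
        # Gaussian kernel matrix of the local neighborhood
        D2 = np.sum((Xk[:, None, :] - Xk[None, :, :]) ** 2, axis=-1)
        K = np.exp(-D2 / (2.0 * sigma ** 2))
        # Solve (K + lam*I) w = y with one Cholesky factorization
        w = cho_solve(cho_factor(K + lam * np.eye(k)), yk)
        # Evaluate the local model at the query point
        kq = np.exp(-d2[idx] / (2.0 * sigma ** 2))
        return float(kq @ w)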

Acknowledgments

We gratefully acknowledge the useful comments and suggestions of the anonymous reviewers, which helped improve the presentation and clarity of this paper.

Author information

Correspondence to Konstantinos G. Margaritis.

Appendix

For all the algorithms, we use extensive caching to speed up the process. The training phase of each case must also find the best number of neighbors, denoted k_best. We search for the best k in the grid {δL, 2δL, …, L_max}, where L_max is the maximum candidate number of neighbors. A local distance matrix maintains the cached distances between the neighbor points. Based on this matrix, a cached local kernel matrix is created once for every candidate σ_m value. Thus, for each L_max, σ_m and λ_l value only one Cholesky factorization of the kernel matrix is computed. Then the Cholesky back substitution progressively solves for the local weights of all the candidate k values. All four local RN cases use the minimum global training errors to find the best global parameters.
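
As an illustration of this caching scheme (variable and function names below are ours, not from the paper), note that when the L_max neighbors are ordered by distance to the query, the kernel matrix of the first k neighbors is the leading k × k block of the full L_max × L_max matrix; consequently, the leading block of a single Cholesky factor of (K + λI) can be reused to solve for the local weights of every candidate k by triangular back substitution.

    # Sketch of reusing one Cholesky factorization for all candidate k values
    # (illustrative code under the assumptions stated above).
    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    def weights_for_all_k(D2_local, y_local, sigma, lam, k_grid):
        # D2_local: cached squared distances between the L_max neighbors,
        # ordered by distance to the query; y_local: their target values.
        K = np.exp(-D2_local / (2.0 * sigma ** 2))                 # cached once per sigma
        L = cholesky(K + lam * np.eye(len(y_local)), lower=True)   # once per (sigma, lam)
        weights = {}
        for k in k_grid:
            # The leading k x k block of L is the Cholesky factor of the
            # k-neighbor system, so only triangular solves are needed here.
            Lk = L[:k, :k]
            z = solve_triangular(Lk, y_local[:k], lower=True)
            weights[k] = solve_triangular(Lk.T, z, lower=False)
        return weights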

In the training phase there are three nested loops: one iterates over the candidate width values σ_m, another over the candidate numbers of neighbors k, and another over the candidate regularization values λ_l. The ordering of the loops is important. For the best and fastest ordering in the EVD implementations, the loop over the widths σ_m must come first, followed by the loop over the candidate k values, which contains the innermost loop validating the candidate regularization values λ_l. In the Cholesky implementations, the fastest ordering again has the loop over the widths σ_m first, but now the second loop must iterate over the candidate regularization values λ_l, and the loop over the candidate k values must be the third (innermost).
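
The reason the regularization loop is innermost in the EVD ordering is that, once the k-neighbor kernel matrix has been eigendecomposed, sweeping over the candidate λ_l values is cheap: each solution of (K + λI)w = y costs only two matrix-vector products. A small illustrative sketch of this inner step (names are ours, not the authors' code):

    # Sketch of the cheap lambda sweep after one eigendecomposition
    # (illustrative code, not the authors' implementation).
    import numpy as np

    def evd_sweep_lambdas(K, y, lambdas):
        s, U = np.linalg.eigh(K)      # one EVD per (sigma, k) pair
        Uty = U.T @ y                 # projection of the targets, cached
        weights = {}
        for lam in lambdas:           # inexpensive inner loop over lambda
            weights[lam] = U @ (Uty / (s + lam))
        return weights

In the Cholesky ordering, by contrast, the factorization is amortized across the candidate k values, as in the earlier sketch, which is why the k loop is placed innermost there.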

Cite this article

Kokkinos, Y., Margaritis, K.G. Local learning regularization networks for localized regression. Neural Comput & Applic 28, 1309–1328 (2017). https://doi.org/10.1007/s00521-016-2569-0
