Abstract
Local learning algorithms use a neighborhood of training data close to a given testing query point to learn the local parameters and create on-the-fly a local model specifically designed for this query point. The local approach delivers breakthrough performance in many application domains. This paper considers local learning versions of regularization networks (RN) and investigates several options for improving their online prediction performance, in both accuracy and speed. First, we exploit the interplay between locally optimized and globally optimized hyper-parameters (the regularization parameter and the kernel width) that each new predictor needs to optimize online. The operational cost is substantially reduced when the two hyper-parameters are globally optimized and shared by all local models. We also demonstrate that this global optimization of the two hyper-parameters produces more accurate models than the alternatives that locally optimize online either the regularization parameter, or the kernel width, or both. Then, by comparing eigenvalue decomposition (EVD) with Cholesky decomposition specifically for the local learning training and testing phases, we reveal that the Cholesky-based implementations are faster than their EVD counterparts in all training cases. While EVD is suitable for validating several regularization parameters cost-effectively, Cholesky should be preferred when validating several neighborhood sizes (the number of k-nearest neighbors) as well as when the local network operates online. We then exploit parallelism in a multi-core system for these local computations, demonstrating that the execution times are further reduced. Finally, although using pre-computed stored local models instead of online learning local models is even faster, this option degrades predictive accuracy.
Evidently, there is a substantial gain in waiting for a testing point to arrive before building a local model, and hence the online local learning RNs are more accurate than their pre-computed stored counterparts. To support all these findings, we present extensive experimental results and comparisons on several benchmark datasets.
Acknowledgments
We gratefully acknowledge the useful comments and suggestions of the anonymous reviewers, which helped improve the presentation and clarity of this paper.
Appendix
For all the algorithms, we cache intermediate results extensively to speed up the process. The training phase of each case must also find the best number of neighbors, denoted k_best. We search for the best k in the grid {δL, 2δL, …, L_max}, where L_max is the maximum candidate number of neighbors. A local distance matrix caches the distances between the neighbor points. From this matrix, a cached local kernel matrix is created once for every candidate σ_m value. Thus, for each (L_max, σ_m, λ_l) combination only one Cholesky factorization of the kernel matrix is computed. Cholesky back substitution then progressively solves for the local weights of all the candidate k values. All four local RN cases use the minimum global training errors to find the best global parameters.
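One factorization can serve every candidate k because the Cholesky factor of the leading k × k block of the regularized kernel matrix is exactly the leading k × k block of the full factor. The sketch below is a hypothetical NumPy rendering of this idea (the Gaussian kernel form, the function name, and all interfaces are assumptions, not taken from the paper): it factors the full L_max-sized local system once and reuses leading blocks to solve for all candidate neighborhood sizes.

```python
import numpy as np

def validate_k_candidates(X_nb, y_nb, sigma, lam, k_grid):
    """Solve the local RN system (K + lam*I) w = y for several candidate
    neighborhood sizes k with a single Cholesky factorization.

    X_nb, y_nb: the L_max nearest neighbors, sorted by distance to the query.
    """
    # Cached pairwise squared distances between the neighbor points
    sq = np.sum(X_nb**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X_nb @ X_nb.T
    K = np.exp(-D2 / (2.0 * sigma**2))          # cached local kernel matrix
    # One factorization for this (L_max, sigma, lam) combination
    L = np.linalg.cholesky(K + lam * np.eye(len(X_nb)))
    weights = {}
    for k in k_grid:
        # Leading k x k block of L is the Cholesky factor of the k x k system
        Lk = L[:k, :k]
        w = np.linalg.solve(Lk, y_nb[:k])       # forward substitution
        weights[k] = np.linalg.solve(Lk.T, w)   # back substitution
    return weights
```

Each candidate k then costs only two triangular solves instead of a fresh O(k^3) factorization, which is the saving the caching scheme above relies on.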
The training phase contains three nested loops: one iterates through the candidate width values σ_m, one through the candidate k-neighbor values, and one through the candidate regularization values λ_l. The ordering of these loops matters. For the best and fastest ordering in the EVD implementations, the loop over the widths σ_m must come first, followed by the loop over the candidate k-neighbor values, with the innermost loop validating the candidate regularization values λ_l. In the Cholesky implementations, the fastest ordering again places the loop over the widths σ_m first, but the second loop must iterate through the candidate regularization values λ_l, and the loop over the candidate k-neighbor values must come third.
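The EVD ordering puts λ_l innermost because one eigendecomposition K = V diag(s) V^T of the local kernel matrix (fixed σ_m and k) makes every candidate regularization value nearly free: the local weights are w = V diag(1/(s + λ)) V^T y. A minimal sketch of this inner loop, with hypothetical names not taken from the paper:

```python
import numpy as np

def evd_weights_for_lambdas(K, y, lam_grid):
    """Given one local kernel matrix K (for a fixed width and neighborhood
    size), return the local RN weights for every candidate lambda using a
    single eigendecomposition, since (K + lam*I)^-1 y = V diag(1/(s+lam)) V^T y.
    """
    s, V = np.linalg.eigh(K)   # one EVD per (sigma, k) pair
    Vty = V.T @ y              # shared across all lambda candidates
    return {lam: V @ (Vty / (s + lam)) for lam in lam_grid}
```

Each extra λ candidate then costs only an O(k^2) matrix-vector product, which is why EVD suits validating many regularization values, while the progressive-block property of Cholesky suits validating many neighborhood sizes.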
Kokkinos, Y., Margaritis, K.G. Local learning regularization networks for localized regression. Neural Comput & Applic 28, 1309–1328 (2017). https://doi.org/10.1007/s00521-016-2569-0