Neurocomputing

Volume 401, 11 August 2020, Pages 308–319
A new perspective for Minimal Learning Machines: A lightweight approach

https://doi.org/10.1016/j.neucom.2020.03.088

Highlights

  • We propose a new procedure to train Minimal Learning Machines.

  • Our proposal does not rely on any assumption regarding the data.

  • Our proposal builds a regularized system that imposes sparseness.

  • We evaluate both proposals on well-known benchmark datasets.

Abstract

This paper introduces a new procedure to train Minimal Learning Machines (MLM) for regression tasks, together with a new prediction process for MLM. A well-known drawback of the original MLM formulation is its lack of sparseness. The most recent efforts on this problem rely heavily on selecting reference points before the training and prediction steps in MLM, all based on some assumption regarding the data. In the opposite direction, we explore another formulation of MLM that does not rely on any assumption regarding the data for prior selection. Instead, our proposal, named Lightweight Minimal Learning Machine (LW-MLM), builds a regularized system that imposes sparseness. We achieve such a sparse criterion not by selection but by incorporating weighted information into the model. We validate the contributions of this paper through four types of experiments that evaluate different aspects of our proposal: prediction error performance, the goodness-of-fit of estimated vs. measured values, the norm values, which are related to sparsity, and, finally, the prediction error in high-dimensional settings. Based on the results, we show that LW-MLM is a valid alternative, since it achieved accuracy rates similar to or higher than those of other variants while being statistically equivalent to them.

Introduction

The Minimal Learning Machine (MLM, [4]) is a supervised method for classification and regression tasks. Its training step is based on a multi-response linear regression model that finds the mapping between distances computed in the input and output spaces. In such a model, the output for a new input (not seen during training) is obtained by solving a linear system followed by an optimization process in the output space. In the original proposal, predicting new inputs is an onerous task, since the computation may involve all samples in the training set, the so-called reference points (RPs). Nevertheless, the MLM exhibits generalization performance equivalent or even superior to that of other models [4].

A well-known drawback of the original MLM formulation is its lack of sparseness. Sparseness here means expressing the induced model in terms of a relatively small number of samples in both the training and prediction procedures. Because the induced model lacks sparseness, it becomes intractable for large datasets, since the computational cost of prediction scales with the size of the training set. Another known drawback, this time related to the goodness-of-fit of the estimated values, is the absence of a term besides the error during learning. In this case, regularization can discourage the learning of an overly complex or flexible model, thereby avoiding the risk of overfitting.

In the most recent and widely adopted formulation of MLM, named Random-MLM [4], a lower-rank linear system is obtained by randomly selecting samples, which imposes some sparsity in both training and prediction, alongside regularization. However, in this approach, randomness can discard potentially useful data that could be important for the induction process and may end up impairing the MLM generalization performance. Thus, a principled manner of imposing sparsity and regularization, rather than random selection, is desirable for the reasons mentioned above.

The most recent efforts on this problem (Section 2) rely heavily on RP selection before the training and prediction steps in MLM. To do so, such efforts depend on some supposition regarding the data. In the opposite direction, here we explore another formulation of MLM that does not rely on any supposition about the data for prior selection. Instead, we rely on the supposition that the whole known geometric structure of the data is learned, which allows us to discard some RPs only in the out-of-sample prediction procedure, yielding a more compact and faster model.

From the sparsity point of view, the process of selecting RPs and computing the mapping between distance matrices can be treated as the sparse resolution of a linear system with multiple responses, which unfortunately is an NP-hard problem. In this setting, we pose the MLM model under two main constraints: (1) a reconstruction constraint (error term) and (2) a model complexity constraint (norm term). The reconstruction constraint requires that the recovered mapping be consistent with the input and output spaces. The model complexity constraint assumes that such a mapping can be represented sparsely in an appropriately chosen overcomplete dictionary and that its sparse representation can be recovered from it. For sparse learning strategies, we refer the reader to [6], [7], [20], and for sparse-representation-based applications, to image super-resolution [21], order preservation [14], and multitask learning in image emotion distribution prediction [23].
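For concreteness, these two constraints can be combined into a single penalized objective. The sketch below uses the notation of Section 3 ($\mathbf{D}$ and $\boldsymbol{\Delta}$ for the input- and output-space distance matrices, $\mathbf{B}$ for the mapping); the generic penalty $\Omega$ and weight $\lambda$ are our illustration, not the paper's final formulation:

$$\min_{\mathbf{B}} \; \underbrace{\|\mathbf{D}\mathbf{B} - \boldsymbol{\Delta}\|_F^2}_{\text{reconstruction constraint}} \; + \; \lambda\, \underbrace{\Omega(\mathbf{B})}_{\text{model complexity constraint}},$$

where a sparsity-inducing choice such as $\Omega(\mathbf{B}) = \|\mathbf{B}\|_1$ turns the problem into the sparse recovery task discussed above (NP-hard in its exact $\ell_0$ form).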

That stated, this work tackles the problem of imposing sparsity and regularization in MLM by proposing a novel formulation based on pattern relevance. Our proposal, named “Lightweight” MLM (LW-MLM, Section 3), is based mainly on a ranking criterion that accounts for the matching between local distance measurements in the input and output spaces, indicating which points correspond to the most linear part of the target function. The main idea, and principal contribution, is to build a per-pattern regularized system that imposes sparseness not by selection but by incorporating weighted information into the model. Unlike other approaches, such as the regularized least-squares version of MLM (wMLM), ours does not act on the error term but on the norm of the solution obtained by the distance regression.
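To make the idea of such a ranking concrete, consider the following illustrative sketch (ours, not the paper's actual criterion, which is defined in Section 3): each pattern is scored by how well its distances to its k nearest input-space neighbors correlate with the corresponding output-space distances. The function local_relevance and its parameter k are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist

def local_relevance(X, Y, k=10):
    """Hypothetical per-pattern relevance score: the correlation between
    a sample's distances to its k nearest input-space neighbors and the
    matching output-space distances. A high score suggests the target
    function is close to linear around that sample."""
    Dx = cdist(X, X)   # pairwise input-space distances
    Dy = cdist(Y, Y)   # pairwise output-space distances
    n = X.shape[0]
    scores = np.empty(n)
    for i in range(n):
        nbrs = np.argsort(Dx[i])[1:k + 1]   # skip the point itself
        scores[i] = np.corrcoef(Dx[i, nbrs], Dy[i, nbrs])[0, 1]
    return scores   # larger = better input/output distance matching
```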

We validate the contributions of this paper through four types of experiments that evaluate different aspects of LW-MLM, namely, prediction error performance, the goodness-of-fit of estimated vs. measured values, the norm values, which are related to sparsity, and, finally, the prediction error in high-dimensional feature spaces (Section 4). Based on the results, we show that LW-MLM is a valid alternative, since it achieved accuracy rates similar to or higher than those of other variants, all with low variance, while being statistically equivalent to them (Section 5). Moreover, the empirical analysis supports that, owing to the LW-MLM formulation, some problems can take advantage of the per-pattern regularization approach where other models do not perform as precisely.

Section snippets

Minimal Learning Machine

The Minimal Learning Machine (MLM, [4]) is a supervised method for pattern recognition and regression tasks. The training step is based on a multiresponse linear regression that finds the mapping between distances computed in the input and output spaces. In this model, the output for a new input (not seen during training) is calculated by solving a linear system followed by an optimization process in the output space.
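As a minimal sketch of this pipeline (our own illustration following [4]; the function names are ours), training reduces to an ordinary least-squares fit between distance matrices, and out-of-sample prediction to a small nonlinear least-squares problem in the output space:

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.optimize import least_squares

def mlm_train(X, Y):
    """Original MLM training: regress output-space distances on
    input-space distances, using every training sample as a
    reference point (RP)."""
    D = cdist(X, X)                    # input-space distance matrix
    Delta = cdist(Y, Y)                # output-space distance matrix
    B = np.linalg.pinv(D) @ Delta      # multiresponse least squares
    return B

def mlm_predict(x_new, B, X, Y):
    """Out-of-sample prediction: estimate distances from x_new to the
    output RPs, then recover the output point whose distances best
    match the estimates (a multilateration problem)."""
    d = cdist(x_new.reshape(1, -1), X)   # distances to input RPs
    delta_hat = (d @ B).ravel()          # estimated output distances
    y0 = Y.mean(axis=0)                  # initial guess
    res = least_squares(
        lambda y: np.sum((Y - y) ** 2, axis=1) - delta_hat ** 2, y0)
    return res.x
```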

Formulation

The new formulation has the following cost function for the training step:

$$\min_{\mathbf{B}} J_{LW}(\mathbf{B}, \mathbf{P}) = \|\mathbf{D}\mathbf{B} - \boldsymbol{\Delta}\|_F^2 + \|\mathbf{P}\mathbf{B}\|_F^2,$$

which yields the following solution (see Appendix A):

$$\hat{\mathbf{B}}_{LW} = (\mathbf{D}^\top\mathbf{D} + \mathbf{P}^\top\mathbf{P})^{-1}\mathbf{D}^\top\boldsymbol{\Delta},$$

where $\mathbf{P}$ is a regularization matrix based on the sample regularization factor. The role of $\mathbf{P}$ here is the main proposal of our work.

Although it may sound inappropriate to have a hyperparameter P in the form of an N × N matrix, it actually fills the gap left by the multiresponse system in the MLM model. In fact, P can be derived by a
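A minimal sketch of this closed-form solution, assuming (as one plausible instantiation; the paper derives P from its ranking criterion) that P is diagonal, holding one sample regularization factor per pattern:

```python
import numpy as np

def lw_mlm_train(D, Delta, p):
    """Solve B_LW = (D'D + P'P)^{-1} D' Delta with P = diag(p).
    D, Delta: input- and output-space distance matrices (N x N);
    p: one regularization factor per pattern (how p is chosen is
    the paper's contribution and is not reproduced here)."""
    P = np.diag(p)
    A = D.T @ D + P.T @ P               # regularized normal equations
    return np.linalg.solve(A, D.T @ Delta)
```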

Experiments and discussion

This section presents the experimental framework followed in this paper, together with the collected results and a discussion of them.

Conclusion

In this work, we proposed a novel formulation for the MLM model based on a sample regularization factor, named Lightweight Minimal Learning Machine (LW-MLM). We extend the cost function of the original MLM and then embed a speed-up procedure into the out-of-sample prediction.

We validated LW-MLM through four types of experiments, evaluating different aspects: prediction error, the goodness-of-fit of estimated vs. measured values, the model complexity (via

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001.

References (24)

  • B. Ni et al.

    Order preserving sparse coding

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2015)
  • A.S.C. Alencar et al.

    MLM-rank: a ranking algorithm based on the minimal learning machine

    Proceedings of the Fourth Brazilian Conference on Intelligent Systems (BRACIS 2015)

    (2015)
  • W.L. Caldas et al.

    Fast co-MLM: an efficient semi-supervised co-training method based on the minimal learning machine

    New Gener. Comput.

    (2018)
  • A.H. de Souza Júnior

    Regional models and minimal learning machines for nonlinear dynamic system identification

    (2014)
  • A.H. de Souza Junior et al.

    Minimal learning machine: a novel supervised distance-based approach for regression and classification

    Neurocomputing 164, 34–44

    (2015)
  • J. Demšar

    Statistical comparisons of classifiers over multiple data sets

    J. Mach. Learn. Res.

    (2006)
  • D. Donoho et al.

    Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization

    Proc. Natl. Acad. Sci. USA

    (2003)
  • J. Fuchs

    On sparse representations in arbitrary redundant bases

    IEEE Trans. Inf. Theory

    (2003)
  • J.P.P. Gomes et al.

    A robust minimal learning machine based on the M-estimator

    Proceedings of the Twenty-fifth European Symposium on Artificial Neural Networks, ESANN 2017, Bruges, Belgium

    (2017)
  • J.P.P. Gomes et al.

    A cost sensitive minimal learning machine for pattern classification

  • T. Kärkkäinen

    Extreme minimal learning machine: ridge regression with distance-based basis

    Neurocomputing

    (2019)
  • M. Lichman, UCI machine learning repository,...
  • José Alberth Vasconcelos Florêncio holds a bachelor’s degree in Industrial Mechatronics from the Federal Institute of Ceará (IFCE, 2017). He is currently pursuing his master’s in Computer Science at IFCE.

    Saulo Anderson Freitas de Oliveira holds a bachelor’s (2013) degree in Computer Science and master’s (2016) degree in Telecommunications Engineering from the Federal Institute of Ceará (IFCE), Brazil. He is currently pursuing his doctorate in Computer Science at Universidade Federal do Ceará (UFC), Fortaleza, CE, Brazil.

    João Paulo Pordeus Gomes holds a bachelor’s degree in Electrical Engineering from Universidade Federal do Ceará (UFC, 2004), Brazil, a master’s degree (2006) in Aeronautical Engineering and a doctorate (2011) in Electronic Engineering from Instituto Tecnológico de Aeronáutica (ITA), São José dos Campos, SP, Brazil. Dr. Gomes worked for EMBRAER S.A. between 2006 and 2013 as a Technology Development Engineer focusing on fault monitoring applications for aeronautical systems. He is currently an Assistant Professor at UFC.

    Ajalmar R. Rocha Neto has been a Professor at the Federal Institute of Ceará (Instituto Federal do Ceará, IFCE) since 2006. He obtained his bachelor’s degree in Computer Science and his M.S. and Ph.D. degrees in Teleinformatics Engineering at the Federal University of Ceará (Universidade Federal do Ceará, UFC) in 2006 and 2011, respectively. He received the Best Paper of Young Research award at the International Work-Conference on Artificial Neural Networks in 2011. He has been supervising postgraduate students in Computer Science and Telecommunications Engineering at IFCE. In terms of research, he has been working and publishing papers in artificial and computational intelligence, machine learning, and computer vision.
