Abstract
Function approximation is the task of modeling an input-output relation in order to estimate the value of the underlying function at new inputs. In many domains, an ideal learning algorithm must approximate nonlinear time-varying functions over a high-dimensional input space while avoiding the problems caused by irrelevant or redundant input data. The method therefore has to meet three requirements: it must allow incremental learning to deal with changing functions and changing input distributions, keep the computational cost low, and achieve accurate estimates. In this paper, we explore different approaches to function approximation based on the Local Adaptive Receptive Fields Self-Organizing Map (LARFSOM). Local models are built from the output associated with the winning node and the difference vector between the input vector and the weight vector, and these models are combined through a weighted sum to yield the final estimate. The topology is adapted in a self-organizing way, and the weight vectors are adjusted by an unsupervised learning algorithm modified for supervised problems. Experiments were carried out on synthetic and real-world datasets. The results indicate that the proposed approaches perform competitively against Support Vector Regression (SVR) and improve both approximation accuracy and computational cost over locally weighted interpolation (LWI), a state-of-the-art interpolating algorithm for self-organizing maps.
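To make the prediction scheme concrete, the following is a minimal illustrative sketch, not the authors' LARFSOM variants: each node carries a prototype vector, an associated output, and a coefficient vector for a local linear model, and the node estimates are combined by a distance-weighted sum. The Node class, the Gaussian weighting, and the sigma parameter are assumptions introduced only for illustration.

```python
import numpy as np

# Illustrative sketch: local linear models attached to map nodes, combined by a
# normalized distance-weighted sum. Node parameters are assumed fixed here.
class Node:
    def __init__(self, w_in, w_out, k):
        self.w_in = np.asarray(w_in, dtype=float)   # prototype (weight) vector in input space
        self.w_out = float(w_out)                   # output value associated with the node
        self.k = np.asarray(k, dtype=float)         # coefficients of the local linear model

def predict(nodes, x, sigma=1.0):
    """Combine the local estimates of all nodes by a Gaussian distance weighting."""
    x = np.asarray(x, dtype=float)
    estimates, weights = [], []
    for n in nodes:
        d = x - n.w_in                              # difference vector
        estimates.append(n.w_out + n.k @ d)         # local linear estimate
        weights.append(np.exp(-(d @ d) / (2.0 * sigma ** 2)))
    weights = np.array(weights)
    return float(np.dot(weights, estimates) / weights.sum())

# Usage: two nodes approximating y = x1 + x2 near their prototypes
nodes = [Node([0.0, 0.0], 0.0, [1.0, 1.0]), Node([1.0, 1.0], 2.0, [1.0, 1.0])]
print(predict(nodes, [0.5, 0.5]))  # approx. 1.0
```

In the paper, the local models are built around the winning node and learned incrementally, rather than being fixed as in this sketch.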






Notes
In this paper, incremental learning means that the weights of a learning system are updated without degrading previous knowledge, and three matters are considered: (1) a limited-memory scenario in which each new sample is discarded after it is learned and cannot be reused, since storing and reusing previous training data can be very inefficient in a real-time environment [34]; (2) the input and output distributions are unknown; (3) these distributions can be time-varying [39].
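As a concrete illustration of requirement (1), a minimal single-pass training loop might look as follows; `update` is a hypothetical placeholder for any incremental learner and is not part of the paper.

```python
# Sketch of the limited-memory, single-pass protocol: each sample updates the
# model once and is then discarded, never stored or reused.
def train_streaming(model, stream, update):
    for x, y in stream:      # samples arrive one at a time
        update(model, x, y)  # learn from the current sample only
    return model             # no training data is retained
```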
The calculations of the required derivatives are described in “Appendix A”.
The calculations of the required derivatives are described in “Appendix B”.
The calculations of the required derivatives are described in “Appendix C”.
The random rotation matrix R is calculated by the command qr(randn(10)) present in Matlab, Octave, and Julia.
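A possible NumPy equivalent (an assumption, since the paper only gives the Matlab/Octave/Julia command; here the orthogonal factor Q is taken as the rotation):

```python
import numpy as np

# QR decomposition of a 10x10 standard Gaussian matrix; Q is orthogonal and
# serves as the random rotation.
Q, _ = np.linalg.qr(np.random.randn(10, 10))
```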
The datasets can be downloaded from http://www.dcc.fc.up.pt/~ltorgo/Regression/DataSets.html. This site also gives more details about each dataset.
The LWI takes 150 times and 5 times the time spent by LWxD to train and to execute, respectively.
References
Al-Musaylh MS, Deo RC, Adamowski JF, Li Y (2018) Short-term electricity demand forecasting with MARS, SVR and ARIMA models using aggregated demand data in Queensland, Australia. Adv Eng Inform 35:1–16. https://doi.org/10.1016/j.aei.2017.11.002. http://www.sciencedirect.com/science/article/pii/S1474034617301477
Araújo AFR, Costa DC (2009) Local adaptive receptive field self-organizing map for image color segmentation. Image Vis Comput 27(9):1229–1239
Araújo AFR, Rêgo R (2013) Self-organizing maps with a time-varying structure. ACM Comput Surv (CSUR) 46(1):7:1–7:38
Aupetit M (2006) Learning topology with the generative Gaussian graph and the EM algorithm. In: Advances in neural information processing systems, pp 83–90
Aupetit M, Couturier P, Massotte P (2000) Function approximation with continuous self-organizing maps using neighboring influence interpolation. In: Proc. neural computation (NC’2000), pp 23–26
Aupetit M, Couturier P, Massotte P (2001) Induced Voronoï kernels for principal manifolds approximation. In: Advances in self-organising maps. Springer, pp 73–80
Bache K, Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml
Barreto GA (2007) Time series prediction with the self-organizing map: a review. In: Hammer B, Hitzler P (eds) Perspectives of neural-symbolic integration, vol 77. Studies in Computational Intelligence. Springer, Berlin, pp 135–158
Barreto GA, Araújo AFR (2004) Identification and control of dynamical systems using the self-organizing map. IEEE Trans Neural Netw 15(5):1244–1259
Basak D, Pal S, Patranabis DC (2007) Support vector regression. Neural Inf Process Lett Rev 11(10):203–224
Campos MM, Carpenter GA (2000) Building adaptive basis functions with a continuous self-organizing map. Neural Process Lett 11(1):59–78. https://doi.org/10.1023/A:1009622004201
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2: 27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Chávez E, Navarro G, Baeza-Yates R, Marroquín JL (2001) Searching in metric spaces. ACM Comput Surv (CSUR) 33(3):273–321. https://doi.org/10.1145/502807.502808
Cho J, Principe JC, Erdogmus D, Motter MA (2006) Modeling and inverse controller design for an unmanned aerial vehicle based on the self-organizing map. IEEE Trans Neural Netw 17(2):445–460
Cuadros-Vargas E, Romero RF, Obermayer K (2003) Speeding up algorithms of SOM family for large and high dimensional databases. In: Proc. workshop on self organizing maps (WSOM’03), pp 167–172
Díaz-Vico D, Torres-Barrán A, Omari A, Dorronsoro JR (2017) Deep neural networks for wind and solar energy prediction. Neural Process Lett 46(3):829–844. https://doi.org/10.1007/s11063-017-9613-7
Flentge F (2006) Locally weighted interpolating growing neural gas. IEEE Trans Neural Netw 17(6):1382–1393
Fritzke B (1995) Incremental learning of local linear mappings. In: Proc. International conference on artificial neural networks (ICANN), pp 217–222
Gaillard P, Aupetit M, Govaert G (2008) Learning topology of a labeled data set with the supervised generative Gaussian graph. Neurocomputing 71(7):1283–1299. https://doi.org/10.1016/j.neucom.2007.12.028. http://www.sciencedirect.com/science/article/pii/S0925231208000635
Göppert J, Rosenstiel W (1995) Interpolation in SOM: improved generalization by iterative methods. In: Soulié FF, Gallinari P (eds) Proc. international conference on artificial neural networks (ICANN), vol II, pp 69–74. EC2, Nanterre, France
Göppert J, Rosentiel W (1997) The continuous interpolating self-organizing map. Neural Process Lett 5(3):185–192. https://doi.org/10.1023/A:1009694727439
Hartono P, Hollensen P, Trappenberg T (2015) Learning-regulated context relevant topographical map. IEEE Trans Neural Netw Learn Syst 26(10):2323–2335. https://doi.org/10.1109/TNNLS.2014.2379275
Hecht T, Lefort M, Gepperth A (2015) Using self-organizing maps for regression: the importance of the output function. In: Proc. European symposium on artificial neural networks (ESANN). Bruges, Belgium. https://hal.archives-ouvertes.fr/hal-01251011
Heskes T (1999) Energy functions for self-organizing maps. In: Oja E, Kaski S (eds) Kohonen maps, pp 303–315. Elsevier Science B.V., Amsterdam. https://doi.org/10.1016/B978-044450270-4/50024-3
Kirkpatrick J, Pascanu R, Rabinowitz N, Veness J, Desjardins G, Rusu AA, Milan K, Quan J, Ramalho T, Grabska-Barwinska A, Hassabis D, Clopath C, Kumaran D, Hadsell R (2017) Overcoming catastrophic forgetting in neural networks. Proc Natl Acad Sci 114(13):3521–3526. https://doi.org/10.1073/pnas.1611835114. https://www.pnas.org/content/114/13/3521
Kohonen T (1982) Self-organized formation of topologically correct feature maps. Biol Cybern 43(1):59–69
Kohonen T (2013) Essentials of the self-organizing map. Neural Netw 37:52–65. http://www.sciencedirect.com/science/article/pii/S0893608012002596
Lawrence S, Tsoi AC, Back AD (1996) Function approximation with neural networks and local methods: bias, variance and smoothness. In: Proc. Australian conference on neural networks (ACNN), vol 1621
Li S, Fang H, Liu X (2018) Parameter optimization of support vector regression based on sine cosine algorithm. Expert Syst Appl 91:63–77. https://doi.org/10.1016/j.eswa.2017.08.038. http://www.sciencedirect.com/science/article/pii/S0957417417305833
Li Z, Hoiem D (2018) Learning without forgetting. IEEE Trans Pattern Anal Mach Intell 40(12):2935–2947. https://doi.org/10.1109/TPAMI.2017.2773081
Lomonaco V, Maltoni D (2017) Core50: a new dataset and benchmark for continuous object recognition. In: Levine S, Vanhoucke V, Goldberg K (eds) Proc. annual conference on robot learning, Proceedings of machine learning research, vol 78, pp 17–26. PMLR. http://proceedings.mlr.press/v78/lomonaco17a.html
Ludwig L, Kessler W, Göppert J, Rosenstiel W (1995) SOM with topological interpolation for the prediction of interference spectra. In: Proc. engineering applications of neural networks (EANN), pp 379–389. Helsinki, Finland
Maltoni D, Lomonaco V (2019) Continuous learning in single-incremental-task scenarios. Neural Netw 116:56–73. https://doi.org/10.1016/j.neunet.2019.03.010. http://www.sciencedirect.com/science/article/pii/S0893608019300838
Parisi GI, Kemker R, Part JL, Kanan C, Wermter S (2019) Continual lifelong learning with neural networks: a review. Neural Netw 113:54–71. https://doi.org/10.1016/j.neunet.2019.01.012. http://www.sciencedirect.com/science/article/pii/S0893608019300231
Principe JC, Wang L, Motter MA (1998) Local dynamic modeling with self-organizing maps and applications to nonlinear system identification and control. Proc IEEE 86(11):2240–2258
Rasmussen CE, Neal RM, Hinton G, van Camp D, Revow M, Ghahramani Z, Kustra R, Tibshirani R (1996) Delve data for evaluating learning in valid experiments. http://www.cs.toronto.edu/~delve
Rusu AA, Rabinowitz NC, Desjardins G, Soyer H, Kirkpatrick J, Kavukcuoglu K, Pascanu R, Hadsell R (2016) Progressive neural networks. arXiv:1606.04671
Salcedo-Sanz S, Rojo-Álvarez JL, Martínez-Ramón M, Camps-Valls G (2014) Support vector machines in engineering: an overview. Wiley Interdiscip Rev Data Min Knowl Discov 4(3):234–267. https://doi.org/10.1002/widm.1125
Schaal S, Atkeson CG (1998) Constructive incremental learning from only local information. Neural Comput 10(8):2047–2084. https://doi.org/10.1162/089976698300016963
Schölkopf B, Smola AJ, Williamson RC, Bartlett PL (2000) New support vector algorithms. Neural Comput 12(5):1207–1245. https://doi.org/10.1162/089976600300015565
Shepard D (1968) A two-dimensional interpolation function for irregularly-spaced data. In: Proc. ACM national conference. ACM ’68, pp 517–524. ACM, New York, NY, USA. https://doi.org/10.1145/800186.810616
Sibson R (1981) A brief description of natural neighbour interpolation. Interpreting multivariate data, pp 21–36
Smola AJ, Schölkopf B (2004) A tutorial on support vector regression. Stat Comput 14(3):199–222
Thampi G, Principe JC, Cho J, Motter M (2002) Adaptive inverse control using SOM based multiple models. In: Proc. Portuguese conference automatic control (CONTROLO), pp 278–282
Vijayakumar S, D’Souza A, Schaal S (2005) Incremental online learning in high dimensions. Neural Comput 17(12):2602–2634. https://doi.org/10.1162/089976605774320557
Walter J, Ritter H (1996) Rapid learning with parametrized self-organizing maps. Neurocomputing 12(2):131–153
Wang J, Li Y (2018) Short-term wind speed prediction using signal preprocessing technique and evolutionary support vector regression. Neural Process Lett 48(2):1043–1061. https://doi.org/10.1007/s11063-017-9766-4
Zenke F, Poole B, Ganguli S (2017) Continual learning through synaptic intelligence. In: Proc. International Conference on Machine Learning - Volume 70, ICML’17, pp. 3987–3995. JMLR.org. http://dl.acm.org/citation.cfm?id=3305890.3306093
Acknowledgements
The authors would like to thank the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for financial support.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Derivatives for Learning Rule of Difference Technique
For the vector \(\varvec{k}_{s}\) of the winner \(n_{s}\):
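By analogy with Eq. (29) of Appendix C, and assuming the difference-technique local model has the form \(\widetilde{M}_{s}(\varvec{\xi }^{in}) = w^{out}_{s} + \varvec{k}_{s} \cdot \varvec{d}_{s}\), where \(\varvec{d}_{s}\) denotes the difference between the input vector and the winner's weight vector (an assumed notation), the required derivative would be

$$\begin{aligned} \frac{\partial \widetilde{M}_{s}(\varvec{\xi }^{in})}{\partial \varvec{k}_{s}} = \frac{\partial }{\partial \varvec{k}_{s}} \left( w^{out}_{s} + \varvec{k}_{s} \cdot \varvec{d}_{s} \right) = \varvec{d}_{s} \end{aligned}$$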
Appendix B: Derivatives for Learning Rule of Extended Difference Technique
For the vectors \(\varvec{k}_{s,j}\) of the winner \(n_{s}\):
since
then
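The steps mirror Eqs. (27)–(29) of Appendix C. As a sketch, assuming the winner's extended-difference model takes the form \(\widetilde{M}_{s}(\varvec{\xi }^{in}) = w^{out}_{s} + \sum _{l = 0}^{|N_{s}|} \varvec{k}_{s,l} \cdot \varvec{d}_{s,l}\) (an assumed notation), the chain would read

$$\begin{aligned} \frac{\partial \widetilde{M}_{s}(\varvec{\xi }^{in})}{\partial \varvec{k}_{s,j}} = \sum _{l = 0}^{|N_{s}|} \frac{\partial \varvec{k}_{s,l}}{\partial \varvec{k}_{s,j}} \cdot \varvec{d}_{s,l} = \varvec{d}_{s,j}, \quad \text {since} \quad \frac{\partial \varvec{k}_{s,l}}{\partial \varvec{k}_{s,j}} = \left\{ \begin{array}{ll} \varvec{I} &{}\quad l = j \\ \varvec{0} &{}\quad l \ne j \end{array}\right. \end{aligned}$$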
Appendix C: Derivatives for Learning Rule of Locally Weighted Extended Difference Technique
To calculate the derivative \(\frac{\partial \widetilde{F}_{s}(\varvec{\xi }^{in})}{\partial \varvec{k}_{i}}\), we need to determine \(\frac{\partial \widetilde{M}_{(s,k)}(\varvec{\xi }^{in})}{\partial \varvec{k}_{i}}\). We can distinguish two cases:

1. Case \(n_{i} = n_{(s,k)}\):

$$\begin{aligned} \frac{\partial \widetilde{M}_{(s,k)}(\varvec{\xi }^{in})}{\partial \varvec{k}_{i}} = \frac{\partial }{\partial \varvec{k}_{(s,k),j}} \left( w^{out}_{(s,k)} + \sum _{l = 0}^{ |N_{(s,k)}| } \varvec{k}_{(s,k),l} \cdot \varvec{d}_{(s,k),l} \right) = \sum _{l = 0}^{ |N_{(s,k)}| } \frac{\partial \varvec{k}_{(s,k),l}}{\partial \varvec{k}_{(s,k),j}} \cdot \varvec{d}_{(s,k),l} \end{aligned}$$ (27)

since

$$\begin{aligned} \frac{\partial \varvec{k}_{(s,k),l}}{\partial \varvec{k}_{(s,k),j}} = \left\{ \begin{array}{ll} \varvec{I} &{}\quad l = j \\ \varvec{0} &{}\quad l \ne j \end{array}\right. \end{aligned}$$ (28)

then

$$\begin{aligned} \frac{\partial \widetilde{M}_{(s,k)}(\varvec{\xi }^{in})}{\partial \varvec{k}_{(s,k),j}} = \varvec{d}_{(s,k),j} \end{aligned}$$ (29)

2. Case \(n_{i} \ne n_{(s,k)}\):

$$\begin{aligned} \frac{\partial \widetilde{M}_{(s,k)}(\varvec{\xi }^{in})}{\partial \varvec{k}_{i}} = \varvec{0} \end{aligned}$$ (30)

since

$$\begin{aligned} \frac{\partial \varvec{k}_{(s,k),l}}{\partial \varvec{k}_{i}} = \varvec{0} \end{aligned}$$ (31)
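The derivative of \(\widetilde{F}_{s}\) then follows by linearity. As a sketch, assuming \(\widetilde{F}_{s}(\varvec{\xi }^{in})\) is a normalized weighted sum of the neighbor models with weights \(g_{(s,k)}(\varvec{\xi }^{in})\) (a hypothetical notation) that do not depend on \(\varvec{k}_{i}\):

$$\begin{aligned} \frac{\partial \widetilde{F}_{s}(\varvec{\xi }^{in})}{\partial \varvec{k}_{i}} = \sum _{k} \frac{g_{(s,k)}(\varvec{\xi }^{in})}{\sum _{m} g_{(s,m)}(\varvec{\xi }^{in})} \, \frac{\partial \widetilde{M}_{(s,k)}(\varvec{\xi }^{in})}{\partial \varvec{k}_{i}} \end{aligned}$$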
About this article
Cite this article
Ferreira, P.H.M., Araújo, A.F.R. Growing Self-Organizing Maps for Nonlinear Time-Varying Function Approximation. Neural Process Lett 51, 1689–1714 (2020). https://doi.org/10.1007/s11063-019-10168-9