Abstract
In real applications, learning algorithms must cope with several issues: huge amounts of data, samples that arrive continuously, and underlying data-generating processes that evolve over time. Classical learning methods are not always appropriate in these environments, since they assume independent and identically distributed data. To meet the requirements of such a learning process, systems should be able to modify both their structure and their parameters. In this survey, our aim is to review the methodologies developed for adaptive learning with artificial neural networks, analyzing the strategies that have traditionally been applied over the years. We focus on sequential learning, handling the concept drift problem, and determining the network structure. Despite the research in this field, there are currently no standard methods for dealing with these environments, and several issues remain open problems.
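As an illustrative sketch (not drawn from the survey itself), the sequential-learning setting can be made concrete with the classic Widrow–Hoff (LMS) rule: each sample is processed exactly once, as in a data stream, and the weights adapt continuously, so a simple linear unit can track a target concept that drifts over time. All names and the toy drifting stream below are ours, chosen only for illustration.

```python
import random

def lms_stream(stream, n_features, lr=0.05):
    """Update weights one sample at a time (Widrow-Hoff/LMS rule);
    each (x, y) pair is seen once, as in a data stream."""
    w = [0.0] * (n_features + 1)          # weights plus a bias term
    for x, y in stream:
        x = list(x) + [1.0]               # constant input for the bias
        y_hat = sum(wi * xi for wi, xi in zip(w, x))
        err = y - y_hat                   # prediction error on this sample
        for i in range(len(w)):           # gradient step: w <- w + lr * err * x
            w[i] += lr * err * x[i]
    return w

# Toy stream whose concept drifts halfway through:
# y = 2x for the first 500 samples, y = -2x afterwards.
random.seed(0)
stream = [((x,), (2.0 if t < 500 else -2.0) * x)
          for t, x in enumerate(random.uniform(-1, 1) for _ in range(1000))]
w = lms_stream(stream, n_features=1)
print(round(w[0], 1))  # close to -2.0: the unit has tracked the drift
```

Because the parameters are revised after every sample, the model forgets the pre-drift concept and converges to the post-drift one; the methods surveyed here extend this idea with explicit drift detection and with structures (neurons, layers) that grow or shrink online.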
Acknowledgments
The authors thank the Xunta de Galicia (Grant GRC2014/035) and the Secretaría de Estado de Investigación of the Spanish Government (Grant TIN2015-65069) for supporting this work, both partially funded by European Union ERDF funds.
Cite this article
Pérez-Sánchez, B., Fontenla-Romero, O. & Guijarro-Berdiñas, B. A review of adaptive online learning for artificial neural networks. Artif Intell Rev 49, 281–299 (2018). https://doi.org/10.1007/s10462-016-9526-2