Abstract
A major challenge in real-world applications that use data streams to solve forecasting problems is handling missing data. Although there are methods to reduce the effects of this issue, most systems are not designed preventively to treat such occurrences adequately. In this context, this paper introduces a new evolving fuzzy approach, called evolving Neo-Fuzzy Neuron with Missing Data Procedure (eNFN-MDP), which handles single and multiple missing values in data samples. For each new sample, eNFN-MDP checks whether any variables have missing values. If one or more missing values are found, estimated values are imputed. The output is then computed with all available values. Forecasting examples illustrate the usefulness of the approach. Experimental comparisons are performed under Missing at Random (MAR) and Missing Completely at Random (MCAR) scenarios in nonstationary environments. The results of eNFN-MDP are compared with state-of-the-art methods and models. Simulation results show that eNFN-MDP performs as well as or better than the other evolving modeling methods. The experimental results therefore suggest the proposed approach as a simple and efficient alternative for data imputation in evolving modeling.









Notes
Also known as Missing Not at Random (MNAR).
The eMG and eNFN Matlab code and the Python implementation of the eFPG were made available by the respective authors. The eNFN-MDP was developed in Matlab.
Acknowledgements
The authors acknowledge CAPES, Brazilian Ministry of Education, code 001.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: missing data creation
Algorithm 5 was used to create the datasets for the MCAR scenarios. The algorithm receives the complete dataset and the missing rate for the new dataset. It first computes how many samples should contain missing data from the number of samples in the original dataset and the missing rate. It then randomly selects the samples that will have missing data. Finally, for each selected sample, it randomly selects the variable that will receive a missing value.
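
The MCAR procedure described above can be sketched in a few lines of Python. This is only an illustrative reading of the textual description, not the authors' Algorithm 5 (which was developed in Matlab): the function name `make_mcar`, the use of NaN as the missing-value marker, the rounding of the sample count, and the `seed` parameter are all assumptions introduced here.

```python
import numpy as np


def make_mcar(data, missing_rate, seed=None):
    """Return a copy of `data` with one NaN in a random subset of samples.

    Hypothetical sketch: the name, the NaN marker, and the rounding rule
    are assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    out = np.array(data, dtype=float)  # float copy so NaN can be stored
    n_samples, n_vars = out.shape

    # Number of samples that should contain a missing value.
    n_missing = int(round(n_samples * missing_rate))

    # Randomly pick which samples are affected (no sample picked twice).
    rows = rng.choice(n_samples, size=n_missing, replace=False)

    # For each picked sample, randomly pick the variable to blank out.
    cols = rng.integers(0, n_vars, size=n_missing)

    out[rows, cols] = np.nan
    return out


if __name__ == "__main__":
    complete = np.arange(40, dtype=float).reshape(10, 4)
    print(make_mcar(complete, missing_rate=0.3, seed=0))
```

Because each selected sample loses exactly one variable, the fraction of samples with a missing value matches the requested rate up to rounding.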

Algorithm 6 was used to generate the datasets for the MAR scenario. The algorithm receives the original dataset, the missing rate for the variable most prone to missing values, and the missing rate for the other variables. It selects one variable from the dataset to be the most likely to have a missing value and calculates the probability of each other variable being missing from the number of variables and the minimum missing rate established. Two vectors are then initialized to control the selections: one filled with zeros and one holding the value of each index. The algorithm iterates through the variables so that each one occupies a number of positions in the selection vector proportional to its probability of being missing. Finally, for each sample in the original dataset, the algorithm randomly selects an index from the list; if the selected index belongs to a variable, that variable receives a missing value.
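
The selection-vector idea described for the MAR datasets can be illustrated with the following Python sketch. It is a hedged interpretation, not the authors' Algorithm 6: the function name `make_mar`, the `slots` resolution of the selection vector, the NaN marker, and in particular the rule that the rate for the other variables is shared equally among them are assumptions, since the text does not give the exact formula. A random position draw stands in for the explicit index vector.

```python
import numpy as np


def make_mar(data, top_rate, other_rate, top_var=None, slots=1000, seed=None):
    """Blank at most one variable per sample using a selection vector.

    Hypothetical sketch; all names and the equal split of `other_rate`
    are assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    out = np.array(data, dtype=float)  # float copy so NaN can be stored
    n_samples, n_vars = out.shape

    # Variable most likely to be missing (chosen at random if not given).
    if top_var is None:
        top_var = int(rng.integers(n_vars))

    # Assumed rule: the remaining rate is shared equally by the others.
    probs = np.full(n_vars, other_rate / max(n_vars - 1, 1))
    probs[top_var] = top_rate
    if probs.sum() > 1.0:
        raise ValueError("missing rates must sum to at most 1")

    # Selection vector: 0 means "no missing value", j + 1 means variable j.
    select = np.zeros(slots, dtype=int)
    start = 0
    for j, p in enumerate(probs):
        width = int(round(p * slots))  # positions proportional to p
        select[start:start + width] = j + 1
        start += width

    # One random draw per sample decides which variable, if any, is blanked.
    for i in range(n_samples):
        pick = select[rng.integers(slots)]
        if pick > 0:
            out[i, pick - 1] = np.nan
    return out


if __name__ == "__main__":
    complete = np.ones((2000, 4))
    holes = np.isnan(make_mar(complete, 0.3, 0.15, top_var=2, seed=1))
    print(holes.mean(axis=0))  # per-variable fraction of missing values
```

Under these assumptions each sample receives at most one missing value, and the most-prone variable is hit more often than the others in proportion to the rates supplied.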

About this article
Cite this article
Júnior, G.A.d.S., Silva, A.M.d. A simple and efficient incremental missing data imputation method for evolving neo-fuzzy network. Evolving Systems 13, 201–220 (2022). https://doi.org/10.1007/s12530-021-09376-3
DOI: https://doi.org/10.1007/s12530-021-09376-3
Profiles
- Alisson Marques da Silva