
A simple and efficient incremental missing data imputation method for evolving neo-fuzzy network

  • Original Paper
Evolving Systems

Abstract

A major challenge in real-world forecasting applications driven by data streams is handling missing data. Although there are methods to reduce the effects of this issue, most systems are not designed to anticipate such occurrences and treat them adequately. In this context, this paper introduces a new evolving fuzzy approach called evolving Neo-Fuzzy Neuron with Missing Data Procedure (eNFN-MDP), which handles single and multiple missing values in data samples. For each new sample, eNFN-MDP checks whether any variables have missing values. If one or more missing values are found, estimated values are imputed. Then, the output is computed with all available values. Forecasting examples illustrate the usefulness of the approach. Experimental comparisons are performed under Missing at Random (MAR) and Missing Completely at Random (MCAR) scenarios in nonstationary environments, and the results of eNFN-MDP are compared with state-of-the-art methods and models. Simulation results show that eNFN-MDP achieves performance as high as or higher than that of the remaining evolving modeling methods. Therefore, the experimental results suggest the proposed approach as a simple and efficient alternative for data imputation in evolving modeling.
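The per-sample procedure summarized in the abstract (check the incoming sample for missing entries, impute estimates, then compute the output) can be sketched in Python. This is a hedged illustration only, not the eNFN-MDP method: the imputer is a stand-in per-variable running mean rather than the paper's estimator, and the neo-fuzzy neuron uses fixed, evenly spaced complementary triangular membership functions with given weights, so no evolving or learning step is shown. The class and function names (`RunningMeanImputer`, `NeoFuzzyNeuron`, `step`) are hypothetical.

```python
import math
from typing import List, Optional, Sequence


class RunningMeanImputer:
    """Stand-in estimator (not the eNFN-MDP rule): per-variable running mean."""

    def __init__(self, n_vars: int):
        self.count = [0] * n_vars
        self.mean = [0.0] * n_vars

    def process(self, x: Sequence[Optional[float]]) -> List[float]:
        filled = []
        for i, xi in enumerate(x):
            if xi is None or math.isnan(xi):   # missing entry detected
                filled.append(self.mean[i])    # impute the current estimate
            else:                              # observed: update the mean
                self.count[i] += 1
                self.mean[i] += (xi - self.mean[i]) / self.count[i]
                filled.append(xi)
        return filled


class NeoFuzzyNeuron:
    """y = sum_i sum_j mu_ij(x_i) * w_ij with m complementary triangular
    membership functions per input, evenly spaced on [low_i, high_i]."""

    def __init__(self, lows: Sequence[float], highs: Sequence[float], m: int = 3):
        if m < 2:
            raise ValueError("need at least two membership functions")
        self.lows, self.highs, self.m = list(lows), list(highs), m
        self.w = [[0.0] * m for _ in self.lows]

    def memberships(self, i: int, x: float) -> List[float]:
        lo, hi = self.lows[i], self.highs[i]
        x = min(max(x, lo), hi)                 # clip to the input range
        pos = (x - lo) / (hi - lo) * (self.m - 1)
        k = min(int(pos), self.m - 2)           # left of the two active MFs
        mu = [0.0] * self.m
        mu[k], mu[k + 1] = 1.0 - (pos - k), pos - k  # memberships sum to 1
        return mu

    def output(self, x: Sequence[float]) -> float:
        return sum(mu * w
                   for i, xi in enumerate(x)
                   for mu, w in zip(self.memberships(i, xi), self.w[i]))


def step(imputer: RunningMeanImputer, nfn: NeoFuzzyNeuron,
         sample: Sequence[Optional[float]]) -> float:
    """Check and impute the incoming sample, then compute the output."""
    return nfn.output(imputer.process(sample))
```

Before any value of a variable has been observed, this stand-in imputes 0.0; a practical system would need a better cold-start estimate.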


Figs. 1–9 (images not reproduced)


Notes

  1. Also known as Missing Not at Random (MNAR).

  2. The MATLAB code of eMG and eNFN and the Python implementation of eFPG were made available by their respective authors. eNFN-MDP was developed in MATLAB.

  3. http://archive.ics.uci.edu/ml/datasets/airfoil+self-noise.

  4. http://archive.ics.uci.edu/ml/datasets/Air+Quality.


Acknowledgements

The authors acknowledge CAPES (Brazilian Ministry of Education), code 001.

Author information

Correspondence to Alisson Marques da Silva.


Appendix: missing data creation

Algorithm 5 was used to generate the datasets for the MCAR scenarios. The algorithm receives the complete dataset and the missing rate for the new dataset. It first calculates the number of samples that should have missing data from the number of samples in the original dataset and the missing rate. Next, it randomly selects the samples that will have missing data. Finally, for each sample chosen in the previous step, it randomly selects the variable that will have a missing value.

Algorithm 5 (listing not reproduced)
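A minimal Python sketch of the MCAR procedure described above follows. It illustrates the steps in the prose and is not the authors' MATLAB implementation: the function name `make_mcar`, the use of `None` as the missing-value marker, and rounding the number of affected samples with Python's `round` are assumptions.

```python
import random
from typing import List, Optional, Sequence

Row = List[Optional[float]]


def make_mcar(data: Sequence[Sequence[float]], missing_rate: float,
              seed: Optional[int] = None) -> List[Row]:
    """Hypothetical sketch of the MCAR generation steps.

    Missing values are marked with None (an assumption). Exactly one
    variable is blanked in each selected sample.
    """
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must be in [0, 1]")
    rng = random.Random(seed)
    out: List[Row] = [list(row) for row in data]      # copy; input untouched
    n_samples = len(out)
    n_missing = round(n_samples * missing_rate)       # samples that get a gap
    chosen = rng.sample(range(n_samples), n_missing)  # distinct sample indices
    for i in chosen:
        j = rng.randrange(len(out[i]))                # variable to blank
        out[i][j] = None
    return out
```

Because `rng.sample` draws distinct indices, exactly `round(N * missing_rate)` samples receive a gap, and each of them loses exactly one variable, matching the one-variable-per-sample step in the description.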

Algorithm 6 was used to generate the datasets for the MAR scenarios. This algorithm receives the original dataset, the missing rate for the variable with the highest propensity to be missing, and the missing rate for the other variables. The algorithm selects one variable of the dataset to be the variable most likely to have a missing value and calculates the probability of each of the other variables being missing based on the number of variables and the established minimum missing rate. Then, two vectors are initialized to control the selections: one filled with zeros and one holding the value of each index. The algorithm then iterates through the variables so that each one occupies a number of positions in the selection vector proportional to its missing probability (in percent). Finally, for each sample in the original dataset, the algorithm randomly selects an index from the list; if the index of a variable is selected, that variable receives a missing value.

Algorithm 6 (listing not reproduced)
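The selection-vector mechanism can be sketched in Python as below. This is an interpretation of the prose description, not the published Algorithm 6: the 100-slot vector (one slot per percentage point), the even split of the other variables' rate as `low_rate / (n_vars - 1)`, the helper names `make_selection_vector` and `make_mar`, and `None` as the missing marker are all assumptions. A slot holding 0 leaves the sample complete; a slot holding k > 0 blanks variable k - 1.

```python
import random
from typing import List, Optional, Sequence, Tuple

Row = List[Optional[float]]


def make_selection_vector(n_vars: int, high_var: int, high_rate: float,
                          low_rate: float, slots: int = 100) -> List[int]:
    """Each variable gets a block of slots proportional to its missing rate."""
    if n_vars < 2:
        raise ValueError("need at least two variables")
    per_other = low_rate / (n_vars - 1)  # assumption: low rate split evenly
    counts = [round((high_rate if v == high_var else per_other) * slots)
              for v in range(n_vars)]
    if sum(counts) > slots:
        raise ValueError("rates too large for the selection vector")
    select = [0] * slots                 # the zero-initialised vector
    pos = 0
    for v, count in enumerate(counts):
        select[pos:pos + count] = [v + 1] * count  # 1-based variable index
        pos += count
    return select


def make_mar(data: Sequence[Sequence[float]], high_rate: float,
             low_rate: float, seed: Optional[int] = None) -> Tuple[List[Row], int]:
    rng = random.Random(seed)
    out: List[Row] = [list(row) for row in data]  # copy; input untouched
    n_vars = len(out[0])
    high_var = rng.randrange(n_vars)     # variable most prone to be missing
    select = make_selection_vector(n_vars, high_var, high_rate, low_rate)
    for row in out:
        k = rng.choice(select)           # one random slot per sample
        if k > 0:
            row[k - 1] = None            # that variable becomes missing
    return out, high_var
```

Mapping each variable to a block of slots turns a weighted categorical draw into a single uniform `rng.choice`, at the cost of quantising every rate to whole percentage points; under this reading, each sample has at most one missing value.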


About this article


Cite this article

Júnior, G.A.d.S., Silva, A.M.d. A simple and efficient incremental missing data imputation method for evolving neo-fuzzy network. Evolving Systems 13, 201–220 (2022). https://doi.org/10.1007/s12530-021-09376-3


  • DOI: https://doi.org/10.1007/s12530-021-09376-3
