A simple and efficient incremental missing data imputation method for evolving neo-fuzzy network

Júnior, Giovanni Amormino da Silva; Silva, Alisson Marques da

doi:10.1007/s12530-021-09376-3

Original Paper
Published: 08 April 2021

Volume 13, pages 201–220, (2022)
Evolving Systems

Giovanni Amormino da Silva Júnior¹ &
Alisson Marques da Silva ORCID: orcid.org/0000-0002-1023-6514¹

Abstract

A major challenge in real-world applications that use data streams to solve forecasting problems is handling missing data. Although there are methods to reduce the effects of this issue, most systems are not designed in advance to treat such occurrences adequately. In this context, this paper introduces a new evolving fuzzy approach, the evolving Neo-Fuzzy Neuron with Missing Data Procedure (eNFN-MDP), which handles single and multiple missing values in data samples. For each new sample, eNFN-MDP checks whether any variables have missing values. If one or more missing values are found, estimated values are imputed. Then, the output is computed with all available values. Forecasting examples illustrate the usefulness of the approach. Experimental comparisons under Missing at Random and Missing Completely at Random mechanisms in nonstationary environments are performed. The results of the eNFN-MDP are compared with state-of-the-art methods and models. Simulation results show that the eNFN-MDP performs as well as or better than the other evolving modeling methods. Therefore, the experimental results suggest the proposed approach as a simple and efficient alternative for data imputation in evolving modeling.

Notes

Also known as Missing Not at Random (MNAR).
The eMG and eNFN Matlab code and the Python implementation of the eFPG were made available by the respective authors. The eNFN-MDP was developed in Matlab.
http://archive.ics.uci.edu/ml/datasets/airfoil+self-noise.
http://archive.ics.uci.edu/ml/datasets/Air+Quality.

Acknowledgements

The authors acknowledge CAPES, Brazilian Ministry of Education, code 001.

Author information

Authors and Affiliations

Graduate Program in Mathematical and Computational Modeling, CEFET-MG-Federal Center for Technological Education of Minas Gerais, Av. Amazonas, 7675-Nova Gameleira, Belo Horizonte, Minas Gerais, Brazil
Giovanni Amormino da Silva Júnior & Alisson Marques da Silva

Corresponding author

Correspondence to Alisson Marques da Silva.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: missing data creation

Algorithm 5 was used to create the datasets for the Missing Completely at Random (MCAR) scenarios. This algorithm receives the complete dataset and the missing rate for the new dataset. It first calculates how many samples should have missing data from the number of samples in the original dataset and the missing rate. Then, the algorithm randomly selects the samples that will have missing data. Finally, for each sample chosen in the previous step, the algorithm randomly selects the variable that must have a missing value.
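The MCAR procedure described above can be sketched in a few lines of Python. This is only an illustration of the description, not the authors' Algorithm 5 (the original implementation was developed in Matlab and is not reproduced on this page). The function name `make_mcar`, the `seed` parameter, the use of NaN as the missing-value marker, and the rounding of the sample count are all assumptions.

```python
import numpy as np


def make_mcar(data, missing_rate, seed=None):
    """Return a copy of `data` with one missing value (NaN) in a random
    subset of samples. Hypothetical helper; names and rounding are assumed."""
    rng = np.random.default_rng(seed)
    out = np.array(data, dtype=float, copy=True)
    n_samples, n_vars = out.shape

    # Number of samples that should contain a missing value
    # (rounding to the nearest integer is an assumption).
    n_missing = int(round(n_samples * missing_rate))

    # Randomly pick which samples (rows) receive a missing value,
    # without repeating a sample.
    rows = rng.choice(n_samples, size=n_missing, replace=False)

    # For each chosen sample, randomly pick the variable (column) to blank.
    cols = rng.integers(0, n_vars, size=n_missing)

    out[rows, cols] = np.nan
    return out


if __name__ == "__main__":
    complete = np.arange(40, dtype=float).reshape(10, 4)
    incomplete = make_mcar(complete, missing_rate=0.3, seed=0)
    print(np.isnan(incomplete).sum(), "missing values inserted")
```

Because each selected sample loses exactly one variable in this sketch, the missing rate is measured per sample rather than per cell, which matches the wording of the description.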

Algorithm 6 was used to generate the datasets for the Missing at Random (MAR) scenarios. This algorithm receives the original dataset, the missing rate for the variable with the highest propensity to be missing, and the missing rate for the other variables. It selects one variable from the dataset to be the one most likely to have a missing value and calculates the probability of each other variable being missing from the number of variables and the minimum missing rate established. Then, two vectors are initialized to control the selections: a selection vector filled with zeros and a vector holding the value of each index. The algorithm then iterates through the variables so that each one occupies a number of positions in the selection vector proportional to its probability of being missing. Finally, for each sample in the original dataset, the algorithm randomly selects an entry from the selection vector. If the entry holds the index of a variable, that variable receives a missing value in the sample.
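The selection-vector idea described above can be sketched as follows. This is a reading of the prose, not the authors' Algorithm 6, and several details are assumptions: the function name `make_mar`, the `favoured`, `slots`, and `seed` parameters, the choice to split the lower missing rate evenly among the remaining variables, and the use of -1 (instead of zero) to mark empty slots, since Python indices start at 0.

```python
import numpy as np


def make_mar(data, high_rate, low_rate, favoured=0, slots=1000, seed=None):
    """Blank at most one variable per sample, with one favoured variable
    being more likely to go missing. Hypothetical sketch; see assumptions."""
    rng = np.random.default_rng(seed)
    out = np.array(data, dtype=float, copy=True)
    n_samples, n_vars = out.shape

    # Per-variable probability of being missing. Assumption: the lower
    # rate is shared evenly by all variables except the favoured one.
    probs = np.full(n_vars, low_rate / max(n_vars - 1, 1))
    probs[favoured] = high_rate
    if probs.sum() > 1.0:
        raise ValueError("missing probabilities must sum to at most 1")

    # Selection vector: -1 means "no missing value"; each variable fills
    # a block of slots proportional to its probability.
    select = np.full(slots, -1)
    start = 0
    for var, p in enumerate(probs):
        width = int(round(p * slots))
        select[start:start + width] = var
        start += width

    # For each sample, draw one slot; if it holds a variable index,
    # that variable becomes missing in the sample.
    picks = select[rng.integers(0, slots, size=n_samples)]
    rows = np.nonzero(picks >= 0)[0]
    out[rows, picks[rows]] = np.nan
    return out


if __name__ == "__main__":
    complete = np.ones((2000, 4))
    incomplete = make_mar(complete, high_rate=0.3, low_rate=0.06, favoured=2, seed=1)
    print(np.isnan(incomplete).sum(axis=0))  # missing count per variable
```

Drawing a single slot per sample keeps the per-variable rates easy to control, at the cost of allowing at most one missing variable per sample in this sketch.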

About this article

Cite this article

Júnior, G.A.d.S., Silva, A.M.d. A simple and efficient incremental missing data imputation method for evolving neo-fuzzy network. Evolving Systems 13, 201–220 (2022). https://doi.org/10.1007/s12530-021-09376-3

Received: 05 October 2020
Accepted: 17 March 2021
Published: 08 April 2021
Issue Date: April 2022
DOI: https://doi.org/10.1007/s12530-021-09376-3
