Missing information in imbalanced data stream: fuzzy adaptive imputation approach

Halder, Bohnishikha; Ahmed, Md Manjur; Amagasa, Toshiyuki; Isa, Nor Ashidi Mat; Faisal, Rahat Hossain; Rahman, Md. Mostafijur

doi:10.1007/s10489-021-02741-4

Published: 16 August 2021

Volume 52, pages 5561–5583, (2022)

Applied Intelligence

Bohnishikha Halder¹,
Md Manjur Ahmed ORCID: orcid.org/0000-0002-0977-4091¹,
Toshiyuki Amagasa²,
Nor Ashidi Mat Isa³,
Rahat Hossain Faisal¹ &
Md. Mostafijur Rahman⁴

Abstract

In real-world settings, missing information is common in data streams. Missing data cause a variety of problems in pattern recognition tasks such as clustering and classification, and handling them in a data stream is particularly challenging. With imbalanced data, missing values greatly affect pattern recognition. To address these issues, this study proposes an adaptive technique built on a fuzzy-based information decomposition method that simultaneously handles incomplete data and class imbalance in a data stream. The main purpose of the proposed fuzzy adaptive imputation approach (FAIA) is to characterize the effect of missing values in an imbalanced data stream and to handle the missing data problem in that setting. FAIA is a single-pass method. It adaptively selects intervals based on all observed instances, using the interrelationship of attributes to identify the correct interval for computing missing values. Here, the interrelationship of two attributes means that the value of one attribute of an instance depends on the value of another attribute of the same instance. In FAIA, the distances from a given missing value to all intervals are measured, and the interval with the least distance is selected for that missing value. Synthetic minority-class data are generated with the same process used for missing value imputation in order to balance the data, that is, to perform oversampling. In the data stream setting, the instances of a dataset are divided into chunks, and each chunk is balanced without any ensemble of previous chunks because missing values may mislead future chunks. To demonstrate the performance of FAIA, the experiment is divided into three parts: missing data imputation, imbalanced data for offline data and for data streams, and imbalanced data with missing values for offline data. Eleven numerical datasets with different dimensions from various repositories are used to evaluate missing data imputation and imbalanced data without a data stream. Four different datasets are also used to measure performance on imbalanced data streams. In most of the evaluated cases, the proposed method outperforms the compared methods.
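
The interval-selection idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' FAIA implementation: the function name `impute_by_nearest_interval`, the equal-width intervals, the use of interval centres to measure distance, and the interval mean as the imputed value are all assumptions made for illustration, and the fuzzy information decomposition weighting is omitted entirely.

```python
def impute_by_nearest_interval(rows, known_idx, missing_idx, n_intervals=3):
    """Fill rows[i][missing_idx] (marked as None) from a related attribute.

    Hypothetical sketch: intervals are built over the related attribute
    (known_idx) from fully observed instances, and a missing value is
    replaced by the mean target value of the nearest interval.
    """
    observed = [r for r in rows
                if r[known_idx] is not None and r[missing_idx] is not None]
    if not observed:
        raise ValueError("no fully observed instances to build intervals from")

    lo = min(r[known_idx] for r in observed)
    hi = max(r[known_idx] for r in observed)
    width = (hi - lo) / n_intervals
    if width == 0:
        width = 1.0  # all observed values identical: fall back to one bucket

    # Group observed target values by the interval of the related attribute.
    buckets = {}
    for r in observed:
        k = min(int((r[known_idx] - lo) / width), n_intervals - 1)
        buckets.setdefault(k, []).append(r[missing_idx])

    # Only non-empty intervals are candidates (assumption of this sketch).
    centres = {k: lo + (k + 0.5) * width for k in buckets}
    means = {k: sum(v) / len(v) for k, v in buckets.items()}

    filled = []
    for r in rows:
        r = list(r)  # copy so the input rows are left unchanged
        if r[missing_idx] is None and r[known_idx] is not None:
            # Select the interval with the least distance to this instance.
            nearest = min(centres, key=lambda k: abs(centres[k] - r[known_idx]))
            r[missing_idx] = means[nearest]
        filled.append(r)
    return filled


# Toy data: attribute 1 depends on attribute 0; None marks a missing value.
rows = [[1.0, 10.0], [2.0, 12.0], [8.0, 80.0], [9.0, 82.0],
        [1.5, None], [8.5, None]]
filled = impute_by_nearest_interval(rows, known_idx=0, missing_idx=1,
                                    n_intervals=2)
```

In a streaming setting, a function like this would be applied to each chunk on its own, in line with the abstract's point that previous chunks are not reused. According to the abstract, the same imputation process is used to generate synthetic minority-class instances, but this sketch does not attempt that step.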

Acknowledgements

This research work was supported by a Special Grant of the ICT Division (Ministry of Posts, Telecommunications and Information Technology), Bangladesh, Grant No. 56.00.0000.028.20.004.20-333. The authors would like to acknowledge the Ministry of Higher Education, Malaysia, for its partial support through the Fundamental Research Grant Scheme (FRGS), under Grant 203/PELECT/6071398.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Barishal, Barishal, 8200, Bangladesh
Bohnishikha Halder, Md Manjur Ahmed & Rahat Hossain Faisal
Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan
Toshiyuki Amagasa
School of Electrical and Electronic Engineering, Engineering Campus, Universiti Sains Malaysia, 14300 Nibong Tebal, Penang, Malaysia
Nor Ashidi Mat Isa
Department of Software Engineering, Daffodil International University, Dhaka, Bangladesh
Md. Mostafijur Rahman

Corresponding authors

Correspondence to Md Manjur Ahmed or Nor Ashidi Mat Isa.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Halder, B., Ahmed, M.M., Amagasa, T. et al. Missing information in imbalanced data stream: fuzzy adaptive imputation approach. Appl Intell 52, 5561–5583 (2022). https://doi.org/10.1007/s10489-021-02741-4

Accepted: 04 August 2021
Published: 16 August 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s10489-021-02741-4
