Abstract
Classifying incomplete data remains challenging because missing values introduce uncertain and imprecise information that degrades classification performance. To address this issue, we propose a hybrid imputation-based optimal evidential classification (HOEC) method for missing data under the Dempster-Shafer theory framework. HOEC captures uncertainty and imprecision during both the imputation and classification procedures. Specifically, a hybrid imputation strategy that combines single and multiple imputations estimates the missing values in the training and test sets, yielding accurate estimates while capturing their uncertainty. An optimal evidential partition rule then adaptively assigns each incomplete sample to a singleton class or a meta-class under the Dempster-Shafer theory framework, which captures the imprecision caused by missing values and reduces classification errors. Experiments on several incomplete datasets demonstrate the effectiveness of HOEC compared with related methods.
Data Availability and Access
The datasets analyzed in this study are available at the UCI repository (http://archive.ics.uci.edu/).
Notes
The incomplete sample \(\textbf{y}_j\) is directly assigned to a specific singleton class if all of its K nearest neighbors (KNNs) in X come from the same class, that is, \(V=1\).
For ease of understanding, we assume that the sample \(\textbf{y}_j\) is difficult to assign to either of the two singleton classes \(\omega_\varphi\) and \(\omega_{\max}\), that is, \(\{\omega_\varphi, \omega_{\max}\}\) is the most likely meta-class to which \(\textbf{y}_j\) may belong.
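The two notes above can be illustrated with a minimal Python sketch. It is not the HOEC implementation: the distance over observed attributes, the per-class support scores, the threshold `epsilon`, and all function names are assumptions introduced here for illustration only. The first part shows the \(V=1\) shortcut, and the second shows one hypothetical way to choose between the singleton \(\omega_{\max}\) and the meta-class \(\{\omega_\varphi, \omega_{\max}\}\).

```python
import numpy as np


def knn_labels(X, labels, y, k):
    """Labels of the k training samples in X closest to y.

    Assumption: X is complete, and the Euclidean distance is computed
    only over the attributes of y that are observed (not NaN).
    """
    observed = ~np.isnan(y)
    dists = np.linalg.norm(X[:, observed] - y[observed], axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    return [labels[i] for i in nearest]


def shortcut_class(neighbor_labels):
    """Return the singleton class when the neighbors span one class (V = 1).

    Returns None when V > 1, meaning a further decision rule is needed.
    """
    distinct = set(neighbor_labels)
    if len(distinct) == 1:
        return next(iter(distinct))
    return None


def singleton_or_meta(support, epsilon):
    """Hypothetical partition rule, not the paper's optimal rule.

    `support` maps each class to a score in [0, 1]. If the two best
    classes differ by at most `epsilon`, the sample is assigned to the
    meta-class containing both; otherwise to the best singleton class.
    """
    ranked = sorted(support, key=support.get, reverse=True)
    w_max = ranked[0]
    if len(ranked) > 1:
        w_phi = ranked[1]
        if support[w_max] - support[w_phi] <= epsilon:
            return frozenset({w_phi, w_max})
    return frozenset({w_max})


# Toy data (made up): two well-separated classes, one incomplete sample.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
labels = ["a", "a", "b", "b"]
y = np.array([0.05, np.nan])

neighbors = knn_labels(X, labels, y, k=2)
direct = shortcut_class(neighbors)
ambiguous = singleton_or_meta({"a": 0.45, "b": 0.40, "c": 0.15}, epsilon=0.1)
clear = singleton_or_meta({"a": 0.8, "b": 0.2}, epsilon=0.1)
```

Returning a `frozenset` in both cases keeps singleton classes and meta-classes in one representation, so downstream code can treat a set of size one as a precise decision and a larger set as an imprecise one.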
Acknowledgements
This study was partially supported by the National Key Research and Development Program of China (No. 2018YFCXXXXXXX), the Henan Major Public Welfare Project (No. 201300311200), and the Henan Key Research and Development Project (No. 231111211600).
Author information
Contributions
Zhen Zhang: Methodology, Supervision, Writing and Editing. Hong-peng Tian: Software, Methodology, and Original draft preparation.
Ethics declarations
Competing Interests
The authors declare that they have no competing financial interests or personal relationships that may have influenced the work reported in this study.
Ethical and Informed Consent for Data Used
We declare that this study is an original work and has not been published or submitted elsewhere. We confirm that the order of the authors listed in the manuscript was approved by all authors and that informed consent was obtained from all authors involved in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, Z., Tian, Hp. Hybrid imputation-based optimal evidential classification for missing data. Appl Intell 55, 69 (2025). https://doi.org/10.1007/s10489-024-05950-9
DOI: https://doi.org/10.1007/s10489-024-05950-9