Hybrid imputation-based optimal evidential classification for missing data

Applied Intelligence

Abstract

Classifying incomplete data remains challenging because missing values introduce uncertain and imprecise information that degrades classification performance. To address this issue, we propose a hybrid imputation-based optimal evidential classification (HOEC) method for missing data under the Dempster-Shafer theory framework. HOEC captures uncertainty and imprecision during both imputation and classification. Specifically, a hybrid imputation strategy combines single and multiple imputations to estimate the missing values in the training and test sets, which yields accurate estimates while capturing their uncertainty. An optimal evidential partition rule then adaptively assigns each incomplete sample to either a singleton class or a meta-class, which captures the imprecision caused by missing values and reduces classification errors. Experiments on several incomplete datasets demonstrate the effectiveness of HOEC compared with related methods.
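To make the notions of singleton class and meta-class concrete, the following minimal Python sketch implements the standard Dempster's rule of combination for mass functions whose focal elements are sets of class labels. It is an illustration of the underlying Dempster-Shafer machinery only, not the HOEC partition rule itself; the function name `dempster_combine`, the class labels `w1` and `w2`, and the numeric masses are all hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Fuse two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of class labels (a focal element)
    to its mass; the masses of one function are assumed to sum to 1.
    A focal element with one label is a singleton class, and one with
    several labels plays the role of a meta-class.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass that would fall on the empty set counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Normalize by the non-conflicting mass.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


# Hypothetical evidence from two sources (for example, two imputed versions
# of the same incomplete sample); the numbers are made up for illustration.
W1, W2 = frozenset({"w1"}), frozenset({"w2"})
META = W1 | W2  # the meta-class {w1, w2}
m_a = {W1: 0.6, META: 0.4}
m_b = {W2: 0.5, META: 0.5}
fused = dempster_combine(m_a, m_b)
for focal, mass in fused.items():
    print(sorted(focal), round(mass, 4))
```

In this toy case some mass remains on the meta-class after fusion, which is how the framework keeps track of a sample that cannot be confidently placed in a single class.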

Data Availability and Access

The datasets analyzed in this study are available at the UCI repository (http://archive.ics.uci.edu/).

Notes

  1. The incomplete sample \(\textbf{y}_j\) is directly assigned to a specific singleton class if all of its K nearest neighbors (KNNs) in X come from one class, that is, \(V=1\).

  2. For ease of understanding, we assume that the sample \(\textbf{y}_j\) is difficult to assign to either of two singleton classes (\({\omega _\varphi }\) and \({\omega _{\max }}\)), that is, \(\{{\omega _\varphi },{\omega _{\max }}\}\) is the most likely meta-class to which \(\textbf{y}_j\) may belong.
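The two notes above can be sketched in Python as a simple decision procedure. This is a hedged illustration under stated assumptions, not the rule from the article: V is read as the number of distinct classes among the K nearest neighbors, the sample is assumed to be already imputed, neighbor vote counts stand in for whatever support measure the method actually uses, and the threshold `epsilon` and the function names are hypothetical.

```python
import math
from collections import Counter


def knn_labels(y, X, labels, k):
    """Return the class labels of the k training points closest to y."""
    order = sorted(range(len(X)), key=lambda i: math.dist(y, X[i]))
    return [labels[i] for i in order[:k]]


def assign_class(y, X, labels, k=3, epsilon=0.2):
    """Assign y to a singleton class or to a two-class meta-class.

    Assumptions (not taken from the article): y is already imputed,
    support for a class is its share of neighbor votes, and epsilon is
    a hypothetical threshold on the gap between the top two classes.
    """
    votes = Counter(knn_labels(y, X, labels, k))
    if len(votes) == 1:
        # Note 1: all neighbors come from one class (V = 1).
        return frozenset(votes)
    (w_max, n_max), (w_phi, n_phi) = votes.most_common(2)
    if (n_max - n_phi) / k <= epsilon:
        # Note 2: the two best classes are hard to tell apart,
        # so the sample goes to the meta-class {w_phi, w_max}.
        return frozenset({w_max, w_phi})
    return frozenset({w_max})


# Hypothetical one-dimensional toy data with two classes.
X = [(0.0,), (1.0,), (4.0,), (5.0,)]
labels = ["a", "a", "b", "b"]
print(sorted(assign_class((0.2,), X, labels, k=2)))  # neighbors share one class
print(sorted(assign_class((2.4,), X, labels, k=4)))  # votes are split evenly
```

Returning a set in both cases keeps singleton classes and meta-classes in one representation, so downstream code can treat a set with more than one label as an imprecise decision.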

Acknowledgements

This study was partially supported by the National Key Research and Development Program of China (No. 2018YFCXXXXXXX), the Henan Major Public Welfare Project (No. 201300311200), and the Henan Key Research and Development Project (No. 231111211600).

Author information

Contributions

Zhen Zhang: Methodology, Supervision, Writing and Editing. Hong-peng Tian: Software, Methodology, and Original draft preparation.

Corresponding author

Correspondence to Hong-peng Tian.

Ethics declarations

Competing Interests

The authors declare that they have no competing financial interests or personal relationships that may have influenced the work reported in this study.

Ethical and Informed Consent for Data Used

We declare that this study is an original work and has not been published or submitted elsewhere. We confirm that the order of the authors listed in the manuscript was approved by all authors and that informed consent was obtained from all authors involved in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Z., Tian, Hp. Hybrid imputation-based optimal evidential classification for missing data. Appl Intell 55, 69 (2025). https://doi.org/10.1007/s10489-024-05950-9

  • DOI: https://doi.org/10.1007/s10489-024-05950-9
