A New Way of Handling Missing Data in Multi-source Classification Based on Adaptive Imputation

Abdelkhalek, Ikram; Ben Brahim, Afef; Essousi, Nadia

doi:10.1007/978-3-030-00856-7_8

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11163)

Included in the following conference series:

International Conference on Model and Data Engineering

Abstract

Data fusion is a promising methodology for improving classification performance. It consists of combining data acquired from multiple sources to obtain more informative inputs and support better decision making. Fusion is a challenging task, and the main difficulties arise from the data to be fused. Missing data is one such difficulty: it degrades the performance of learning algorithms and leads to misleading predictions, so handling it appropriately is crucial for accurate inference. Several approaches have been proposed in the literature for multi-source classification problems; however, they neglect missingness and assume that the data are complete, which does not hold in real-world settings. Other approaches apply simple data imputation directly before the learning process, which is not always enough to obtain a reliable learning and prediction model. In this paper, we propose a new approach to handling missing data in multi-source classification problems. Our approach avoids direct imputation when the feature concerned is not important, and it also adjusts the prediction fusion process according to the missing data rate in each data source and in the new instance to classify. The approach is used with Random Forests as the ensemble classifier, and it has shown improved classification performance compared to existing approaches.
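
The two ideas in the abstract (imputing only the important features, and adjusting the fusion of predictions according to missing data rates) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the importance threshold or the weighting rule, so the function names, the threshold parameter, and the weight formula (1 - source missing rate) * (1 - instance missing rate) are all assumptions made for illustration. The per-source class probabilities would come from one classifier per source, such as a Random Forest.

```python
from typing import Dict, List, Optional, Sequence


def missing_rate(values: Sequence[Optional[float]]) -> float:
    """Fraction of entries that are missing (represented here as None)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def features_to_impute(importances: Dict[str, float], threshold: float) -> List[str]:
    """Select the features worth imputing.

    Assumption: a feature is "important" when its importance score is at
    least `threshold`; the paper's actual criterion is not in the abstract.
    """
    return [name for name, imp in importances.items() if imp >= threshold]


def fuse_predictions(
    source_probs: List[Dict[str, float]],
    source_missing_rates: List[float],
    instance_missing_rates: List[float],
) -> Dict[str, float]:
    """Weighted average of per-source class probabilities.

    Hypothetical weighting: a source counts less when its training data
    or its part of the new instance has more missing values.
    """
    weights = [
        (1.0 - s) * (1.0 - i)
        for s, i in zip(source_missing_rates, instance_missing_rates)
    ]
    total = sum(weights)
    if total == 0.0:
        # Every source is fully missing: fall back to equal weights.
        weights = [1.0] * len(source_probs)
        total = float(len(source_probs))
    fused: Dict[str, float] = {}
    for w, probs in zip(weights, source_probs):
        for label, p in probs.items():
            fused[label] = fused.get(label, 0.0) + (w / total) * p
    return fused
```

For example, calling fuse_predictions with two sources whose missing rates are 0.0 and 0.5 (both in the source and in the new instance) gives the first source four times the weight of the second, so its prediction dominates the fused result. A multiplicative weight is only one plausible choice; an additive or rank-based penalty would fit the same description.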

Author information

Authors and Affiliations

Institut Supérieur de Gestion de Tunis, LARODEC, Université de Tunis, Tunis, Tunisia
Ikram Abdelkhalek & Nadia Essousi
Tunis Business School, LARODEC, Université de Tunis, Tunis, Tunisia
Afef Ben Brahim

Corresponding author

Correspondence to Ikram Abdelkhalek.

Editor information

Editors and Affiliations

Cadi Ayyad University, Marrakesh, Morocco
El Hassan Abdelwahed
LIAS/ISAE-ENSMA, Futuroscope Chasseneuil Cedex, France
Ladjel Bellatreche
University of Bologna, Cesena, Italy
Mattéo Golfarelli
University of Lorraine, Vandœuvre-lès-Nancy, France
Dominique Méry
University of Houston, Houston, TX, USA
Carlos Ordonez

Cite this paper

Abdelkhalek, I., Ben Brahim, A., Essousi, N. (2018). A New Way of Handling Missing Data in Multi-source Classification Based on Adaptive Imputation. In: Abdelwahed, E., Bellatreche, L., Golfarelli, M., Méry, D., Ordonez, C. (eds) Model and Data Engineering. MEDI 2018. Lecture Notes in Computer Science, vol 11163. Springer, Cham. https://doi.org/10.1007/978-3-030-00856-7_8

DOI: https://doi.org/10.1007/978-3-030-00856-7_8
Published: 13 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00855-0
Online ISBN: 978-3-030-00856-7
eBook Packages: Computer Science, Computer Science (R0)