Abstract
Data fusion is a promising methodology for improving classification performance. It consists of combining data acquired from multiple sources to obtain more informative data and support better decision making. This is a challenging task, and the main difficulties arise from the data to be fused. Missing data is one of them: its presence degrades the performance of the algorithms and results in misleading predictions. Handling missing data appropriately is therefore crucial for accurate inference. Several approaches have been proposed in the literature to deal with multi-source classification problems; however, they neglect missingness and assume that the data are complete, which is not the case in real life. Other approaches apply simple data imputation directly before the learning process, which is not always enough to obtain a reliable learning and prediction model. In this paper, we propose a new approach to deal with missing data in multi-source classification problems. Our approach avoids direct imputation when the feature concerned is not important, and it also adjusts the prediction fusion process according to the missing data rate in each data source and in the new instance to classify. The approach is used with Random Forests as the ensemble classifier, and it has shown improved classification performance compared to existing approaches.
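The abstract does not give the exact fusion rule, so the following is only a minimal sketch of one way the idea could look. It assumes each source has already produced class probabilities (for example from its own Random Forest) and down-weights a source by both its training missing rate and the missing rate of the new instance in that source. The function names, the multiplicative weight, and the unweighted-average fallback are all illustrative assumptions, not the authors' method.

```python
from typing import Dict, List, Optional, Sequence


def missing_rate(values: Sequence[Optional[float]]) -> float:
    """Fraction of entries that are missing (represented here as None)."""
    if not values:
        return 1.0
    return sum(v is None for v in values) / len(values)


def fuse_predictions(
    source_probs: Dict[str, List[float]],
    train_missing_rate: Dict[str, float],
    instance_features: Dict[str, Sequence[Optional[float]]],
) -> List[float]:
    """Weighted average of per-source class probabilities.

    Assumed weight (hypothetical, not taken from the paper):
    (1 - missing rate of the source in training)
    * (1 - missing rate of the new instance in that source).
    """
    n_classes = len(next(iter(source_probs.values())))
    fused = [0.0] * n_classes
    total_weight = 0.0
    for name, probs in source_probs.items():
        weight = (1.0 - train_missing_rate[name]) * (
            1.0 - missing_rate(instance_features[name])
        )
        total_weight += weight
        for k, p in enumerate(probs):
            fused[k] += weight * p
    if total_weight == 0.0:
        # Every source is fully missing: fall back to a plain average.
        n_sources = len(source_probs)
        return [
            sum(p[k] for p in source_probs.values()) / n_sources
            for k in range(n_classes)
        ]
    return [f / total_weight for f in fused]


# Toy example: source "A" is complete, source "B" is half missing
# both in training and for the new instance.
fused = fuse_predictions(
    source_probs={"A": [0.9, 0.1], "B": [0.2, 0.8]},
    train_missing_rate={"A": 0.0, "B": 0.5},
    instance_features={"A": [1.0, 2.0], "B": [None, 3.0]},
)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In this toy case the complete source dominates the fused decision, whereas a plain average would treat both sources equally. A multiplicative weight is only one possible choice; any monotone decreasing function of the two missing rates would express the same intuition.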
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Abdelkhalek, I., Ben Brahim, A., Essousi, N. (2018). A New Way of Handling Missing Data in Multi-source Classification Based on Adaptive Imputation. In: Abdelwahed, E., Bellatreche, L., Golfarelli, M., Méry, D., Ordonez, C. (eds) Model and Data Engineering. MEDI 2018. Lecture Notes in Computer Science, vol 11163. Springer, Cham. https://doi.org/10.1007/978-3-030-00856-7_8
DOI: https://doi.org/10.1007/978-3-030-00856-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00855-0
Online ISBN: 978-3-030-00856-7
eBook Packages: Computer Science, Computer Science (R0)