Abstract
Missing values degrade big data analysis. However, without extra knowledge, exact imputation is rarely possible for many data sets. We therefore have to tolerate missing values and perform data mining directly on incomplete data sets. To achieve high-quality data mining on incomplete data, we propose a classification approach based on multiple views. We use various complete views of the data set to train base classifiers and combine their results. Since the number of base classifiers affects both the effectiveness and the efficiency of classification, we aim to find proper view sets. We prove that the view set selection problem is NP-hard and develop an approximation algorithm with approximation ratio \(\ln|S|+1\), where \(S\) is the feature set of the original data set. Extensive experimental results demonstrate the efficiency and effectiveness of the proposed approaches.
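The abstract does not spell out the approximation algorithm. A ratio of \(\ln|S|+1\) is, however, the classic guarantee of greedy set cover over a universe of size \(|S|\), so the following is a minimal sketch under the assumption that view set selection can be framed as covering the feature set with candidate views. The function name `greedy_view_cover` and the representation of a view as a frozenset of feature names are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, Iterable, List, Set


def greedy_view_cover(
    features: Set[str],
    candidate_views: Iterable[FrozenSet[str]],
) -> List[FrozenSet[str]]:
    """Greedy set cover (assumed framing, not the paper's stated algorithm).

    Repeatedly picks the candidate view that covers the most features
    that are still uncovered, until every coverable feature is covered.
    """
    views = list(candidate_views)
    if not views:
        return []

    # Only features that appear in at least one view can ever be covered.
    coverable = set().union(*views)
    uncovered = set(features) & coverable

    chosen: List[FrozenSet[str]] = []
    while uncovered:
        # The view with the largest overlap with the uncovered features.
        best = max(views, key=lambda v: len(v & uncovered))
        chosen.append(best)
        uncovered -= best
    return chosen


if __name__ == "__main__":
    feature_set = {"a", "b", "c", "d", "e"}
    candidates = [
        frozenset({"a", "b", "c"}),
        frozenset({"c", "d"}),
        frozenset({"d", "e"}),
        frozenset({"a"}),
    ]
    for view in greedy_view_cover(feature_set, candidates):
        print(sorted(view))
```

In a multi-view classifier, each selected view would then supply the feature subset for one base classifier, trained only on the records that are complete on that subset. How the paper defines candidate views and combines base classifier outputs is not stated in the abstract.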
Acknowledgement
This paper was partially supported by National Sci-Tech Support Plan 2015BAH10F01, NSFC grants U1509216, 61472099, and 61133002, and the Scientific Research Foundation for the Returned Overseas Chinese Scholars of Heilongjiang Province LC2016026.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Sun, M., Wang, H., Meng, F., Li, J., Gao, H. (2016). Incomplete Data Classification Based on Multiple Views. In: Li, F., Shim, K., Zheng, K., Liu, G. (eds) Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science, vol 9932. Springer, Cham. https://doi.org/10.1007/978-3-319-45817-5_19
DOI: https://doi.org/10.1007/978-3-319-45817-5_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45816-8
Online ISBN: 978-3-319-45817-5
eBook Packages: Computer Science, Computer Science (R0)