Missing Value Imputation Using a Semi-supervised Rank Aggregation Approach

Matsubara, Edson T.; Prati, Ronaldo C.; Batista, Gustavo E. A. P. A.; Monard, Maria C.

doi:10.1007/978-3-540-88190-2_27


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5249)

Included in the following conference series:

Brazilian Symposium on Artificial Intelligence


Abstract

An important problem in data quality is the presence of missing data. When missing values are abundant, handling them effectively can improve the performance of machine learning algorithms. One way to treat missing data is imputation: imputation methods replace missing values with values estimated from the available data. This paper presents Corai, an imputation algorithm adapted from Co-training, a multi-view semi-supervised learning algorithm. Corai was compared with other imputation methods from the literature on three UCI data sets, with different levels of missingness inserted into up to three attributes. The results show that Corai tends to perform well when both the percentage of missing values and the number of attributes with missing values are higher.
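The abstract describes Corai only at a high level, as an adaptation of Co-training. In Co-training, one classifier is trained per "view" (a disjoint subset of attributes), and each classifier labels the unlabeled examples it is most confident about, which then become training data for the other. The sketch below shows one way that idea can be applied to imputation. It is not Corai, and it does not implement the rank aggregation step named in the title, which the abstract does not describe. The assumptions here are ours: the attribute with missing values is categorical and is treated as the class label, rows where it is present form the labeled set, only that attribute has missing values, the other attributes are split by hand into two views, each view uses a small naive Bayes classifier, and one value is imputed per view per round. The names `CategoricalNB` and `cotrain_impute` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence

# A row maps attribute name -> categorical value; None marks a missing value.
Row = Dict[str, Optional[str]]


class CategoricalNB:
    """Hypothetical helper: naive Bayes over categorical attributes (Laplace smoothing)."""

    def __init__(self, features: Sequence[str]) -> None:
        self.features = list(features)

    def fit(self, rows: List[Row], labels: List[str]) -> "CategoricalNB":
        self.class_counts = Counter(labels)
        self.n = len(labels)
        self.counts: Counter = Counter()                # (feature, value, label) -> count
        self.values: Dict[str, set] = defaultdict(set)  # distinct values seen per feature
        for row, y in zip(rows, labels):
            for f in self.features:
                self.counts[(f, row[f], y)] += 1
                self.values[f].add(row[f])
        return self

    def predict_proba(self, row: Row) -> Dict[str, float]:
        scores = {}
        for y, cy in self.class_counts.items():
            p = cy / self.n
            for f in self.features:
                k = len(self.values[f]) + 1  # one extra slot for unseen values
                p *= (self.counts[(f, row[f], y)] + 1) / (cy + k)
            scores[y] = p
        total = sum(scores.values())
        return {y: s / total for y, s in scores.items()}


def cotrain_impute(rows: List[Row], target: str,
                   view_a: Sequence[str], view_b: Sequence[str]) -> List[Row]:
    """Hypothetical co-training imputer for one categorical attribute.

    Rows where `target` is present form the labeled set; rows where it is None
    form the unlabeled set. In each round, each view trains its own classifier
    on the current labeled set and imputes the single unlabeled row it is most
    confident about, which then becomes labeled data for the other view.
    """
    filled = [dict(r) for r in rows]  # work on copies; inputs stay unchanged
    labeled = [i for i, r in enumerate(filled) if r[target] is not None]
    unlabeled = [i for i, r in enumerate(filled) if r[target] is None]
    if not labeled:
        raise ValueError("need at least one row with a known target value")
    while unlabeled:
        for view in (view_a, view_b):
            if not unlabeled:
                break
            model = CategoricalNB(view).fit(
                [filled[i] for i in labeled],
                [filled[i][target] for i in labeled],
            )
            # Pick the unlabeled row whose top predicted value is most probable.
            best_i, best_y, best_p = None, None, -1.0
            for i in unlabeled:
                y, p = max(model.predict_proba(filled[i]).items(), key=lambda kv: kv[1])
                if p > best_p:
                    best_i, best_y, best_p = i, y, p
            filled[best_i][target] = best_y
            labeled.append(best_i)
            unlabeled.remove(best_i)
    return filled


if __name__ == "__main__":
    data = [
        {"a": "x", "b": "p", "c": "yes"},
        {"a": "x", "b": "p", "c": "yes"},
        {"a": "y", "b": "q", "c": "no"},
        {"a": "y", "b": "q", "c": "no"},
        {"a": "x", "b": "p", "c": None},
        {"a": "y", "b": "q", "c": None},
    ]
    for row in cotrain_impute(data, target="c", view_a=["a"], view_b=["b"]):
        print(row)
```

Imputing one value per view per round, rather than filling every gap at once, is what lets each view's most confident predictions enlarge the other view's training set. To evaluate such a method in the spirit of the paper's experiments, one would insert missing values into attributes whose true values are known and compare the imputed values against them.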




Author information

Authors and Affiliations

Institute of Mathematics and Computer Science at University of São Paulo, P. O. Box 668, ZIP Code 13560-970, São Carlos, SP, Brazil
Edson T. Matsubara, Ronaldo C. Prati, Gustavo E. A. P. A. Batista & Maria C. Monard


Editor information

Editors and Affiliations

Department of Systems Engineering and Computer Science - COPPE, Federal University of Rio de Janeiro (UFRJ), Brazil
Gerson Zaverucha
Department of Automation and Systems, Federal University of Santa Catarina, CEP 88.040-900, Brazil
Augusto Loureiro da Costa


About this paper

Cite this paper

Matsubara, E.T., Prati, R.C., Batista, G.E.A.P.A., Monard, M.C. (2008). Missing Value Imputation Using a Semi-supervised Rank Aggregation Approach. In: Zaverucha, G., da Costa, A.L. (eds) Advances in Artificial Intelligence - SBIA 2008. SBIA 2008. Lecture Notes in Computer Science, vol 5249. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88190-2_27


DOI: https://doi.org/10.1007/978-3-540-88190-2_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88189-6
Online ISBN: 978-3-540-88190-2
eBook Packages: Computer Science, Computer Science (R0)
