
Missing Value Imputation Using a Semi-supervised Rank Aggregation Approach

  • Conference paper
Advances in Artificial Intelligence - SBIA 2008 (SBIA 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5249)


Abstract

An important problem in data quality is the presence of missing data. When missing data are abundant, effective ways of handling them could improve the performance of machine learning algorithms. Missing data can be treated using imputation: imputation methods replace missing values with values estimated from the available data. This paper presents Corai, an imputation algorithm that adapts Co-training, a multi-view semi-supervised learning algorithm. Corai was compared with other imputation methods from the literature on three UCI data sets, with different levels of missingness inserted into up to three attributes. The comparison shows that Corai tends to perform well on data sets with higher percentages of missing values and with more attributes containing missing values.
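The abstract describes Corai only at a high level, so the Python sketch below is a rough illustration of the general idea of adapting Co-training to imputation. It is not the authors' Corai algorithm, whose details are in the full paper. Rows where a categorical attribute is observed are treated as labeled data and rows where it is missing as unlabeled data. Two classifiers, each trained on a different attribute subset ("view"), rank the incomplete rows by confidence. The ranks are summed, a simple Borda-style aggregation assumed here only because the title mentions rank aggregation, and the best-ranked rows receive the more confident prediction and join the labeled pool. The function names (`train_nb`, `predict_nb`, `cotrain_impute`), the categorical Naive Bayes base learner and the parameters `per_round` and `max_rounds` are all hypothetical choices made for this sketch.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels, view):
    """Fit a categorical Naive Bayes using only the attributes in `view`."""
    class_counts = Counter(labels)
    cond = defaultdict(Counter)  # (attribute, class) -> counts of attribute values
    values = defaultdict(set)    # attribute -> set of observed values
    for row, y in zip(rows, labels):
        for a in view:
            cond[(a, y)][row[a]] += 1
            values[a].add(row[a])
    return class_counts, cond, values, view


def predict_nb(model, row):
    """Return (most probable class, its posterior probability) for one row."""
    class_counts, cond, values, view = model
    total = sum(class_counts.values())
    log_scores = {}
    for c, n in class_counts.items():
        s = math.log(n / total)
        for a in view:
            # Add-one smoothing; the extra +1 reserves mass for unseen values.
            s += math.log((cond[(a, c)][row[a]] + 1) / (n + len(values[a]) + 1))
        log_scores[c] = s
    best = max(log_scores, key=log_scores.get)
    z = sum(math.exp(v - log_scores[best]) for v in log_scores.values())
    return best, 1.0 / z  # normalised posterior of `best`


def cotrain_impute(rows, target, view_a, view_b, per_round=1, max_rounds=10):
    """Impute missing (None) values of a categorical `target` attribute.

    Hypothetical sketch: assumes every other attribute is observed and the
    target is observed in at least one row. Rows still missing after
    `max_rounds` rounds are left as None.
    """
    rows = [dict(r) for r in rows]  # work on copies; the input is not mutated
    labeled = [i for i, r in enumerate(rows) if r[target] is not None]
    missing = [i for i, r in enumerate(rows) if r[target] is None]
    for _ in range(max_rounds):
        if not missing:
            break
        ys = [rows[i][target] for i in labeled]
        models = [train_nb([rows[i] for i in labeled], ys, v) for v in (view_a, view_b)]
        preds = {i: [predict_nb(m, rows[i]) for m in models] for i in missing}
        # Each view ranks the incomplete rows by confidence (rank 0 = most
        # confident); summing the ranks is a simple Borda-style aggregation.
        rank_sum = Counter()
        for k in range(len(models)):
            order = sorted(missing, key=lambda i, k=k: -preds[i][k][1])
            for rank, i in enumerate(order):
                rank_sum[i] += rank
        chosen = sorted(missing, key=lambda i: rank_sum[i])[:per_round]
        for i in chosen:
            # Impute with the prediction of whichever view is more confident.
            label, _ = max(preds[i], key=lambda p: p[1])
            rows[i][target] = label
            labeled.append(i)
            missing.remove(i)
    return rows


# Toy data: attribute "a" forms one view, attribute "b" the other.
data = (
    [{"a": "x", "b": "p", "y": 1} for _ in range(3)]
    + [{"a": "z", "b": "q", "y": 0} for _ in range(3)]
    + [{"a": "x", "b": "p", "y": None}, {"a": "z", "b": "q", "y": None}]
)
imputed = cotrain_impute(data, "y", view_a=["a"], view_b=["b"])
print([r["y"] for r in imputed])  # [1, 1, 1, 0, 0, 0, 1, 0]
```

Summing ranks instead of raw probabilities keeps the two views comparable even when their confidence estimates are scaled differently. The sketch does nothing to stop an early wrong imputation from being reused as training data in later rounds, which a practical method would need to address.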




Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Matsubara, E.T., Prati, R.C., Batista, G.E.A.P.A., Monard, M.C. (2008). Missing Value Imputation Using a Semi-supervised Rank Aggregation Approach. In: Zaverucha, G., da Costa, A.L. (eds) Advances in Artificial Intelligence - SBIA 2008. SBIA 2008. Lecture Notes in Computer Science, vol 5249. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88190-2_27


  • DOI: https://doi.org/10.1007/978-3-540-88190-2_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88189-6

  • Online ISBN: 978-3-540-88190-2

  • eBook Packages: Computer Science, Computer Science (R0)
