Classifying Unseen Cases with Many Missing Values

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1574)

Abstract

Handling missing attribute values is an important issue for classifier learning, since missing attribute values in either training data or test (unseen) data affect the prediction accuracy of learned classifiers. In many real KDD applications, attributes with missing values are very common. This paper studies the robustness of four recently developed committee learning techniques (Boosting, Bagging, SASC and SASCMB) relative to C4.5 for tolerating missing values in test data. In terms of average error over a representative collection of natural domains, Boosting is found to have a similar level of robustness to C4.5 for tolerating missing values in test data. Bagging performs slightly better than Boosting, while SASC and SASCMB perform better than both in this regard, with SASCMB performing best.
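
As a rough illustration of what tolerating missing values at classification time involves, the Python sketch below follows the well-known C4.5 strategy of sending an unseen case down every branch of a test whose attribute value is unknown, weighting each branch by the fraction of training cases that followed it. The committee step simply averages the member trees' class distributions; that combination rule, the toy trees, and all names (`classify`, `committee_classify`, `tree_a`, `tree_b`) are assumptions made for illustration and are not taken from the paper, whose committee methods each use their own voting schemes.

```python
from collections import defaultdict

# Hypothetical tree encoding (an assumption for this sketch):
#   leaf:     {"dist": {class_label: probability}}
#   internal: {"attr": name, "branches": {value: (training_fraction, subtree)}}


def classify(tree, case):
    """Return a class distribution for `case` (dict: attribute -> value or None)."""
    if "dist" in tree:
        return dict(tree["dist"])
    value = case.get(tree["attr"])
    if value is not None and value in tree["branches"]:
        # Known value: follow the single matching branch.
        return classify(tree["branches"][value][1], case)
    # Missing (or unrecognised) value: follow every branch and weight each
    # result by the fraction of training cases that took that branch.
    combined = defaultdict(float)
    for fraction, subtree in tree["branches"].values():
        for label, prob in classify(subtree, case).items():
            combined[label] += fraction * prob
    return dict(combined)


def committee_classify(trees, case):
    """Average the class distributions of all member trees (assumed voting rule)."""
    total = defaultdict(float)
    for tree in trees:
        for label, prob in classify(tree, case).items():
            total[label] += prob / len(trees)
    return dict(total)


def predict(dist):
    """Pick the class with the highest combined probability."""
    return max(dist, key=dist.get)


# Two toy trees that test different attributes.
tree_a = {"attr": "outlook", "branches": {
    "sunny": (0.4, {"dist": {"no": 1.0}}),
    "rainy": (0.6, {"dist": {"yes": 1.0}}),
}}
tree_b = {"attr": "windy", "branches": {
    "true": (0.5, {"dist": {"no": 0.8, "yes": 0.2}}),
    "false": (0.5, {"dist": {"yes": 0.9, "no": 0.1}}),
}}

# An unseen case whose `outlook` value is missing.
case = {"outlook": None, "windy": "true"}

single_dist = classify(tree_a, case)
committee_dist = committee_classify([tree_a, tree_b], case)
print(predict(single_dist), single_dist)
print(predict(committee_dist), committee_dist)
```

Here the single tree `tree_a` must fall back on branch weights alone because `outlook` is missing, whereas the committee can still draw on `tree_b`, which tests an attribute that is present in the case.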

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zheng, Z., Low, B.T. (1999). Classifying Unseen Cases with Many Missing Values. In: Zhong, N., Zhou, L. (eds) Methodologies for Knowledge Discovery and Data Mining. PAKDD 1999. Lecture Notes in Computer Science, vol 1574. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48912-6_50

  • DOI: https://doi.org/10.1007/3-540-48912-6_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65866-5

  • Online ISBN: 978-3-540-48912-2

  • eBook Packages: Springer Book Archive
