Classifying Unseen Cases with Many Missing Values

Zheng, Zijian; Low, Boon Toh

doi:10.1007/3-540-48912-6_50

Conference paper
First Online: 01 January 2002

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1574)

Abstract

Handling missing attribute values is an important issue for classifier learning, since missing attribute values in either training data or test (unseen) data affect the prediction accuracy of learned classifiers. In many real KDD applications, attributes with missing values are very common. This paper studies the robustness of four recently developed committee learning techniques, Boosting, Bagging, SASC, and SASCMB, relative to C4.5 for tolerating missing values in test data. In terms of average error over a representative collection of natural domains, Boosting is found to have a level of robustness similar to that of C4.5. Bagging performs slightly better than Boosting, while SASC and SASCMB perform better than both in this regard, with SASCMB performing best.
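
The kind of experiment the abstract describes, removing attribute values from test cases and comparing a single classifier with a committee, can be illustrated with a small toy sketch. Nothing below is taken from the paper: the one-attribute "stump" members, the unweighted majority vote, the `MISSING` marker, the default-class fallback, and the synthetic data are all assumptions made for illustration, and they stand in for C4.5 and the committee methods only in the loosest sense.

```python
import random
from collections import Counter

MISSING = None  # hypothetical marker for an unknown attribute value


def inject_missing(cases, rate, rng):
    """Copy `cases`, blanking each attribute value with probability `rate`."""
    return [
        [MISSING if rng.random() < rate else value for value in case]
        for case in cases
    ]


def make_stump(attribute, default):
    """Toy member: predicts the value of one attribute, or `default` if it is missing."""
    def predict(case):
        value = case[attribute]
        return default if value is MISSING else value
    return predict


def committee_predict(members, case):
    """Combine member predictions by unweighted majority vote."""
    votes = Counter(member(case) for member in members)
    return votes.most_common(1)[0][0]


def error_rate(predict, cases, labels):
    """Fraction of cases whose predicted class differs from the true label."""
    wrong = sum(1 for case, label in zip(cases, labels) if predict(case) != label)
    return wrong / len(cases)


# Toy data: three redundant binary attributes, each equal to the class label.
rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(1000)]
cases = [[label, label, label] for label in labels]

single = make_stump(0, default=0)
members = [make_stump(i, default=0) for i in range(3)]

# Only the test cases are damaged; the "learned" predictors stay fixed.
for rate in (0.0, 0.2, 0.4):
    damaged = inject_missing(cases, rate, rng)
    single_error = error_rate(single, damaged, labels)
    committee_error = error_rate(
        lambda case: committee_predict(members, case), damaged, labels
    )
    print(f"missing rate {rate:.1f}: single {single_error:.3f}, committee {committee_error:.3f}")
```

In this toy setting the single predictor depends on one attribute, while the committee can still vote correctly when one of its three attributes is missing. The numbers it prints say nothing about the paper's results, which concern real committees of decision trees on natural domains.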

Author information

Authors and Affiliations

School of Computing and Mathematics, Deakin University, Geelong, Victoria, 3217, Australia
Zijian Zheng
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Boon Toh Low

Editor information

Editors and Affiliations

Department of Computer Science and Systems Engineering, Yamaguchi University, Tokiwa-Dai, 2557, Ube, 755, Japan
Ning Zhong
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Lizhu Zhou

About this paper

Cite this paper

Zheng, Z., Low, B.T. (1999). Classifying Unseen Cases with Many Missing Values. In: Zhong, N., Zhou, L. (eds) Methodologies for Knowledge Discovery and Data Mining. PAKDD 1999. Lecture Notes in Computer Science, vol 1574. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48912-6_50

DOI: https://doi.org/10.1007/3-540-48912-6_50
Published: 24 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65866-5
Online ISBN: 978-3-540-48912-2
eBook Packages: Springer Book Archive
