Abstract
Handling missing attribute values is an important issue for classifier learning, since missing attribute values in either training data or test (unseen) data affect the prediction accuracy of learned classifiers. In many real KDD applications, attributes with missing values are very common. This paper studies the robustness of four recently developed committee learning techniques, namely Boosting, Bagging, SASC and SASCMB, relative to C4.5 for tolerating missing values in test data. Boosting is found to have a similar level of robustness to C4.5 for tolerating missing values in test data, in terms of average error over a representative collection of natural domains under investigation. Bagging performs slightly better than Boosting, while SASC and SASCMB perform better than both in this regard, with SASCMB performing best.
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zheng, Z., Low, B.T. (1999). Classifying Unseen Cases with Many Missing Values. In: Zhong, N., Zhou, L. (eds) Methodologies for Knowledge Discovery and Data Mining. PAKDD 1999. Lecture Notes in Computer Science, vol 1574. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48912-6_50
DOI: https://doi.org/10.1007/3-540-48912-6_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65866-5
Online ISBN: 978-3-540-48912-2
eBook Packages: Springer Book Archive