Induction as Pre-processing

Wu, Xindong

doi:10.1007/3-540-48912-6_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1574)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

In most data mining applications where induction is used as the primary tool for knowledge extraction, it is difficult to identify precisely a complete set of relevant attributes. The real-world database from which knowledge is to be extracted usually contains a combination of relevant, noisy and irrelevant attributes. Pre-processing the database to select relevant attributes therefore becomes a very important task in knowledge discovery and data mining. This paper starts with two existing induction systems, C4.5 and HCV, and uses one of them to select relevant attributes for the other. Experimental results on 12 standard data sets show that using HCV induction for C4.5 attribute selection is generally useful.
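The abstract does not spell out how one induction system selects attributes for the other. A minimal sketch of one plausible reading follows, assuming that the attributes appearing in the first system's induced rules are kept and all others are dropped before the second system is trained. The rule format, the function names `attributes_used` and `project`, and the toy records are all hypothetical illustrations, not the paper's implementation or the actual output format of HCV or C4.5.

```python
# Hypothetical sketch: a rule is (conditions, class_label), where conditions
# is a list of (attribute, value) pairs. In the paper's setting such rules
# would come from an induction system such as HCV; here they are hand-written.

def attributes_used(rules):
    """Return the set of attribute names that appear in any rule condition."""
    used = set()
    for conditions, _label in rules:
        for attribute, _value in conditions:
            used.add(attribute)
    return used


def project(records, keep, class_key="class"):
    """Keep only the selected attributes (and the class label) in each record."""
    return [
        {k: v for k, v in record.items() if k in keep or k == class_key}
        for record in records
    ]


# Stand-in rules from the first (selecting) learner.
rules = [
    ([("outlook", "sunny"), ("humidity", "high")], "no"),
    ([("outlook", "overcast")], "yes"),
]

# Toy data set containing an irrelevant attribute ("id").
records = [
    {"outlook": "sunny", "humidity": "high", "id": 1, "class": "no"},
    {"outlook": "overcast", "humidity": "normal", "id": 2, "class": "yes"},
]

selected = attributes_used(rules)
reduced = project(records, selected)  # input for the second learner
```

Under this assumption, `reduced` is the pre-processed data set that would be handed to the second induction system; the `id` attribute never appears in a rule, so it is dropped.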

References

Ali, K.M. & Pazzani, M.J., Reducing the small disjuncts problem by learning probabilistic concept descriptions, Computational Learning Theory and Natural Learning Systems, T. Petsche et al. (Eds.), Vol. 3, 1992.
Clark, P.E. & Boswell, R., Rule induction with CN2: Some recent improvements, Proceedings of the Fifth European Working Session on Learning, Porto, Portugal: Springer-Verlag, 1991, 151–163.
Dougherty et al., Supervised and Unsupervised Discretization of Continuous Features, Proceedings of the 12th International Conference on Machine Learning, 194–202.
Gams, M., Drobnic, M. & Petkovsek, M., Learning from examples — a uniform view, Int. J. Man-Machine Studies, 34 (1991): 49–68.
Hong, J., AE1: An extension matrix approximate method for the general covering problem, International Journal of Computer and Information Sciences, 14 (1985), 6: 421–437.
Mahlen, P., Dealing with Continuous Attribute Domains in Inductive Learning, Masters Thesis, Dept. of Numerical Analysis and Computer Science, Royal Instit. of Technology, Stockholm, Sweden, 1995.
Michalski, R.S., Mozetic, I., Hong, J. & Lavrac, N., The multi-purpose incremental learning system AQ15 and its testing application to three medical domains, Proceedings of the Fifth National Conference on Artificial Intelligence, 1986, 1041–1045.
Murphy, P.M. & Aha, D.W., UCI Repository of Machine Learning Databases, Machine-Readable Data Repository, Irvine, CA, University of California, Department of Information and Computer Science, 1995.
Pagallo, G. & Haussler, D., Boolean feature discovery in empirical learning, Machine Learning, 5 (1990): 71–99.
Quinlan, J.R., Induction of decision trees, Machine Learning, 1 (1986).
Quinlan, J.R., C4.5: Programs for Machine Learning, CA: Morgan Kaufmann, 1993.
Shannon, C.E. & Weaver, W., The Mathematical Theory of Communications, The University of Illinois Press, Urbana, IL, 1949.
Utgoff, P.E., Incremental Induction of Decision Trees, Machine Learning, 4 (1989): 161–186.
Utgoff, P.E., Shift of Bias for Inductive Concept Learning, Machine Learning: An AI Approach, Volume 2, Chapter 5, Morgan Kaufmann Pub., 1986, 107–148.
Wu, X., The HCV induction algorithm, Proceedings of the 21st ACM Computer Science Conference, S.C. Kwasny and J. Fuch (Eds.), ACM Press, USA, 1993, 168–175.
Wu, X., Knowledge Acquisition from Data Bases, Ablex Publishing Corp., U.S.A., 1995.
Wu, X., Krisar, J. & Mahlen, P., Noise Handling with Extension Matrices, International Journal on Artificial Intelligence Tools, 5 (1996), 1: 81–97.

Author information

Authors and Affiliations

Department of Mathematical and Computer Sciences, Colorado School of Mines, 1500 Illinois Street, Golden, Colorado, 80401, USA
Xindong Wu

Editor information

Editors and Affiliations

Department of Computer Science and Systems Engineering, Yamaguchi University, Tokiwa-Dai, 2557, Ube, 755, Japan
Ning Zhong
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Lizhu Zhou

About this paper

Cite this paper

Wu, X. (1999). Induction as Pre-processing. In: Zhong, N., Zhou, L. (eds) Methodologies for Knowledge Discovery and Data Mining. PAKDD 1999. Lecture Notes in Computer Science, vol 1574. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48912-6_16

DOI: https://doi.org/10.1007/3-540-48912-6_16
Published: 24 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65866-5
Online ISBN: 978-3-540-48912-2
eBook Packages: Springer Book Archive
