Abstract
Because data in many real data mining environments is collected from disparate sources, data values often appear at different abstraction levels. This paper shows that such multiple abstraction levels can cause undesirable effects in decision tree classification. After explaining that forcibly equalizing abstraction levels does not provide a satisfactory solution to this problem, it presents a method that uses the data as it is. The proposed method accommodates the generalization/specialization relationship between data values in both the construction and the class assignment phases of decision tree classification. Experimental results show that the proposed method significantly reduces classification error rates when multiple abstraction levels of data are involved.
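To make the class assignment problem concrete, the following minimal Python sketch contrasts an exact-match branch lookup with one that follows a generalization hierarchy. It is an illustration only and is not the method proposed in the paper: the taxonomy `PARENT`, the branch table `BRANCHES`, and the functions `assign_flat` and `assign_with_taxonomy` are all hypothetical names introduced here, and the sketch covers only the case where a record's value is more specific than the value tested at a tree node.

```python
# Hypothetical taxonomy: each value maps to its more general parent value.
PARENT = {
    "Seoul": "Korea",
    "Busan": "Korea",
    "Tokyo": "Japan",
    "Osaka": "Japan",
}

# Hypothetical branches of one decision tree node that was built from
# country-level values: attribute value -> predicted class label.
BRANCHES = {"Korea": "yes", "Japan": "no"}


def assign_flat(branches, value):
    """Exact-match lookup: values at other abstraction levels find no branch."""
    return branches.get(value)


def assign_with_taxonomy(branches, value):
    """Walk up the taxonomy from `value` until some branch matches.

    Returns None when neither the value nor any of its ancestors
    labels a branch.
    """
    while value is not None:
        if value in branches:
            return branches[value]
        value = PARENT.get(value)  # None once the top of the hierarchy is passed
    return None


if __name__ == "__main__":
    # A city-level record reaches a node whose branches are country-level.
    print(assign_flat(BRANCHES, "Seoul"))
    print(assign_with_taxonomy(BRANCHES, "Seoul"))
```

The exact-match lookup treats "Seoul" and "Korea" as unrelated values, so the city-level record falls through the node without a class. The opposite case, where a record holds "Korea" and the branches are labelled with cities, is ambiguous and is deliberately left out of this sketch.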
This work was supported by the Korea Science and Engineering Foundation (KOSEF) through the Advanced Information Technology Research Center (AITrc).
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, D., Jeong, M., Won, YK. (2001). Decision Trees for Multiple Abstraction Levels of Data. In: Klusch, M., Zambonelli, F. (eds) Cooperative Information Agents V. CIA 2001. Lecture Notes in Computer Science, vol 2182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44799-7_9
DOI: https://doi.org/10.1007/3-540-44799-7_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42545-8
Online ISBN: 978-3-540-44799-3
eBook Packages: Springer Book Archive