Abstract
The main task in decision tree construction algorithms is to find the “best partition” of the set of objects. In this paper, we investigate the problem of finding an optimal binary partition of a continuous attribute for large data sets stored in relational databases. The critical factor for the time complexity of algorithms solving this problem is the number of simple SQL queries needed to construct such partitions. The straightforward approach to optimal partition selection needs at least O(N) queries, where N is the number of pre-assumed partitions of the search space. We show some properties of optimization measures related to discernibility between objects that allow us to construct a partition very close to optimal using only O(log N) simple queries.
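The cost model above counts queries rather than CPU time. The following is a minimal sketch that makes it concrete under stated assumptions. A hypothetical `CountOracle` simulates one "simple SQL query" per candidate cut, returning per-class counts of objects whose attribute value lies below the cut (in a database this would be something like `SELECT class, COUNT(*) ... WHERE a < c GROUP BY class`). The discernibility measure used here counts pairs of objects from different decision classes that the cut separates. The exhaustive search issues one query per candidate cut, i.e. O(N). The second routine is not the paper's method. It is a plain binary search that assumes the measure is unimodal over the sorted cuts, included only to illustrate how O(log N) queries can suffice when such a structural property holds.

```python
from collections import Counter
from typing import Sequence


class CountOracle:
    """Hypothetical stand-in for a table with one continuous attribute and a class label.

    Each call to left_counts models one simple SQL query such as
        SELECT class, COUNT(*) FROM T WHERE a < :cut GROUP BY class
    and is counted, so search strategies can be compared by query count.
    """

    def __init__(self, values: Sequence[float], labels: Sequence[str]):
        self.rows = list(zip(values, labels))
        self.totals = Counter(labels)  # class distribution of the whole table
        self.queries = 0

    def left_counts(self, cut: float) -> Counter:
        self.queries += 1
        return Counter(lab for v, lab in self.rows if v < cut)


def discernibility(left: Counter, totals: Counter) -> int:
    """Pairs of objects from different classes separated by the cut.

    All separated pairs = L * R; subtract the separated pairs of the same class.
    """
    right = {k: totals[k] - left.get(k, 0) for k in totals}
    n_left = sum(left.values())
    n_right = sum(right.values())
    same_class_pairs = sum(left.get(k, 0) * right[k] for k in totals)
    return n_left * n_right - same_class_pairs


def exhaustive_best(oracle: CountOracle, cuts: Sequence[float]) -> float:
    # One query per candidate cut: O(N) queries.
    return max(cuts, key=lambda c: discernibility(oracle.left_counts(c), oracle.totals))


def unimodal_search(oracle: CountOracle, cuts: Sequence[float]) -> float:
    # ASSUMPTION: the measure rises then falls over the sorted cuts.
    # Two queries per halving step: O(log N) queries. Without unimodality
    # this can stop at a local maximum.
    lo, hi = 0, len(cuts) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        w_mid = discernibility(oracle.left_counts(cuts[mid]), oracle.totals)
        w_next = discernibility(oracle.left_counts(cuts[mid + 1]), oracle.totals)
        if w_mid < w_next:
            lo = mid + 1
        else:
            hi = mid
    return cuts[lo]


if __name__ == "__main__":
    values = list(range(100))
    labels = ["A" if v < 50 else "B" for v in values]
    cuts = [v + 0.5 for v in range(99)]  # candidate cuts between adjacent values

    full = CountOracle(values, labels)
    print(exhaustive_best(full, cuts), full.queries)  # 49.5 99

    fast = CountOracle(values, labels)
    print(unimodal_search(fast, cuts), fast.queries)  # 49.5 12
```

On this toy data the measure rises and then falls, so both searches return the cut 49.5, but the exhaustive search uses 99 queries and the binary search 12. The discernibility measure is not unimodal in general; the paper relies on its own properties of the measure to stay close to optimal, which this sketch does not reproduce.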
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Son, N.H. (2001). On Efficient Construction of Decision Trees From Large Databases. In: Ziarko, W., Yao, Y. (eds) Rough Sets and Current Trends in Computing. RSCTC 2000. Lecture Notes in Computer Science, vol 2005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45554-X_43
DOI: https://doi.org/10.1007/3-540-45554-X_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43074-2
Online ISBN: 978-3-540-45554-7
eBook Packages: Springer Book Archive