Abstract
Dynamic programming has been studied extensively, e.g., in computational geometry and string matching. It has recently found a new application in the optimal multisplitting of numerical attribute value domains. We relate results obtained earlier to this problem and study whether they help to shed new light on the inherent complexity of this time-critical subtask of machine learning and data mining programs. The concept of monotonicity has come up in earlier research. It helps to explain the different asymptotic time requirements of optimal multisplitting with respect to different attribute evaluation functions. As case studies we examine the Training Set Error and Average Class Entropy functions. The former has a linear-time optimization algorithm, while the latter—like most well-known attribute evaluation functions—takes quadratic time to optimize. It is shown that neither of them fulfills the strict monotonicity condition, but computing optimal Training Set Error values can be decomposed into monotone subproblems.
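To make the setting concrete, the following is a minimal, purely illustrative sketch of the textbook quadratic-time dynamic program for optimal multisplitting under Training Set Error (the error of an interval being the number of examples outside its majority class). The function names and structure are our own, not taken from the paper; the paper's contribution is precisely that for Training Set Error this computation can be decomposed into monotone subproblems and solved faster than this generic DP.

```python
from collections import Counter

def interval_error(labels, i, j):
    """Training Set Error of interval labels[i:j]:
    the number of examples not belonging to the interval's majority class."""
    counts = Counter(labels[i:j])
    return (j - i) - max(counts.values())

def optimal_multisplit_error(labels, k):
    """Minimum Training Set Error over all partitions of the label sequence
    (examples pre-sorted by attribute value) into at most k contiguous
    intervals.  Generic O(k * n^2)-interval DP; illustrative only."""
    n = len(labels)
    INF = float("inf")
    # best[j][i] = min error when the first i examples form exactly j intervals
    best = [[INF] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for j in range(1, k + 1):
        for i in range(1, n + 1):
            for m in range(j - 1, i):  # last interval is labels[m:i]
                cand = best[j - 1][m] + interval_error(labels, m, i)
                if cand < best[j][i]:
                    best[j][i] = cand
    # allow any number of intervals up to k
    return min(best[j][n] for j in range(1, k + 1))
```

For example, splitting the class sequence `['a', 'a', 'b', 'b', 'a']` into at most two intervals yields error 1 (intervals `aa` and `bba`), whereas a single interval gives error 2.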
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Elomaa, T., Rousu, J. (2000). On the Complexity of Optimal Multisplitting. In: Raś, Z.W., Ohsuga, S. (eds) Foundations of Intelligent Systems. ISMIS 2000. Lecture Notes in Computer Science, vol 1932. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-39963-1_58
Print ISBN: 978-3-540-41094-2
Online ISBN: 978-3-540-39963-6