Abstract
Dynamic programming has been studied extensively, e.g., in computational geometry and string matching. It has recently found a new application in the optimal multisplitting of numerical attribute value domains. We relate results obtained earlier to this problem and study whether they help to shed new light on the inherent complexity of this time-critical subtask of machine learning and data mining programs. The concept of monotonicity has come up in earlier research. It helps to explain the different asymptotic time requirements of optimal multisplitting with respect to different attribute evaluation functions. As case studies we examine the Training Set Error and Average Class Entropy functions. The former has a linear-time optimization algorithm, while the latter—like most well-known attribute evaluation functions—takes quadratic time to optimize. It is shown that neither of them fulfills the strict monotonicity condition, but computing optimal Training Set Error values can be decomposed into monotone subproblems.
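To make the setting concrete, the following is a minimal, purely illustrative sketch of the textbook quadratic-time dynamic program for optimal multisplitting under Training Set Error (the error of an interval being the number of examples outside its majority class). The function names and structure are our own, not taken from the paper; the paper's contribution is precisely that for Training Set Error this computation can be decomposed into monotone subproblems and solved faster than this generic DP.

```python
from collections import Counter

def interval_error(labels, i, j):
    """Training Set Error of interval labels[i:j]:
    the number of examples not belonging to the interval's majority class."""
    counts = Counter(labels[i:j])
    return (j - i) - max(counts.values())

def optimal_multisplit_error(labels, k):
    """Minimum Training Set Error over all partitions of the label sequence
    (examples pre-sorted by attribute value) into at most k contiguous
    intervals.  Generic O(k * n^2)-interval DP; illustrative only."""
    n = len(labels)
    INF = float("inf")
    # best[j][i] = min error when the first i examples form exactly j intervals
    best = [[INF] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for j in range(1, k + 1):
        for i in range(1, n + 1):
            for m in range(j - 1, i):  # last interval is labels[m:i]
                cand = best[j - 1][m] + interval_error(labels, m, i)
                if cand < best[j][i]:
                    best[j][i] = cand
    # allow any number of intervals up to k
    return min(best[j][n] for j in range(1, k + 1))
```

For example, splitting the class sequence `['a', 'a', 'b', 'b', 'a']` into at most two intervals yields error 1 (intervals `aa` and `bba`), whereas a single interval gives error 2.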
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Elomaa, T., Rousu, J. (2000). On the Complexity of Optimal Multisplitting. In: Raś, Z.W., Ohsuga, S. (eds) Foundations of Intelligent Systems. ISMIS 2000. Lecture Notes in Computer Science, vol 1932. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-39963-1_58
Print ISBN: 978-3-540-41094-2
Online ISBN: 978-3-540-39963-6