Abstract
Given two point sets W, B ⊆ ℝ^n, it is an interesting and well-studied problem to design a linear decision tree that classifies them (that is, no leaf region contains points from both W and B) and is as simple as possible, either in the total number of nodes or in depth. We show that, unless ZPP = NP, the depth of such a classifier cannot be approximated within a factor smaller than 6/5, and the total number of nodes cannot be approximated within a factor smaller than n^{1/5}. Our proof relies on a simple connection between this problem and graph coloring, and uses recent nonapproximability results for graph coloring. We also study the problem of designing a classifier with a single inequality that involves as few variables as possible, and point out certain aspects of the difficulty of this problem.
Research partially supported by the National Science Foundation.
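To make the object of study concrete, here is a minimal sketch of a linear decision tree: each internal node tests a linear inequality a·x ≥ t, and a leaf is pure when only points from one of the two sets reach it. The names (`Node`, `classify`) and the example sets are illustrative assumptions, not taken from the paper; the sets chosen are not linearly separable, so a single inequality cannot classify them, but a depth-2 tree can.

```python
# Sketch of a linear decision tree classifying two point sets W and B.
# An internal node tests dot(a, x) >= t; a leaf carries a class label.
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    a: Optional[Point] = None       # normal vector of the splitting hyperplane
    t: float = 0.0                  # threshold: the test is dot(a, x) >= t
    left: Optional["Node"] = None   # subtree for points failing the test
    right: Optional["Node"] = None  # subtree for points passing the test
    label: Optional[str] = None     # 'W' or 'B' at a leaf

def dot(a: Point, x: Point) -> float:
    return sum(ai * xi for ai, xi in zip(a, x))

def classify(node: Node, x: Point) -> str:
    # Walk from the root, branching on each linear test, until a leaf.
    while node.label is None:
        node = node.right if dot(node.a, x) >= node.t else node.left
    return node.label

# W = {(0,0),(2,2)} and B = {(0,2),(2,0)} (an XOR-like configuration):
# no single hyperplane separates them, but two suffice in a tree.
tree = Node(a=(1.0, -1.0), t=1.0,            # test: x - y >= 1
            right=Node(label="B"),           # (2,0) lands here
            left=Node(a=(-1.0, 1.0), t=1.0,  # test: y - x >= 1
                      right=Node(label="B"),    # (0,2) lands here
                      left=Node(label="W")))    # (0,0) and (2,2) land here

W = [(0.0, 0.0), (2.0, 2.0)]
B = [(0.0, 2.0), (2.0, 0.0)]
assert all(classify(tree, p) == "W" for p in W)
assert all(classify(tree, p) == "B" for p in B)
```

The tree above has 3 internal nodes and depth 2; the paper's hardness results concern minimizing exactly these two measures over all correct trees.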
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grigni, M., Mirelli, V., Papadimitriou, C.H. (1996). On the difficulty of designing good classifiers. In: Cai, JY., Wong, C.K. (eds) Computing and Combinatorics. COCOON 1996. Lecture Notes in Computer Science, vol 1090. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61332-3_161
DOI: https://doi.org/10.1007/3-540-61332-3_161
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61332-9
Online ISBN: 978-3-540-68461-9
eBook Packages: Springer Book Archive