Hierarchical Classification of Documents with Error Control

Cheng, Chun-hung; Tang, Jian; Wai-chee Fu, Ada; King, Irwin

doi:10.1007/3-540-45357-1_46

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2035)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Classification is a function that matches a new object with one of the predefined classes. Document classification is characterized by the large number of attributes involved in the objects (documents). The traditional method of building a single classifier to do all the classification work would incur a high overhead. Hierarchical classification is a more efficient method: instead of a single classifier, we use a set of classifiers distributed over a class taxonomy, one for each internal node. However, once a misclassification occurs at a high-level class, it may result in a class that is far from the correct one. An existing approach to coping with this problem requires terms also to be arranged hierarchically. In this paper, instead of overhauling the classifier itself, we propose mechanisms to detect misclassification and take appropriate actions. We then discuss an alternative that masks the misclassification based on a well-known software fault tolerance technique. Our experiments show that our algorithms represent a good trade-off between speed and accuracy in most applications.
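
The top-down scheme described in the abstract (one classifier per internal node of the class taxonomy) can be illustrated with a minimal sketch. This is not the paper's algorithm: the taxonomy, the keyword sets, and the functions `classify_at` and `classify` are all hypothetical, and simple keyword overlap stands in for whatever trained classifier would sit at each node. The sketch only shows how a document is routed level by level, and why an error near the root cannot be undone further down.

```python
from collections import Counter

# Hypothetical taxonomy: each internal node maps to its children.
TAXONOMY = {
    "root": ["science", "sports"],
    "science": ["physics", "biology"],
    "sports": ["tennis", "football"],
}

# Hypothetical keyword sets standing in for trained per-node classifiers.
KEYWORDS = {
    "science": {"energy", "cell", "experiment", "quantum", "gene"},
    "sports": {"match", "score", "player", "court", "goal"},
    "physics": {"energy", "quantum"},
    "biology": {"cell", "gene"},
    "tennis": {"court", "serve"},
    "football": {"goal", "pitch"},
}


def classify_at(node, words):
    """Local classifier for one internal node: pick the child whose
    keyword set overlaps the document most (ties go to the first child)."""
    counts = Counter(words)
    return max(TAXONOMY[node], key=lambda c: sum(counts[w] for w in KEYWORDS[c]))


def classify(text):
    """Route a document from the root to a leaf, one local decision per level."""
    words = text.lower().split()
    node, path = "root", []
    while node in TAXONOMY:  # stop once a leaf class is reached
        node = classify_at(node, words)
        path.append(node)
    return path


print(classify("quantum energy experiment"))
print(classify("quantum experiment score match player"))
```

In this toy setup the first document is routed to `science` and then `physics`. The second contains physics vocabulary but more sports vocabulary, so the root sends it to `sports`; the lower-level classifier can then only choose among sports classes, and the document ends in a leaf far from `physics`. This is the failure mode that the detection and masking mechanisms mentioned in the abstract are meant to address.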

Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
Chun-hung Cheng, Ada Wai-chee Fu & Irwin King
Department of Computer Science, Memorial University of Newfoundland, St. John’s, NF A1B 3X5, Canada
Jian Tang

Editor information

Editors and Affiliations

Dept. of Computer Science and Information Systems, The University of Hong Kong, Pokfulam, Hong Kong China
David Cheung
CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra, ACT 2601, Australia
Graham J. Williams
Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave., Kowloon, Hong Kong China
Qing Li

About this paper

Cite this paper

Cheng, Ch., Tang, J., Wai-chee Fu, A., King, I. (2001). Hierarchical Classification of Documents with Error Control. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_46

DOI: https://doi.org/10.1007/3-540-45357-1_46
Published: 11 April 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41910-5
Online ISBN: 978-3-540-45357-4
eBook Packages: Springer Book Archive
