Hierarchical Classification of Documents with Error Control

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2035)

Abstract

Classification is a function that matches a new object with one of the predefined classes. Document classification is characterized by the large number of attributes involved in the objects (documents). The traditional method of building a single classifier to do all the classification work would incur a high overhead. Hierarchical classification is a more efficient method: instead of a single classifier, we use a set of classifiers distributed over a class taxonomy, one for each internal node. However, once a misclassification occurs at a high-level class, it may result in a class that is far from the correct one. An existing approach to coping with this problem requires terms also to be arranged hierarchically. In this paper, instead of overhauling the classifier itself, we propose mechanisms to detect misclassification and take appropriate actions. We then discuss an alternative that masks the misclassification based on a well-known software fault tolerance technique. Our experiments show that our algorithms represent a good trade-off between speed and accuracy in most applications.
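
To make the top-down scheme concrete, the following is a minimal sketch of routing a document through a class taxonomy with one classifier per internal node. It is not the paper's algorithm: the abstract does not say how misclassification is detected, so the confidence threshold used here is an assumption, and all names (`TaxonomyNode`, `keyword_classifier`, `classify_top_down`, `build_demo_taxonomy`, `min_confidence`) and the toy keyword lists are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A node classifier maps a document (list of words) to (child label, confidence).
NodeClassifier = Callable[[List[str]], Tuple[str, float]]


@dataclass
class TaxonomyNode:
    label: str
    children: Dict[str, "TaxonomyNode"] = field(default_factory=dict)
    classifier: Optional[NodeClassifier] = None  # set on internal nodes only


def keyword_classifier(keywords: Dict[str, List[str]]) -> NodeClassifier:
    """Toy stand-in for a trained classifier: count keyword hits per child."""
    def classify(doc: List[str]) -> Tuple[str, float]:
        scores = {child: sum(w in kws for w in doc) for child, kws in keywords.items()}
        best = max(scores, key=scores.get)
        total = sum(scores.values())
        # Confidence is the winner's share of all hits (0.0 if nothing matched).
        return best, (scores[best] / total if total else 0.0)
    return classify


def classify_top_down(root: TaxonomyNode, doc: List[str],
                      min_confidence: float = 0.6) -> Tuple[List[str], bool]:
    """Descend from the root, one local decision per internal node.

    Assumption (not from the paper): a decision whose confidence falls below
    min_confidence is treated as a suspected misclassification, and the
    descent stops so the error cannot propagate to a distant leaf.
    Returns (path of labels visited, flagged).
    """
    node, path = root, [root.label]
    while node.children:
        child_label, confidence = node.classifier(doc)
        if confidence < min_confidence:
            return path, True
        node = node.children[child_label]
        path.append(node.label)
    return path, False


def build_demo_taxonomy() -> TaxonomyNode:
    science = TaxonomyNode(
        "science",
        {"physics": TaxonomyNode("physics"), "biology": TaxonomyNode("biology")},
        keyword_classifier({"physics": ["quantum", "energy"], "biology": ["cell", "gene"]}),
    )
    sports = TaxonomyNode(
        "sports",
        {"tennis": TaxonomyNode("tennis"), "soccer": TaxonomyNode("soccer")},
        keyword_classifier({"tennis": ["serve", "racket"], "soccer": ["goal", "penalty"]}),
    )
    return TaxonomyNode(
        "root",
        {"science": science, "sports": sports},
        keyword_classifier({
            "science": ["quantum", "cell", "energy", "gene"],
            "sports": ["match", "goal", "player", "serve"],
        }),
    )


root = build_demo_taxonomy()
print(classify_top_down(root, ["quantum", "energy", "levels"]))
# (['root', 'science', 'physics'], False)
print(classify_top_down(root, ["player", "energy", "match", "gene"]))
# (['root'], True)
```

Each node only discriminates among its own children, which is where the efficiency of the hierarchical approach comes from. The second document splits its evidence evenly at the root, so the sketch flags it instead of committing to a branch from which no lower-level classifier could recover.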

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cheng, Ch., Tang, J., Wai-chee Fu, A., King, I. (2001). Hierarchical Classification of Documents with Error Control. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_46

  • DOI: https://doi.org/10.1007/3-540-45357-1_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41910-5

  • Online ISBN: 978-3-540-45357-4

  • eBook Packages: Springer Book Archive
