A Fast Algorithm for Hierarchical Text Classification

  • Conference paper
  • First Online:
Data Warehousing and Knowledge Discovery (DaWaK 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1874)

Abstract

Text classification is becoming more important with the proliferation of the Internet and the huge amount of data it transfers. We present an efficient algorithm for text classification using hierarchical classifiers based on a concept hierarchy. A simple TFIDF classifier is trained on sample data and then used to classify new data. Despite its simplicity, experiments on Web pages and TV closed captions show high classification accuracy, and applying feature subset selection techniques improves performance. The algorithm is computationally efficient, with a running time bounded by O(n log n) for n samples.
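The abstract does not spell out the classifier's internals. The following is a minimal Python sketch that assumes the "TFIDF classifier" is a Rocchio-style centroid classifier (one TF-IDF prototype vector per category, compared by cosine similarity) and that a new document is routed greedily from the root of the concept hierarchy down to a leaf. All names (`HierarchicalTfidf`, `tokenize`, `centroid`, and so on), the toy two-level hierarchy, and the training snippets are hypothetical. Feature subset selection is omitted.

```python
import math
from collections import Counter

# Hypothetical helpers: a minimal Rocchio-style TFIDF classifier (an assumption,
# not the authors' published code), applied greedily down a concept hierarchy.

def tokenize(text):
    # Naive lower-casing whitespace tokenizer; no stemming or stop-word removal.
    return text.lower().split()

def idf_table(token_lists):
    # idf(t) = log(N / df(t)), where df(t) counts training documents containing t.
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {term: math.log(n / count) for term, count in df.items()}

def unit(vec):
    # Scale a sparse vector to unit L2 length; an all-zero vector becomes empty.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else {}

def tfidf_vector(tokens, idf):
    # Raw term frequency times idf; terms never seen in training are dropped.
    tf = Counter(t for t in tokens if t in idf)
    return unit({t: c * idf[t] for t, c in tf.items()})

def cosine(u, v):
    # Inputs are unit-length (or empty), so the dot product is the cosine.
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def centroid(vectors):
    # Category prototype: sum of unit document vectors, renormalised.
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return unit(total)

class HierarchicalTfidf:
    def __init__(self, children, labelled_docs, root="root"):
        # children: node -> list of child nodes; leaves are absent or map to [].
        # labelled_docs: (leaf_category, text) pairs used as training samples.
        self.children, self.root = children, root
        tokenized = [(leaf, tokenize(text)) for leaf, text in labelled_docs]
        self.idf = idf_table([tokens for _, tokens in tokenized])
        self.by_node = {}
        for leaf, tokens in tokenized:
            self.by_node.setdefault(leaf, []).append(tfidf_vector(tokens, self.idf))
        self.centroids = {}
        self._build(root)

    def _build(self, node):
        # Post-order pass: a node's prototype pools every document in its subtree.
        vectors = list(self.by_node.get(node, []))
        for kid in self.children.get(node, []):
            vectors.extend(self._build(kid))
        self.centroids[node] = centroid(vectors)
        return vectors

    def classify(self, text):
        # Greedy top-down descent: at each level follow the most similar child.
        doc = tfidf_vector(tokenize(text), self.idf)
        node, path = self.root, []
        while self.children.get(node):
            node = max(self.children[node],
                       key=lambda kid: cosine(doc, self.centroids[kid]))
            path.append(node)
        return path

# Toy usage with a hypothetical two-level hierarchy.
children = {
    "root": ["sports", "science"],
    "sports": ["soccer", "tennis"],
    "science": ["physics", "biology"],
}
training = [
    ("soccer", "goal striker penalty match league"),
    ("soccer", "striker scored a goal in the league match"),
    ("tennis", "serve racket set match point"),
    ("tennis", "racket serve ace grand slam"),
    ("physics", "quantum particle energy field"),
    ("physics", "energy momentum particle collider"),
    ("biology", "cell gene protein evolution"),
    ("biology", "gene expression cell protein"),
]
clf = HierarchicalTfidf(children, training)
print(clf.classify("the striker scored a penalty goal"))  # ['sports', 'soccer']
```

Because each internal node's prototype pools all documents below it, a new document is compared only with the children along one root-to-leaf path rather than with every leaf category. The trade-off is that an early wrong turn cannot be corrected further down. This sketch does not attempt to reproduce the paper's O(n log n) analysis.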

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chuang, W.T., Tiyyagura, A., Yang, J., Giuffrida, G. (2000). A Fast Algorithm for Hierarchical Text Classification. In: Kambayashi, Y., Mohania, M., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2000. Lecture Notes in Computer Science, vol 1874. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44466-1_41

  • DOI: https://doi.org/10.1007/3-540-44466-1_41

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67980-6

  • Online ISBN: 978-3-540-44466-4

  • eBook Packages: Springer Book Archive
