A Fast Algorithm for Hierarchical Text Classification

Chuang, Wesley T.; Tiyyagura, Asok; Yang, Jihoon; Giuffrida, Giovanni

doi:10.1007/3-540-44466-1_41


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1874)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery


Abstract

Text classification is becoming increasingly important with the proliferation of the Internet and the huge amount of data it transfers. We present an efficient algorithm for text classification that uses hierarchical classifiers based on a concept hierarchy. A simple TFIDF classifier is trained on sample data and then used to classify new data. Despite the classifier's simplicity, experiments on Web pages and TV closed captions show high classification accuracy, and applying feature subset selection techniques improves performance further. The algorithm is computationally efficient: its cost is bounded by O(n log n) for n samples.
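The abstract does not specify the term weighting, the similarity measure, or how a document moves down the hierarchy. The following is therefore only a minimal sketch of top-down hierarchical classification with TFIDF prototypes, assuming a Rocchio-style centroid for every concept node, cosine similarity, and greedy descent from the root. The toy hierarchy, the training sentences, and all function names (`train`, `classify`, and so on) are hypothetical, and the paper's feature subset selection step is not modeled.

```python
import math
from collections import Counter

# Hypothetical toy concept hierarchy: parent -> children; "root" is the top concept.
HIERARCHY = {
    "root": ["news", "sports"],
    "news": ["politics", "business"],
    "sports": ["soccer", "tennis"],
}

def tokenize(text):
    return text.lower().split()

def idf_table(docs):
    """Inverse document frequency log(N / df) over the training documents."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {term: math.log(len(docs) / count) for term, count in df.items()}

def tfidf(tokens, idf):
    """Raw term frequency times IDF; terms never seen in training are dropped."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

def normalize(vec):
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def leaves_under(node):
    children = HIERARCHY.get(node)
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves_under(child)]

def train(samples):
    """samples: (text, leaf_label) pairs. Builds one summed unit-vector prototype per node."""
    docs = [tokenize(text) for text, _ in samples]
    idf = idf_table(docs)
    vectors = [(normalize(tfidf(d, idf)), label) for d, (_, label) in zip(docs, samples)]
    nodes = set(HIERARCHY) | {c for cs in HIERARCHY.values() for c in cs}
    prototypes = {}
    for node in nodes:
        covered = set(leaves_under(node))
        proto = Counter()
        for vec, label in vectors:
            if label in covered:
                proto.update(vec)  # adds the float weights term by term
        prototypes[node] = dict(proto)
    return idf, prototypes

def classify(text, idf, prototypes):
    """Greedy descent: at each level, move to the child whose prototype is most similar."""
    vec = tfidf(tokenize(text), idf)
    node = "root"
    while node in HIERARCHY:
        node = max(HIERARCHY[node], key=lambda child: cosine(vec, prototypes[child]))
    return node

# Hypothetical training data; each sample is labeled with a leaf concept.
SAMPLES = [
    ("election vote senate", "politics"), ("senate bill vote", "politics"),
    ("stock market shares", "business"), ("market earnings shares", "business"),
    ("goal striker league", "soccer"), ("league match goal", "soccer"),
    ("serve racket wimbledon", "tennis"), ("racket match set", "tennis"),
]
idf, prototypes = train(SAMPLES)
print(classify("senate vote today", idf, prototypes))  # → politics
```

Because each document is compared only with the children of the node it has reached, the number of similarity computations grows with the depth of the hierarchy times its branching factor rather than with the total number of leaf categories. This illustrates why a hierarchy can keep classification cheap; it is not a derivation of the paper's O(n log n) bound.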




Author information

Authors and Affiliations

Computer Science Department, UCLA, Los Angeles, CA, 90095, USA
Wesley T. Chuang & Giovanni Giuffrida
Department of Computer Science, Iowa State University, Ames, IA, 50011, USA
Asok Tiyyagura
HRL Laboratories, LLC, 3011 Malibu Canyon Rd, Malibu, CA, 90265, USA
Wesley T. Chuang & Jihoon Yang


Editor information

Editors and Affiliations

Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto, 606-8501, Japan
Yahiko Kambayashi
Computer Science Department, Western Michigan University, Kalamazoo, MI, 49008, USA
Mukesh Mohania
Vienna University of Technology, IFS, Favoritenstr. 9-11/188, 1040, Vienna, Austria
A. Min Tjoa


About this paper

Cite this paper

Chuang, W.T., Tiyyagura, A., Yang, J., Giuffrida, G. (2000). A Fast Algorithm for Hierarchical Text Classification. In: Kambayashi, Y., Mohania, M., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2000. Lecture Notes in Computer Science, vol 1874. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44466-1_41


DOI: https://doi.org/10.1007/3-540-44466-1_41
Published: 06 July 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67980-6
Online ISBN: 978-3-540-44466-4
eBook Packages: Springer Book Archive
