Abstract
Text classification is becoming more important with the proliferation of the Internet and the huge amount of data it carries. We present an efficient algorithm for text classification that uses hierarchical classifiers built on a concept hierarchy. A simple TFIDF classifier is trained on sample data and then used to classify new data. Despite its simplicity, experiments on Web pages and TV closed captions demonstrate high classification accuracy. Applying feature subset selection techniques improves the performance. The algorithm is computationally efficient, bounded by O(n log n) for n samples.
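The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of TFIDF classification over a concept hierarchy, not the authors' method. It assumes whitespace tokenization, a smoothed IDF, one summed TFIDF prototype vector per node, and a greedy top-down descent that follows the child with the highest cosine similarity. The names `Node`, `train`, `classify`, and the toy categories are hypothetical, and the sketch makes no attempt to reproduce the O(n log n) bound or the feature subset selection step.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


class Node:
    """A concept in the hierarchy; leaves hold training documents."""

    def __init__(self, name, children=None, docs=None):
        self.name = name
        self.children = children or []
        self.docs = docs or []
        self.prototype = {}


def collect_docs(node):
    # All training documents in the subtree rooted at node.
    docs = list(node.docs)
    for child in node.children:
        docs.extend(collect_docs(child))
    return docs


def idf_table(docs):
    # Smoothed inverse document frequency (one common variant, chosen here).
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(text, idf):
    # Terms unseen during training are ignored.
    tf = Counter(tokenize(text))
    return {t: f * idf[t] for t, f in tf.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(node, idf):
    # Prototype of a node = sum of TFIDF vectors of all documents below it.
    proto = Counter()
    for doc in collect_docs(node):
        for t, w in tfidf(doc, idf).items():
            proto[t] += w
    node.prototype = dict(proto)
    for child in node.children:
        train(child, idf)


def classify(root, text, idf):
    # Greedy descent: at each level, follow the most similar child.
    vec = tfidf(text, idf)
    node, path = root, [root.name]
    while node.children:
        node = max(node.children, key=lambda c: cosine(vec, c.prototype))
        path.append(node.name)
    return path


# Toy hierarchy with invented training snippets (illustrative only).
root = Node("root", [
    Node("sports", [
        Node("soccer", docs=["goal striker penalty match league",
                             "football goal keeper match"]),
        Node("tennis", docs=["serve racket set match court",
                             "tennis ace serve court"]),
    ]),
    Node("finance", [
        Node("stocks", docs=["shares market index trading price",
                             "stock price market rally"]),
        Node("banking", docs=["loan interest bank deposit rate",
                              "bank credit loan account"]),
    ]),
])
idf = idf_table(collect_docs(root))
train(root, idf)
print(classify(root, "the striker scored a penalty goal", idf))
```

Descending the hierarchy means each document is compared only against the children of the nodes on its path rather than against every leaf category, which is one reason a hierarchical scheme can be cheaper than a flat one.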
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chuang, W.T., Tiyyagura, A., Yang, J., Giuffrida, G. (2000). A Fast Algorithm for Hierarchical Text Classification. In: Kambayashi, Y., Mohania, M., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2000. Lecture Notes in Computer Science, vol 1874. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44466-1_41
DOI: https://doi.org/10.1007/3-540-44466-1_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67980-6
Online ISBN: 978-3-540-44466-4
eBook Packages: Springer Book Archive