Text Classification for DAG-Structured Categories

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3518)

Abstract

Hierarchical text classification, which takes the relationships among categories into account, has recently become a problem of interest. Most research has focused on tree-structured categories, but in practice categories structured as a directed acyclic graph (DAG), where a child category may have more than one parent category, appear more often. In this paper, we introduce three approaches, namely flat, tree-based, and DAG-based, for solving the multi-label text classification problem in which categories are organized as a DAG and documents are classified into both leaf and internal categories. We also present experimental results for these methods, using SVMs as classifiers, on the Reuters-21578 collection and on our own data set of research papers in Artificial Intelligence.
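To make the DAG setting concrete, the following is a minimal sketch of a category structure in which one category has two parents, so it cannot be drawn as a tree. It is not taken from the paper: the category names, the `PARENTS` mapping, and the helper functions `ancestors` and `expand_labels` are all hypothetical. Propagating a document's labels to every ancestor category is one common convention in hierarchical classification and is assumed here only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical category DAG: each category maps to its list of parents.
PARENTS: Dict[str, List[str]] = {
    "AI": [],
    "Statistics": [],
    "Machine Learning": ["AI", "Statistics"],  # two parents, so not a tree
    "Text Classification": ["Machine Learning"],
}


def ancestors(category: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every category reachable by following parent links upward."""
    seen: Set[str] = set()
    stack = list(parents[category])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def expand_labels(labels: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Add all ancestors of each label (an assumed convention, not the paper's)."""
    expanded = set(labels)
    for label in labels:
        expanded |= ancestors(label, parents)
    return expanded


if __name__ == "__main__":
    # A document labeled with an internal category also inherits both parents.
    print(sorted(expand_labels({"Machine Learning"}, PARENTS)))
```

Storing parent lists rather than a single parent pointer is what separates the DAG case from the tree case: an upward traversal has to track visited nodes, because the same ancestor can be reached along more than one path.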

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nguyen, C.D., Dung, T.A., Cao, T.H. (2005). Text Classification for DAG-Structured Categories. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science, vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_36

  • DOI: https://doi.org/10.1007/11430919_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26076-9

  • Online ISBN: 978-3-540-31935-1

  • eBook Packages: Computer Science, Computer Science (R0)
