A New Inductive Learning Method for Multilabel Text Categorization

Chang, Yu-Chuan; Chen, Shyi-Ming; Liau, Churn-Jung

doi:10.1007/11779568_132

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4031)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

In this paper, we present a new inductive learning method for multilabel text categorization. The proposed method uses a mutual information measure to select terms and, based on these terms, constructs document descriptor vectors for each category. These document descriptor vectors form a document descriptor matrix. The method also uses the document descriptor vectors to construct a document similarity matrix based on the cosine similarity measure. It then constructs a term-document relevance matrix by applying the inner product of the document descriptor matrix to the document similarity matrix. The proposed method infers the degree of relevance of the selected terms in order to construct the category descriptor vector of each category. The relevance score between each category and a testing document is then calculated by applying the inner product of the category descriptor vector to the document descriptor vector of the testing document. Let L denote the maximum of these relevance scores. If the relevance score between a category and the testing document, divided by L, is not less than a predefined threshold value λ between zero and one, the document is classified into that category. We also compare the classification accuracy of the proposed method with that of existing learning methods (Find Similar, Naïve Bayes, Bayes Nets, and Decision Trees) in terms of the micro-averaged break-even point on the "Reuters-21578 Apté split" data set. The proposed method achieves a higher average accuracy than the existing methods.
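The two steps that the abstract specifies completely, the cosine-based document similarity matrix and the final thresholded decision rule, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy vectors, and the category labels are hypothetical, and the category descriptor vectors are simply given as inputs because the abstract does not spell out how they are inferred. Returning no category when the maximum score is not positive is also an assumption.

```python
from math import sqrt


def inner(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity_matrix(doc_vectors):
    """Pairwise cosine similarity between document descriptor vectors."""
    norms = [sqrt(inner(d, d)) for d in doc_vectors]
    n = len(doc_vectors)
    return [
        [
            # Zero vectors get similarity 0.0 to avoid division by zero.
            inner(doc_vectors[i], doc_vectors[j]) / (norms[i] * norms[j])
            if norms[i] and norms[j]
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def classify(category_vectors, doc_vector, lam):
    """Return every category whose relevance score divided by L is >= lam.

    category_vectors maps a category name to its descriptor vector, and
    L is the maximum relevance score over all categories.
    """
    scores = {c: inner(v, doc_vector) for c, v in category_vectors.items()}
    L = max(scores.values())
    if L <= 0:
        # Assumption: a document with no positive score gets no label.
        return []
    return [c for c, s in scores.items() if s / L >= lam]


# Hypothetical toy data: three categories over three selected terms.
categories = {
    "grain": [0.5, 0.0, 0.5],
    "wheat": [0.25, 0.0, 0.5],
    "trade": [0.0, 1.0, 0.0],
}
test_doc = [1.0, 0.0, 1.0]

print(classify(categories, test_doc, 0.7))  # relevance scores: 1.0, 0.75, 0.0
print(cosine_similarity_matrix([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```

Because each score is divided by the maximum L, the top-scoring category always passes the threshold when L is positive, and lowering λ admits further categories. That is what makes the rule multilabel rather than single-label.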

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.
Yu-Chuan Chang & Shyi-Ming Chen
Institute of Information Science, Academia Sinica, Taipei, Taiwan, R.O.C.
Churn-Jung Liau

Editor information

Editors and Affiliations

Department of Computer Science, Texas State University-San Marcos, Nueces 247, 601 University Drive, 78666-4616, San Marcos, TX, USA
Moonis Ali
ESIA Laboratoire d’Informatique, Systèmes, Traitement de l’Information et de la Connaissance, Université de Savoie, B.P. 806, F-74016, ANNECY Cedex, France
Richard Dapoigny

About this paper

Cite this paper

Chang, YC., Chen, SM., Liau, CJ. (2006). A New Inductive Learning Method for Multilabel Text Categorization. In: Ali, M., Dapoigny, R. (eds) Advances in Applied Artificial Intelligence. IEA/AIE 2006. Lecture Notes in Computer Science, vol 4031. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11779568_132

DOI: https://doi.org/10.1007/11779568_132
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35453-6
Online ISBN: 978-3-540-35454-3
eBook Packages: Computer Science, Computer Science (R0)
