Semi Supervised Learning Based Text Classification Model for Multi Label Paradigm

  • Conference paper

Abstract

Automatic text categorization (ATC) is a prominent research area within information retrieval. This paper discusses a classification model for ATC in the multi-label domain. We propose a new multi-label text classification model that assigns a more relevant set of categories to each input text document. Our model is strongly influenced by the graph-based framework and semi-supervised learning. We demonstrate the effectiveness of our model on the Enron, Slashdot, Bibtex and RCV1 datasets, and compare its performance with a few popular existing supervised techniques. Our experimental results indicate that the use of semi-supervised learning in multi-label text classification greatly improves the decision-making capability of the classifier.
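
The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea behind graph-based semi-supervised multi-label classification, not the model proposed in the paper. It is a minimal sketch that assumes documents are given as toy term-count vectors, builds a cosine-similarity graph over labeled and unlabeled documents, and spreads label scores from labeled to unlabeled nodes. The function names, the toy data, the iteration count and the 0.5 threshold are all hypothetical choices made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def propagate_labels(features, labels, n_labels, iterations=20, threshold=0.5):
    """Spread label scores over a document similarity graph.

    features: one term-count vector per document.
    labels:   a set of label indices for labeled documents, None for unlabeled.
    Returns one predicted label set per document.
    """
    n = len(features)
    # Edge weights: pairwise cosine similarity, with no self-loops.
    weights = [[cosine(features[i], features[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    # Initial scores: 1.0 for known labels, 0.0 everywhere else.
    scores = [[1.0 if labels[i] is not None and k in labels[i] else 0.0
               for k in range(n_labels)] for i in range(n)]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                updated.append(scores[i])  # labeled documents stay clamped
                continue
            total = sum(weights[i])
            if total == 0.0:
                updated.append([0.0] * n_labels)  # isolated node, no evidence
                continue
            # Each label score is the weighted average over graph neighbours.
            updated.append([
                sum(weights[i][j] * scores[j][k] for j in range(n)) / total
                for k in range(n_labels)
            ])
        scores = updated
    # A document receives every label whose score reaches the threshold.
    return [{k for k in range(n_labels) if scores[i][k] >= threshold}
            for i in range(n)]


# Toy corpus (hypothetical): three labeled and two unlabeled documents.
features = [
    [2, 0, 0, 1],  # labeled with label 0
    [0, 2, 1, 0],  # labeled with label 1
    [1, 1, 0, 0],  # labeled with labels 0 and 1
    [2, 0, 0, 0],  # unlabeled
    [0, 1, 1, 0],  # unlabeled
]
labels = [{0}, {1}, {0, 1}, None, None]
predicted = propagate_labels(features, labels, n_labels=2)
print(predicted)
```

Because each label keeps its own score, a document can end up with several labels at once, which is what separates the multi-label setting from single-label propagation. Lowering the threshold trades precision for recall.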

Copyright information

© 2014 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Dharmadhikari, S.C., Ingle, M., Kulkarni, P. (2014). Semi Supervised Learning Based Text Classification Model for Multi Label Paradigm. In: Das, V.V., Elkafrawy, P. (eds) Signal Processing and Information Technology. SPIT 2012. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 117. Springer, Cham. https://doi.org/10.1007/978-3-319-11629-7_26

  • DOI: https://doi.org/10.1007/978-3-319-11629-7_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11628-0

  • Online ISBN: 978-3-319-11629-7

  • eBook Packages: Computer Science (R0)
