Weakly Supervised Short Text Categorization Using World Knowledge

  • Conference paper
The Semantic Web – ISWC 2020 (ISWC 2020)

Abstract

Short text categorization is an important task in many NLP applications, such as sentiment analysis and news feed categorization. Because short texts are sparse and brief, many traditional classification models perform poorly when applied to them directly. Moreover, supervised approaches require large amounts of manually labeled data, and producing such labels is costly, labor-intensive, and time-consuming. This paper proposes a weakly supervised short text categorization approach that does not require any manually labeled data. The proposed model consists of two main modules: (1) a data labeling module, which leverages an external Knowledge Base (KB) to compute probabilistic labels for a given unlabeled training data set, and (2) a classification model based on a Wide & Deep learning approach. The effectiveness of the proposed method is validated via evaluation on multiple datasets. The experimental results show that the proposed approach outperforms unsupervised state-of-the-art classification approaches and achieves performance comparable to supervised approaches.
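The two-module design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the KB is used to score categories, so the sketch assumes that each text and each category is already represented by a vector and derives probabilistic labels from a softmax over cosine similarities. The classifier is a generic Wide & Deep forward pass (a linear part and a small MLP whose logits are summed) scored against the soft labels. All function names, feature matrices, and parameter shapes here are hypothetical.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def probabilistic_labels(text_vecs, category_vecs, temperature=1.0):
    """Assumed labeling rule: softmax over text-to-category cosine similarity."""
    t = text_vecs / np.linalg.norm(text_vecs, axis=1, keepdims=True)
    c = category_vecs / np.linalg.norm(category_vecs, axis=1, keepdims=True)
    return softmax((t @ c.T) / temperature)

def wide_and_deep_forward(wide_x, deep_x, params):
    """Sum the logits of a linear (wide) part and a one-hidden-layer MLP (deep) part."""
    wide_logits = wide_x @ params["W_wide"]
    hidden = np.maximum(0.0, deep_x @ params["W1"] + params["b1"])  # ReLU
    deep_logits = hidden @ params["W2"]
    return softmax(wide_logits + deep_logits + params["b"])

def soft_cross_entropy(pred_probs, soft_targets, eps=1e-12):
    """Cross-entropy against probabilistic (soft) labels, averaged over examples."""
    return float(-(soft_targets * np.log(pred_probs + eps)).sum(axis=1).mean())

# Toy stand-ins for KB-derived vectors (hypothetical values).
category_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
text_vecs = np.array([[0.9, 0.1], [0.2, 0.8]])
soft_labels = probabilistic_labels(text_vecs, category_vecs)

# Hypothetical sparse features for the wide part; the dense text vectors feed the deep part.
wide_x = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
rng = np.random.default_rng(0)
params = {
    "W_wide": rng.normal(scale=0.1, size=(3, 2)),
    "W1": rng.normal(scale=0.1, size=(2, 4)),
    "b1": np.zeros(4),
    "W2": rng.normal(scale=0.1, size=(4, 2)),
    "b": np.zeros(2),
}
pred = wide_and_deep_forward(wide_x, text_vecs, params)
loss = soft_cross_entropy(pred, soft_labels)
```

In a full pipeline the parameters would be trained by minimizing this loss over the automatically labeled set. The sketch stops at a single forward pass to show how the soft labels replace manual annotation.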

Notes

  1. https://github.com/ISE-FIZKarlsruhe/WESSTEC
  2. https://code.google.com/archive/p/word2vec/
  3. https://github.com/madhasri/Twitter-Trending-Topic-Classification/tree/master/data
  4. https://github.com/google-research/bert

Author information

Corresponding author

Correspondence to Rima Türker.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Türker, R., Zhang, L., Alam, M., Sack, H. (2020). Weakly Supervised Short Text Categorization Using World Knowledge. In: Pan, J.Z., et al. The Semantic Web – ISWC 2020. ISWC 2020. Lecture Notes in Computer Science, vol 12506. Springer, Cham. https://doi.org/10.1007/978-3-030-62419-4_33

  • DOI: https://doi.org/10.1007/978-3-030-62419-4_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62418-7

  • Online ISBN: 978-3-030-62419-4

  • eBook Packages: Computer Science, Computer Science (R0)