Abstract
Short text categorization is an important task in many NLP applications, such as sentiment analysis and news feed categorization. Due to the sparsity and brevity of short text, many traditional classification models perform poorly when applied to it directly. Moreover, supervised approaches require large amounts of manually labeled data, and manual labeling is a costly, labor-intensive, and time-consuming task. This paper proposes a weakly supervised short text categorization approach that does not require any manually labeled data. The proposed model consists of two main modules: (1) a data labeling module, which leverages an external Knowledge Base (KB) to compute probabilistic labels for a given unlabeled training data set, and (2) a classification model based on a Wide & Deep learning approach. The effectiveness of the proposed method is validated via evaluation on multiple datasets. The experimental results show that the proposed approach outperforms unsupervised state-of-the-art classification approaches and achieves comparable performance to supervised approaches.
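As a rough illustration of the data labeling module, the sketch below turns unlabeled short texts into probabilistic labels. All names here are hypothetical: it uses keyword overlap with hand-picked category seed terms as a stand-in for the paper's actual KB-based labeling (which relies on an external knowledge base such as DBpedia), and a softmax to convert overlap scores into a label distribution.

```python
import numpy as np

# Hypothetical seed terms per category, standing in for concepts
# that would be retrieved from a knowledge base such as DBpedia.
CATEGORY_SEEDS = {
    "sports":   {"game", "team", "score", "season", "coach"},
    "business": {"market", "stock", "company", "profit", "trade"},
    "politics": {"election", "senate", "policy", "vote", "minister"},
}

def probabilistic_labels(text, temperature=1.0):
    """Map a short text to a probability distribution over categories
    by softmax-normalizing its seed-term overlap scores."""
    tokens = set(text.lower().split())
    names = list(CATEGORY_SEEDS)
    scores = np.array(
        [len(tokens & CATEGORY_SEEDS[c]) for c in names], dtype=float
    )
    logits = scores / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return dict(zip(names, probs))

labels = probabilistic_labels("the team won the game this season")
print(max(labels, key=labels.get))  # most probable category: sports
```

The resulting soft labels could then serve as training targets for a downstream classifier (in the paper, a Wide & Deep model); this toy version only demonstrates the shape of the pipeline, not the actual KB-driven scoring.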
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Türker, R., Zhang, L., Alam, M., Sack, H. (2020). Weakly Supervised Short Text Categorization Using World Knowledge. In: Pan, J.Z., et al. The Semantic Web – ISWC 2020. ISWC 2020. Lecture Notes in Computer Science(), vol 12506. Springer, Cham. https://doi.org/10.1007/978-3-030-62419-4_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62418-7
Online ISBN: 978-3-030-62419-4
eBook Packages: Computer Science (R0)