Knowledge-Based Dataless Text Categorization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11762)

Abstract

Text categorization is an important task because of the rapid growth of text data available online in domains such as web search snippets and news documents. Traditional supervised methods require a large amount of training data, and labeling such data manually is time-consuming and costly. Moreover, when the text belongs to a specialized domain, only domain experts, whose time is expensive, can carry out the labeling. This thesis addresses the lack of labeled data and aims to develop a novel, generic model that categorizes text without any labeled training data. Instead, the model relies on the semantic similarity between documents and the predefined categories, which it computes by leveraging graph embedding techniques.
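
The idea of assigning a category by semantic similarity can be illustrated with a minimal sketch. It is not the model proposed in the thesis: the embedding values, the entity and category names, the averaging of entity vectors into a document vector, and the functions `cosine` and `categorize` are all assumptions made for illustration. In the setting the abstract describes, the vectors would come from a graph embedding model rather than being written by hand.

```python
import math

# Toy 3-dimensional embeddings, invented purely for illustration.
# A real system would obtain these from a graph embedding model.
ENTITY_VECS = {
    "Lionel Messi": [0.9, 0.1, 0.0],
    "FC Barcelona": [0.8, 0.2, 0.1],
    "NASDAQ": [0.1, 0.9, 0.1],
}
CATEGORY_VECS = {
    "Sports": [1.0, 0.0, 0.0],
    "Business": [0.0, 1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def categorize(entities):
    """Return the category most similar to the document, or None.

    The document is represented (as an assumption of this sketch) by the
    mean of the embeddings of the entities mentioned in it. No labeled
    training documents are used at any point.
    """
    vecs = [ENTITY_VECS[e] for e in entities if e in ENTITY_VECS]
    if not vecs:
        return None  # no known entity, so no similarity can be computed
    doc = [sum(column) / len(vecs) for column in zip(*vecs)]
    return max(CATEGORY_VECS, key=lambda c: cosine(doc, CATEGORY_VECS[c]))
```

Calling `categorize(["Lionel Messi", "FC Barcelona"])` selects the category whose vector has the highest cosine similarity to the averaged entity vectors. The only inputs are the category names and the embeddings, which is what makes the approach dataless.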

Notes

  1. http://goo.gl/JyCnZq.

  2. http://jwebpro.sourceforge.net/data-web-snippets.tar.gz.

  3. https://en.wikipedia.org/wiki/Category:Sports.

Acknowledgement

This thesis is supervised by Prof. Harald Sack and Dr. Lei Zhang.

Author information

Corresponding author

Correspondence to Rima Türker.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Türker, R. (2019). Knowledge-Based Dataless Text Categorization. In: Hitzler, P., et al. (eds.) The Semantic Web: ESWC 2019 Satellite Events. ESWC 2019. Lecture Notes in Computer Science, vol 11762. Springer, Cham. https://doi.org/10.1007/978-3-030-32327-1_42

  • DOI: https://doi.org/10.1007/978-3-030-32327-1_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32326-4

  • Online ISBN: 978-3-030-32327-1

  • eBook Packages: Computer Science, Computer Science (R0)