The Impact of Supercategory Inclusion on Semantic Classifier Performance

  • Conference paper
Intelligent Systems in Industrial Applications (ISMIS 2020)

Abstract

Text document classifiers are known to benefit from the inclusion of hypernyms of the terms that occur in a document. This inclusion can be a mixed blessing, however, because it may fuzzify the boundaries between document classes [5, 6, 10].

We have developed a new type of document classifier, the so-called semantic classifier, which is trained not on the original data but on the categories assigned to the document by our semantic categorizer [1, 4]. Such classifiers require a significantly smaller corpus of training data and outperform the traditional classifiers used in the domain.

In this research we examine how using supercategories of the assigned categories (an analogue of hypernyms) helps or harms the quality of classification. In particular, we conclude that supercategories should be added with restricted weight, since otherwise they may degrade classification performance. We also find that our technique of aggregating the categories counteracts the fuzzifying of class boundaries.
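As a rough illustration of what adding supercategories "with restricted weight" could look like, the sketch below extends a document's category weights with damped weights for the ancestors of each category. It is not the authors' implementation: the category hierarchy `PARENT`, the function `add_supercategories`, the damping factor `alpha`, and the depth limit `max_levels` are all hypothetical names and values chosen for this example.

```python
from collections import defaultdict

# Hypothetical child -> parent category map (illustrative only,
# not taken from the paper).
PARENT = {
    "Cardiology": "Medicine",
    "Oncology": "Medicine",
    "Medicine": "Science",
}


def add_supercategories(weights, parent=PARENT, alpha=0.5, max_levels=2):
    """Extend category weights with damped supercategory weights.

    Each step up the hierarchy multiplies the propagated weight by
    `alpha` (assumed here to lie in [0, 1]), so supercategories
    contribute less than the categories assigned directly.
    """
    result = defaultdict(float)
    for category, weight in weights.items():
        result[category] += weight
        current, damped = category, weight
        for _ in range(max_levels):
            current = parent.get(current)
            if current is None:
                break  # reached the top of the hierarchy
            damped *= alpha
            result[current] += damped
    return dict(result)


# Categories assigned to one document, with their weights.
doc = {"Cardiology": 1.0, "Oncology": 0.5}
extended = add_supercategories(doc, alpha=0.5)
print(extended)
```

With `alpha` close to 1 the supercategories weigh almost as much as the assigned categories, which is the setting the abstract warns may hurt performance. Smaller values keep the directly assigned categories dominant.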

Notes

  1. https://www.nlm.nih.gov/mesh/

References

  1. Borkowski, P.: Metody semantycznej kategoryzacji w zadaniach analizy dokumentów tekstowych. Ph.D. thesis, Institute of Computer Science of Polish Academy of Sciences (2019)

  2. Borkowski, P., Ciesielski, K., Klopotek, M.A.: Unsupervised aggregation of categories for document labelling. In: Foundations of Intelligent Systems - 21st International Symposium. ISMIS 2014, Roskilde, Denmark, 25–27 June 2014. Proceedings, pp. 335–344 (2014)

  3. Chemudugunta, C., Holloway, A., Smyth, P., Steyvers, M.: Modeling documents by combining semantic concepts with unsupervised statistical learning. In: Sheth, A., et al. (eds.) The Semantic Web - ISWC 2008. LNCS, vol. 5318, pp. 229–244. Springer, Berlin (2008)

  4. Ciesielski, K., Borkowski, P., Klopotek, M.A., Trojanowski, K., Wysocki, K.: Wikipedia-based document categorization. In: SIIS 2011, pp. 265–278 (2011)

  5. Huang, Z., Thint, M., Qin, Z.: Question classification using head words and their hypernyms. In: EMNLP 2008: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 927–936, October 2008

  6. Li, X., Roth, D.: Learning question classifiers. In: The 19th International Conference on Computational Linguistics, vol. 1, pp. 1–7 (2002)

  7. Nguyen, C.T.: Bridging semantic gaps in information retrieval: context-based approaches. ACM VLDB 10 (2010)

  8. Rafi, M., Hassan, S., Shaikh, M.S.: Content-based text categorization using wikitology. CoRR abs/1208.3623 (2012)

  9. Ramakrishna Murty, M., Murthy, J., Prasad Reddy, P., Satapathy, S.: A survey of cross-domain text categorization techniques. In: RAIT 2012, pp. 499–504. IEEE (2012)

  10. Scott, S., Matwin, S.: Text classification using WordNet hypernyms. In: Use of WordNet in Natural Language Processing Systems: Proceedings of the Conference, pp. 38–44, 45–52. Association for Computational Linguistics (1998)

  11. Wang, P., Domeniconi, C., Hu, J.: Using Wikipedia for co-clustering based cross-domain text classification. In: ICDM 2008, pp. 1085–1090. IEEE (2008)

Author information

Correspondence to Piotr Borkowski.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Borkowski, P., Ciesielski, K., Kłopotek, M.A. (2021). The Impact of Supercategory Inclusion on Semantic Classifier Performance. In: Stettinger, M., Leitner, G., Felfernig, A., Ras, Z.W. (eds) Intelligent Systems in Industrial Applications. ISMIS 2020. Studies in Computational Intelligence, vol 949. Springer, Cham. https://doi.org/10.1007/978-3-030-67148-8_6
