Training-Less Multi-label Text Classification Using Knowledge Bases and Word Embeddings

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11776)

Abstract

Traditional multi-label text classifiers suffer from the high dimensionality of the feature space, label imbalance, and training overhead. In this work, we depart from traditional approaches that rely on intensive feature engineering and linguistic analysis by introducing a novel ontology-based, training-less multi-label text classifier. We recast the classification task as a graph matching problem, which removes the need for training. Experimental results on the EUR-Lex dataset show that our method is competitive with these traditional approaches in terms of \(F1_{macro}\), performing fairly evenly across the different labels despite its training-less configuration.
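For readers unfamiliar with the metric, the following is a minimal sketch of how \(F1_{macro}\) is commonly computed for multi-label output: F1 is calculated per label and then averaged with equal weight, so rare labels count as much as frequent ones. The function name `f1_macro`, the label names, and the convention of scoring a label with no gold or predicted occurrences as 0 are assumptions for illustration; this is not taken from the paper's evaluation code.

```python
from typing import List, Set


def f1_macro(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set().union(*gold, *pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if label in g and label in p)
        fp = sum(1 for g, p in pairs if label not in g and label in p)
        fn = sum(1 for g, p in pairs if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # Assumed convention: a label with no support and no predictions scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical label sets for three documents (not EUR-Lex data).
gold = [{"agriculture", "trade"}, {"agriculture"}, {"trade"}]
pred = [{"agriculture"}, {"agriculture"}, {"agriculture", "trade"}]
print(round(f1_macro(gold, pred), 4))
```

In this toy case the per-label F1 values are 0.8 for `agriculture` and 2/3 for `trade`, and the macro average weights both equally regardless of how often each label occurs.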


Acknowledgment

This work has been co-funded by the German Federal Ministry of Education and Research (BMBF) within the framework of the Software Campus project “PIOBRec” [01IS17050].

Corresponding author

Correspondence to Wael Alkhatib.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Alkhatib, W., Schnitzer, S., Rensing, C. (2019). Training-Less Multi-label Text Classification Using Knowledge Bases and Word Embeddings. In: Douligeris, C., Karagiannis, D., Apostolou, D. (eds) Knowledge Science, Engineering and Management. KSEM 2019. Lecture Notes in Computer Science, vol 11776. Springer, Cham. https://doi.org/10.1007/978-3-030-29563-9_10

  • DOI: https://doi.org/10.1007/978-3-030-29563-9_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29562-2

  • Online ISBN: 978-3-030-29563-9

  • eBook Packages: Computer Science, Computer Science (R0)
