Training-Less Multi-label Text Classification Using Knowledge Bases and Word Embeddings

Alkhatib, Wael; Schnitzer, Steffen; Rensing, Christoph

doi:10.1007/978-3-030-29563-9_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11776)

Included in the following conference series:

International Conference on Knowledge Science, Engineering and Management

Abstract

Traditional multi-label text classifiers suffer from the high dimensionality of the feature space, label imbalance, and training overhead. In this work, we depart from traditional approaches, which rely on intensive feature engineering and linguistic analysis, by introducing a novel ontology-based training-less multi-label text classifier. We transform the classification task into a graph matching problem, which removes the need for training. Experimental results on the EUR-Lex dataset show that our method offers competitive performance with respect to these traditional approaches in terms of $F1_{macro}$, giving fair performance across the different labels despite the training-less configuration.
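The abstract reports results in terms of $F1_{macro}$, which averages the per-label F1 scores so that rare labels count as much as frequent ones. The following is a minimal sketch of that metric next to micro-averaged F1, assuming the standard definitions; the function names and the toy labels are hypothetical and are not taken from the paper or from EUR-Lex.

```python
from typing import Dict, List, Set

def f1_per_label(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """F1 score of every label that occurs in the gold or predicted sets."""
    labels = set().union(*gold, *pred)
    scores = {}
    for label in sorted(labels):
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    scores = f1_per_label(gold, pred)
    return sum(scores.values()) / len(scores) if scores else 0.0

def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """F1 computed from counts pooled over all labels and documents."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical toy data: one frequent label and one rare label.
gold = [{"agriculture"}, {"agriculture"}, {"agriculture"}, {"fisheries"}]
# A classifier that always predicts the frequent label.
pred = [{"agriculture"}, {"agriculture"}, {"agriculture"}, {"agriculture"}]

print(f"macro F1 = {macro_f1(gold, pred):.3f}")
print(f"micro F1 = {micro_f1(gold, pred):.3f}")
```

In this toy case the rare label is never predicted, so its F1 is 0 and the macro average drops well below the micro average. This is why a macro-averaged score is a reasonable indicator of balanced performance across labels.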

Acknowledgment

This work has been co-funded by the German Federal Ministry of Education and Research (BMBF) within the framework of the Software Campus project “PIOBRec” [01IS17050].

Author information

Authors and Affiliations

Communication Multimedia Lab, TU Darmstadt, Rundeturmstr. 10, 64283, Darmstadt, Germany
Wael Alkhatib, Steffen Schnitzer & Christoph Rensing

Corresponding author

Correspondence to Wael Alkhatib.

Editor information

Editors and Affiliations

University of Piraeus, Piraeus, Greece
Christos Douligeris
University of Vienna, Vienna, Austria
Dimitris Karagiannis
University of Piraeus, Piraeus, Greece
Dimitris Apostolou

About this paper

Cite this paper

Alkhatib, W., Schnitzer, S., Rensing, C. (2019). Training-Less Multi-label Text Classification Using Knowledge Bases and Word Embeddings. In: Douligeris, C., Karagiannis, D., Apostolou, D. (eds) Knowledge Science, Engineering and Management. KSEM 2019. Lecture Notes in Computer Science, vol 11776. Springer, Cham. https://doi.org/10.1007/978-3-030-29563-9_10

DOI: https://doi.org/10.1007/978-3-030-29563-9_10
Published: 22 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29562-2
Online ISBN: 978-3-030-29563-9
eBook Packages: Computer Science, Computer Science (R0)
