Using Graphs and Semantic Information to Improve Text Classifiers

Das, Nibaran; Ghosh, Swarnendu; Gonçalves, Teresa; Quaresma, Paulo

doi:10.1007/978-3-319-10888-9_33

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8686)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

Text classification using semantic information is a recent research trend because semantic representations can capture text content more accurately than bag-of-words (BOW) approaches. In addition, representing semantics through graphs has several advantages over the traditional feature-vector representation, so error-tolerant graph matching techniques can be used for text classification. Nevertheless, very few methodologies in the literature use graph-based semantic representations. The present work proposes a methodology for representing the semantic information of a summarized text as a graph. The discourse representation structure of a text is used to capture its semantic content and is then transformed into a graph. Five graph matching techniques based on maximum common subgraphs (mcs) and minimum common supergraphs (MCS) are evaluated with a k-NN classifier on 20 classes from the Reuters dataset, taking 10 documents of each class for both training and testing. The results show that the technique has the potential to classify text as well as traditional BOW approaches. Moreover, a majority-voting combination of the semantic representation and a traditional BOW approach improved recognition accuracy on the same data set.
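
The pipeline outlined in the abstract (graph distance from a maximum common subgraph, k-NN classification, majority voting) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that node labels are unique within a graph, so the mcs reduces to set intersection, and that graph size is the number of nodes plus edges. It uses the Bunke-Shearer distance, 1 - |mcs| / max(|G1|, |G2|), which may or may not be one of the five measures the paper evaluates. The toy graphs, the relation names, and the helpers `make_graph`, `mcs_distance`, `knn_predict` and `majority_vote` are all hypothetical.

```python
from collections import Counter


def make_graph(triples):
    """Build a graph (node set, edge set) from (source, relation, target) triples."""
    edges = set(triples)
    nodes = {node for src, _, dst in edges for node in (src, dst)}
    return nodes, edges


def size(graph):
    """Graph size, assumed here to be number of nodes plus number of edges."""
    nodes, edges = graph
    return len(nodes) + len(edges)


def mcs(g1, g2):
    """Maximum common subgraph, assuming node labels are unique in each graph.

    Under that assumption the mcs is simply the shared nodes and shared edges.
    """
    return g1[0] & g2[0], g1[1] & g2[1]


def mcs_distance(g1, g2):
    """Bunke-Shearer distance: 1 - |mcs(g1, g2)| / max(|g1|, |g2|)."""
    largest = max(size(g1), size(g2))
    if largest == 0:
        return 0.0  # two empty graphs are treated as identical
    return 1.0 - size(mcs(g1, g2)) / largest


def knn_predict(query, training, k=3):
    """Label a query graph by voting among its k nearest training graphs."""
    ranked = sorted(training, key=lambda item: mcs_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def majority_vote(predictions):
    """Combine the labels predicted by several classifiers for one document."""
    return Counter(predictions).most_common(1)[0][0]


# Toy training set: hypothetical semantic graphs with Reuters-style labels.
training = [
    (make_graph([("company", "agent", "buy"), ("buy", "patient", "shares")]), "acq"),
    (make_graph([("company", "agent", "acquire"), ("acquire", "patient", "shares")]), "acq"),
    (make_graph([("farmer", "agent", "harvest"), ("harvest", "patient", "wheat")]), "grain"),
    (make_graph([("country", "agent", "export"), ("export", "patient", "wheat")]), "grain"),
]

query = make_graph([("bank", "agent", "buy"), ("buy", "patient", "shares")])
graph_label = knn_predict(query, training, k=3)

# Hypothetical combination: the graph-based label plus two labels that
# stand in for the outputs of BOW classifiers.
combined_label = majority_vote([graph_label, "acq", "grain"])
print(graph_label, combined_label)
```

For graphs produced by a real semantic parser, node labels are not guaranteed to be unique, and computing the mcs then becomes a much harder subgraph-matching problem; the set-intersection shortcut holds only under the assumption stated above.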

Author information

Authors and Affiliations

Dept. of Computer Science and Engineering, Jadavpur University, Kolkata, India
Nibaran Das & Swarnendu Ghosh
Dept. of Computer Science, School of S&T, University of Évora, Évora, Portugal
Teresa Gonçalves & Paulo Quaresma
L2F - Spoken Language Systems Laboratory, INESC-ID, Lisbon, Portugal
Paulo Quaresma

Editor information

Editors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248, Warsaw, Poland
Adam Przepiórkowski & Maciej Ogrodniczuk

About this paper

Cite this paper

Das, N., Ghosh, S., Gonçalves, T., Quaresma, P. (2014). Using Graphs and Semantic Information to Improve Text Classifiers. In: Przepiórkowski, A., Ogrodniczuk, M. (eds) Advances in Natural Language Processing. NLP 2014. Lecture Notes in Computer Science, vol 8686. Springer, Cham. https://doi.org/10.1007/978-3-319-10888-9_33

DOI: https://doi.org/10.1007/978-3-319-10888-9_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10887-2
Online ISBN: 978-3-319-10888-9
eBook Packages: Computer Science (R0)
