On a Novel Representation of Multiple Textual Documents in a Single Graph

Giarelis, Nikolaos; Kanakaris, Nikos; Karacapilidis, Nikos

doi:10.1007/978-981-15-5925-9_9

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 193)

Included in the following conference series:

International Conference on Intelligent Decision Technologies

Abstract

This paper introduces a novel approach to representing multiple documents as a single graph, namely the graph-of-docs model, together with an associated novel algorithm for text categorization. The proposed approach makes it possible to investigate the importance of a term across a whole corpus of documents and supports the inclusion of relationship edges between documents, thus enabling the calculation of important document-level metrics. Compared to well-tried existing solutions, our initial experiments demonstrate a significant improvement in the accuracy of the text categorization process. For the experiments reported in this paper, we used a well-known dataset containing about 19,000 documents organized in various subjects.
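
The idea of holding several documents in one graph can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give construction details, so the sliding co-occurrence window, the Jaccard-weighted document edges, the degree-based importance score, and all function and variable names below are assumptions made only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_graph_of_docs(docs, window=2):
    """Build one graph holding all documents (illustrative sketch only).

    docs: dict mapping a document id to its raw text.
    Returns (word_edges, doc_words, doc_edges).
    """
    word_edges = defaultdict(int)   # (word_a, word_b) -> co-occurrence count
    doc_words = {}                  # doc id -> set of words it contains

    for doc_id, text in docs.items():
        tokens = text.lower().split()
        doc_words[doc_id] = set(tokens)
        # Assumption: connect each token to the tokens that follow it
        # inside a fixed-size window (window=2 links adjacent tokens).
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + window]:
                if word != other:
                    word_edges[tuple(sorted((word, other)))] += 1

    # Assumption: document-to-document edges are weighted by the
    # Jaccard overlap of the documents' word sets.
    doc_edges = {}
    for a, b in combinations(sorted(doc_words), 2):
        shared = doc_words[a] & doc_words[b]
        union = doc_words[a] | doc_words[b]
        if shared:
            doc_edges[(a, b)] = len(shared) / len(union)
    return dict(word_edges), doc_words, doc_edges


def term_degree(word_edges):
    """Corpus-wide importance proxy: distinct neighbours per term."""
    neighbours = defaultdict(set)
    for a, b in word_edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {word: len(adj) for word, adj in neighbours.items()}
```

Because all documents share the same word nodes, a term's score here reflects its connections across the whole corpus rather than within a single document, and the document edges give a basis for document-level metrics.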

Acknowledgements

The work presented in this paper is supported by the OpenBio-C project (www.openbio.eu), which is co-financed by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH—CREATE—INNOVATE (Project id: T1EDK-05275).

Author information

Authors and Affiliations

Industrial Management and Information Systems Lab, MEAD, University of Patras, 26504, Rio Patras, Greece
Nikolaos Giarelis, Nikos Kanakaris & Nikos Karacapilidis

Corresponding author

Correspondence to Nikos Karacapilidis.

Editor information

Editors and Affiliations

Gdynia Maritime University, Gdynia, Poland
Ireneusz Czarnowski
KES International Research, UK
Robert J. Howlett
Faculty of Engineering and Information Technology, Centre for Artificial Intelligence, University of Technology Sydney, Sydney, NSW, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Giarelis, N., Kanakaris, N., Karacapilidis, N. (2020). On a Novel Representation of Multiple Textual Documents in a Single Graph. In: Czarnowski, I., Howlett, R., Jain, L. (eds) Intelligent Decision Technologies. IDT 2020. Smart Innovation, Systems and Technologies, vol 193. Springer, Singapore. https://doi.org/10.1007/978-981-15-5925-9_9

DOI: https://doi.org/10.1007/978-981-15-5925-9_9
Published: 12 June 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5924-2
Online ISBN: 978-981-15-5925-9
eBook Packages: Intelligent Technologies and Robotics (R0)
