research-article

Lingua Franca – Entity-Aware Machine Translation Approach for Question Answering over Knowledge Graphs

Authors:

Nikit Srivastava, Aleksandr Perevalov, Denis Kuchelev, Diego Moussallem, Axel-Cyrille Ngonga Ngomo, Andreas Both

K-CAP '23: Proceedings of the 12th Knowledge Capture Conference 2023

Pages 122–130

https://doi.org/10.1145/3587259.3627567

Published: 05 December 2023

Abstract

This research paper proposes Lingua Franca, an approach that improves machine translation quality by using information from a knowledge graph to translate named entities accurately. Accurate entity translation is crucial for entity-oriented search, including Knowledge Graph Question Answering (KGQA) systems. In short, the approach uses an entity-replacement technique to preserve recognized named entities during translation, and then replaces each entity with its target-language label from the knowledge graph, so that questions are translated correctly before a KGQA system answers them. The paper also introduces an open-source modular framework that enables researchers to design their own named-entity-aware machine translation pipelines. The experimental results demonstrate the effectiveness of Lingua Franca compared with baseline machine translation models. Using Lingua Franca yields a statistically significant improvement in the question answering quality of several KGQA systems on different datasets.
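The entity-replacement idea described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the Lingua Franca implementation: the `ENT<i>` placeholder format, the toy knowledge graph `KG_LABELS` with hypothetical entity IDs `E1`/`E2`, the entity links passed in by hand (standing in for a real named entity recognition and entity linking step), and the word-by-word `toy_translate_de_en` function (standing in for a real machine translation model).

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical toy knowledge graph: entity ID -> {language code: label}.
KG_LABELS: Dict[str, Dict[str, str]] = {
    "E1": {"de": "Köln", "en": "Cologne", "fr": "Cologne"},
    "E2": {"de": "Rhein", "en": "Rhine", "fr": "Rhin"},
}

# A link pairs a surface form found in the question with an entity ID.
# In a real pipeline these would come from NER + entity linking; here they are given by hand.
Link = Tuple[str, str]
Slots = Dict[str, Tuple[str, str]]

# Hypothetical placeholder format: ENT0, ENT1, ...
PLACEHOLDER = re.compile(r"\bENT\d+\b")


def mask_entities(text: str, links: List[Link]) -> Tuple[str, Slots]:
    """Replace each linked surface form with a placeholder token ENT<i>."""
    slots: Slots = {}
    # Longest surface forms first, so a short form cannot clobber part of a longer one.
    ordered = sorted(links, key=lambda link: len(link[0]), reverse=True)
    for i, (surface, entity_id) in enumerate(ordered):
        token = f"ENT{i}"
        text = text.replace(surface, token)
        slots[token] = (surface, entity_id)
    return text, slots


def unmask_entities(text: str, slots: Slots, target_lang: str) -> str:
    """Swap placeholders for target-language KG labels (fall back to the source surface form)."""
    def restore(match: "re.Match[str]") -> str:
        token = match.group(0)
        if token not in slots:
            return token  # leave unknown placeholders untouched
        surface, entity_id = slots[token]
        return KG_LABELS.get(entity_id, {}).get(target_lang, surface)

    return PLACEHOLDER.sub(restore, text)


def entity_aware_translate(
    text: str, links: List[Link], translate: Callable[[str], str], target_lang: str
) -> str:
    """Mask entities, translate the masked text, then restore entities from the KG."""
    masked, slots = mask_entities(text, links)
    return unmask_entities(translate(masked), slots, target_lang)


def toy_translate_de_en(text: str) -> str:
    """Hypothetical word-by-word German->English stand-in for a real MT model."""
    lexicon = {"Welcher": "Which", "Fluss": "river", "fließt": "flows", "durch": "through"}
    # Unknown words (including ENT<i> placeholders) pass through unchanged.
    return re.sub(r"\w+", lambda m: lexicon.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    question = "Welcher Fluss fließt durch Köln?"
    print(entity_aware_translate(question, [("Köln", "E1")], toy_translate_de_en, "en"))
    # Which river flows through Cologne?
```

Without the masking step, the toy translator would leave "Köln" untranslated; with it, the entity is restored from the knowledge graph's English label. A real neural translation model may alter or drop placeholder tokens, so in practice the placeholder format is a design choice that needs care; the sketch sidesteps this by using a translator that leaves unknown tokens untouched.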


Cited By

Perevalov A, Both A, Ngonga Ngomo A (2024). Multilingual question answering systems for knowledge graphs – a survey. Semantic Web 15(5), 2089–2124. Online publication date: 9 Oct 2024. https://doi.org/10.3233/SW-243633

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Question answering
    2. Specialized information retrieval
      1. Structure and multilingual text search
        1. Multilingual and cross-lingual retrieval


Published In


K-CAP '23: Proceedings of the 12th Knowledge Capture Conference 2023

December 2023

270 pages

ISBN:9798400701412

DOI:10.1145/3587259

Editors:
Brent Venable, University of West Florida and Institute for Human and Machine Cognition, Pensacola, FL, USA
Daniel Garijo, Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
Brian Jalaian, University of West Florida and Institute for Human and Machine Cognition, Pensacola, FL, USA

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGAI: ACM Special Interest Group on Artificial Intelligence

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article
Research
Refereed limited

Funding Sources

BMBF
MKW NRW

Conference

K-CAP '23


K-CAP '23: Knowledge Capture Conference 2023

December 5–7, 2023

Pensacola, FL, USA

Acceptance Rates

Overall Acceptance Rate 55 of 198 submissions, 28%


Bibliometrics

Total Citations: 1

Total Downloads: 102

Downloads (Last 12 months): 66

Downloads (Last 6 weeks): 7

Reflects downloads up to 15 Feb 2025
