Can Machine Translation be a Reasonable Alternative for Multilingual Question Answering Systems over Knowledge Graphs?

Authors:

Aleksandr Perevalov,

Dennis Diefenbach,

Axel-Cyrille Ngonga Ngomo

WWW '22: Proceedings of the ACM Web Conference 2022

Pages 977 - 986

https://doi.org/10.1145/3485447.3511940

Published: 25 April 2022

Abstract

Providing access to information is the main and most important purpose of the Web. However, despite the availability of easy-to-use tools (e.g., search engines, chatbots, question answering), accessibility is typically limited by the ability to use the English language. This excludes a huge number of people. In this work, we discuss Knowledge Graph Question Answering (KGQA) systems, which aim to provide natural language access to data stored in Knowledge Graphs (KGs). While several KGQA systems have been proposed, only very few have dealt with a language other than English. We follow our research agenda of enabling speakers of any language to access the knowledge stored in KGs. Because many languages lack native support, we use machine translation (MT) tools to evaluate KGQA systems on questions in languages that a given KGQA system does not support. In total, our evaluation is based on 8 different languages (including some that had never been evaluated before). For the intensive evaluation, we extend the QALD-9 dataset for KGQA with Wikidata queries and high-quality translations. The extension was crowdsourced from native speakers of the different languages. By using multiple KGQA systems for the evaluation, we were able to investigate and answer the main research question: “Can MT be an alternative for multilingual KGQA systems?”. The evaluation results demonstrate that monolingual KGQA systems can be effectively ported to new languages with MT tools.
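The evaluation setup described in the abstract (translate a question into a language the KGQA system supports, then query the system) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `translate` and `answer`, the toy lookup tables, the placeholder answer identifiers, and the choice of a set-based macro-averaged F1 as the quality measure.

```python
from typing import Callable, Iterable, List, Set, Tuple


def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """Set-based F1 between predicted and gold answers (assumed metric)."""
    if not predicted and not gold:
        return 1.0  # both empty: treat as a perfect match
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate_with_mt(
    questions: Iterable[Tuple[str, List[str]]],
    translate: Callable[[str], str],
    answer: Callable[[str], List[str]],
) -> float:
    """Translate each question, query the KGQA system, return macro F1."""
    scores = []
    for text, gold in questions:
        translated = translate(text)  # MT step (hypothetical callable)
        predicted = answer(translated)  # monolingual KGQA step (hypothetical)
        scores.append(f1_score(set(predicted), set(gold)))
    return sum(scores) / len(scores) if scores else 0.0


# Toy stand-ins for an MT tool and a KGQA system (illustrative only).
TOY_MT = {"Wer ist der Bürgermeister von Berlin?": "Who is the mayor of Berlin?"}
TOY_KGQA = {"Who is the mayor of Berlin?": ["answer-1"]}


def toy_translate(question: str) -> str:
    return TOY_MT.get(question, question)


def toy_answer(question: str) -> List[str]:
    return TOY_KGQA.get(question, [])


dataset = [
    ("Wer ist der Bürgermeister von Berlin?", ["answer-1"]),
    ("Unbekannte Frage?", ["answer-2"]),
]
macro_f1 = evaluate_with_mt(dataset, toy_translate, toy_answer)
print(macro_f1)  # 0.5
```

In this sketch the first question is translated and answered correctly while the second is not, so the macro-averaged score is 0.5. A real setup would replace the two toy callables with an actual MT tool and KGQA system.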

Index Terms

Can Machine Translation be a Reasonable Alternative for Multilingual Question Answering Systems over Knowledge Graphs?
1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
    2. Natural language processing
2. Information systems
  1. Information retrieval

Index terms have been assigned to the content through auto-classification.

Published In

WWW '22: Proceedings of the ACM Web Conference 2022

April 2022

3764 pages

ISBN:9781450390965

DOI:10.1145/3485447

Editors:
Frédérique Laforest (INSA Lyon, France)
Raphaël Troncy (EURECOM, France)
Elena Simperl (King’s College London, UK)
Deepak Agarwal (Pinterest, USA)
Aristides Gionis (KTH Royal Institute of Technology, Sweden)
Ivan Herman (W3C / retired)
Lionel Médini (Université Lyon 1, France)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '22

Sponsor:

SIGWEB

WWW '22: The ACM Web Conference 2022

April 25 - 29, 2022

Virtual Event, Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 10
Total Downloads: 467
Downloads (Last 12 months): 42
Downloads (Last 6 weeks): 6

Reflects downloads up to 08 Mar 2025

Citations

Cited By

Perevalov A, Both A, Ngonga Ngomo A (2024). Multilingual question answering systems for knowledge graphs – a survey. Semantic Web 15:5 (2089-2124). Online publication date: 9-Oct-2024
https://doi.org/10.3233/SW-243633
Mountantonakis M, Mertzanis L, Bastakis M, Tzitzikas Y (2024). A comparative evaluation for question answering over Greek texts by using machine translation and BERT. Language Resources and Evaluation. Online publication date: 19-Jun-2024
https://doi.org/10.1007/s10579-024-09745-9
Wang D, Feng X, Liu Z, Wang C (2024). 2M-NER: contrastive learning for multilingual and multimodal NER with language and modal fusion. Applied Intelligence 54:8 (6252-6268). Online publication date: 9-May-2024
https://dl.acm.org/doi/10.1007/s10489-024-05490-2
Perevalov A, Gashkov A, Eltsova M, Both A (2024). Understanding SPARQL Queries: Are We Already There? Multilingual Natural Language Generation Based on SPARQL Queries and Large Language Models. The Semantic Web – ISWC 2024 (173-191). Online publication date: 11-Nov-2024
https://dl.acm.org/doi/10.1007/978-3-031-77850-6_10
Perevalov A, Gashkov A, Eltsova M, Both A (2024). Language Models as SPARQL Query Filtering for Improving the Quality of Multilingual Question Answering over Knowledge Graphs. Web Engineering (3-18). Online publication date: 17-Jun-2024
https://dl.acm.org/doi/10.1007/978-3-031-62362-2_1
Gounakis N, Mountantonakis M, Tzitzikas Y (2023). Evaluating a Radius-based Pipeline for Question Answering over Cultural (CIDOC-CRM based) Knowledge Graphs. Proceedings of the 34th ACM Conference on Hypertext and Social Media (1-10). Online publication date: 4-Sep-2023
https://dl.acm.org/doi/10.1145/3603163.3609067
Srivastava N, Perevalov A, Kuchelev D, Moussallem D, Ngonga Ngomo A, Both A (2023). Lingua Franca – Entity-Aware Machine Translation Approach for Question Answering over Knowledge Graphs. Proceedings of the 12th Knowledge Capture Conference 2023 (122-130). Online publication date: 5-Dec-2023
https://dl.acm.org/doi/10.1145/3587259.3627567
Peng W, Li W, Hu Y, Chen H, Duh W, Huang H, Kato M, Mothe J, Poblete B (2023). Leader-Generator Net: Dividing Skill and Implicitness for Conquering FairytaleQA. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (791-801). Online publication date: 19-Jul-2023
https://dl.acm.org/doi/10.1145/3539618.3591710
Yu M, Zhang Q, Yu J, Zhao M, Li X, Jin D, Yang M, Yu R (2023). Knowledge graph completion using topological correlation and multi-perspective independence. Knowledge-Based Systems 259:C. Online publication date: 10-Jan-2023
https://dl.acm.org/doi/10.1016/j.knosys.2022.110031
Perevalov A (2022). Enhancing Multilingual Accessibility of Question Answering over Knowledge Graphs. Companion Proceedings of the Web Conference 2022 (349-353). Online publication date: 25-Apr-2022
https://dl.acm.org/doi/10.1145/3487553.3524197
