DOI: 10.1145/3485447.3511940

Can Machine Translation be a Reasonable Alternative for Multilingual Question Answering Systems over Knowledge Graphs?

Published: 25 April 2022

Abstract

Providing access to information is the main and most important purpose of the Web. However, despite the availability of easy-to-use tools (e.g., search engines, chatbots, question answering systems), accessibility is typically limited by the user's ability to use the English language. This excludes a huge number of people. In this work, we discuss Knowledge Graph Question Answering (KGQA) systems, which aim to provide natural-language access to data stored in Knowledge Graphs (KGs). While several KGQA systems have been proposed, only very few have dealt with a language other than English. We follow our research agenda of enabling speakers of any language to access the knowledge stored in KGs. Because native support is lacking for many languages, we use machine translation (MT) tools to evaluate KGQA systems on questions in languages that a KGQA system does not support. In total, our evaluation is based on 8 different languages (including some that were never evaluated before). For this intensive evaluation, we extend the QALD-9 dataset for KGQA with Wikidata queries and high-quality translations. The extension was crowdsourced from native speakers of the different languages. Using multiple KGQA systems for the evaluation enabled us to investigate and answer the main research question: “Can MT be an alternative for multilingual KGQA systems?” The evaluation results demonstrate that monolingual KGQA systems can be effectively ported to new languages with MT tools.
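
The evaluation idea in the abstract (translate a question into a language the KGQA system supports, answer it, and compare against gold answers) can be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation code: the translator and the KGQA system are hypothetical toy stand-ins backed by dictionaries, the answer identifiers are placeholders, and the use of set-based, macro-averaged F1 as the quality score is an assumption.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical interfaces: a real setup would wrap an MT service and a KGQA system.
Translator = Callable[[str], str]
KGQASystem = Callable[[str], Set[str]]


def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """Set-based F1 between predicted and gold answers for one question."""
    if not predicted and not gold:
        return 1.0  # both empty: treated as a perfect match (assumed convention)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate_with_mt(
    questions: List[Tuple[str, Set[str]]],
    translate: Translator,
    answer: KGQASystem,
) -> float:
    """Macro-averaged F1 when every question is translated before answering."""
    scores = [f1_score(answer(translate(q)), gold) for q, gold in questions]
    return sum(scores) / len(scores) if scores else 0.0


# Toy stand-ins (assumptions for illustration only).
TOY_TRANSLATIONS = {"Wer schrieb Faust?": "Who wrote Faust?"}
TOY_KG_ANSWERS = {"Who wrote Faust?": {"ex:answer_1"}}


def toy_translate(question: str) -> str:
    # Unknown questions pass through untranslated, mimicking an MT failure.
    return TOY_TRANSLATIONS.get(question, question)


def toy_kgqa(question: str) -> Set[str]:
    # The toy system only "understands" English questions it has seen.
    return TOY_KG_ANSWERS.get(question, set())


QUESTIONS = [
    ("Wer schrieb Faust?", {"ex:answer_1"}),       # translated, then answered
    ("Frage ohne Übersetzung?", {"ex:answer_2"}),  # no translation available
]

if __name__ == "__main__":
    print(evaluate_with_mt(QUESTIONS, toy_translate, toy_kgqa))
```

Keeping the translator and the KGQA system as plain callables means the same loop could, in principle, be rerun with different MT tools or different monolingual systems and the resulting scores compared per language.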

        Published In

        WWW '22: Proceedings of the ACM Web Conference 2022
        April 2022
        3764 pages
        ISBN: 9781450390965
        DOI: 10.1145/3485447
        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. knowledge graph question answering
        2. machine translation
        3. multilingual question answering
        4. question answering dataset

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        WWW '22
        WWW '22: The ACM Web Conference 2022
        April 25 - 29, 2022
        Virtual Event, Lyon, France

        Acceptance Rates

        Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
