Multilingual COVID-QA: Learning towards Global Information Sharing via Web Question Answering in Multiple Languages

Authors:

Dongyan Zhao

WWW '21: Proceedings of the Web Conference 2021

Pages 2590 - 2600

https://doi.org/10.1145/3442381.3449991

Published: 03 June 2021

Abstract

Since late December 2019, an outbreak of atypical pneumonia, now known as COVID-19 and caused by a novel coronavirus, has been reported. Cases have spread to more than 200 countries and regions. The World Health Organization (WHO) officially declared the coronavirus outbreak a pandemic, and the public health emergency has affected daily life worldwide: people are advised to keep social distance, in-person events have moved online, and some functional facilities have been locked down. As a result, the Web has become an active venue for people to share information. People continuously post questions about the ongoing pandemic online and seek answers. Yet sharing global information conveyed in different languages is challenging, because the language barrier is intrinsically unfriendly to monolingual speakers. In this paper, we propose a multilingual COVID-QA model that answers people's questions in their own languages while absorbing knowledge from other languages. Another challenge is that, in most cases, the information to be shared has no parallel data in multiple languages. To this end, we propose a novel framework that incorporates (unsupervised) translation alignment to learn pseudo-parallel data, and then train multilingual question-answering mapping and generation. We demonstrate the effectiveness of the proposed approach against a series of competitive baselines. In this way, we make it easier to share global information across language barriers, and we hope to contribute to the battle against COVID-19.
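The abstract does not specify how the translation alignment produces pseudo-parallel data. As a rough illustration of the general idea only, and not the authors' method, the sketch below pairs sentences from two languages by mutual nearest-neighbour search in a shared embedding space. The function name `mine_pseudo_parallel`, the similarity threshold, and the toy vectors are all hypothetical; in practice the vectors would come from some cross-lingual encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_pseudo_parallel(src, tgt, threshold=0.8):
    """Pair sentences that are each other's nearest neighbour.

    src, tgt: dicts mapping a sentence to its vector in a shared
    (assumed cross-lingual) embedding space. The threshold is an
    illustrative choice, not a value taken from the paper.
    """
    def nearest(vec, pool):
        return max(pool, key=lambda sent: cosine(vec, pool[sent]))

    pairs = []
    for s, s_vec in src.items():
        t = nearest(s_vec, tgt)
        # Keep the pair only if the match is mutual and similar enough.
        if nearest(tgt[t], src) == s and cosine(s_vec, tgt[t]) >= threshold:
            pairs.append((s, t))
    return pairs


# Toy, hand-made vectors standing in for real sentence embeddings.
english = {
    "How does the virus spread?": [0.9, 0.1, 0.0],
    "What are the symptoms?": [0.1, 0.9, 0.1],
}
spanish = {
    "¿Cómo se propaga el virus?": [0.88, 0.15, 0.02],
    "¿Cuáles son los síntomas?": [0.12, 0.85, 0.15],
    "¿Qué tiempo hace hoy?": [0.0, 0.1, 0.95],
}

pairs = mine_pseudo_parallel(english, spanish)
for s, t in pairs:
    print(s, "<->", t)
```

The mutual-match check keeps unrelated sentences, such as the weather question in the toy data, out of the mined pairs. The resulting pairs could then serve as training data for a question-answering model in place of true translations.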

Index Terms

1. Computing methodologies
  1. Artificial intelligence

Information

Published In

WWW '21: Proceedings of the Web Conference 2021

April 2021

4054 pages

ISBN:9781450383127

DOI:10.1145/3442381

Editors: Jure Leskovec (Stanford), Marko Grobelnik (Jožef Stefan Institute), Marc Najork (Google), Jie Tang (Tsinghua University), Leila Zia (Wikimedia Foundation)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 June 2021

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '21

Sponsor:

SIGWEB

WWW '21: The Web Conference 2021

April 19 - 23, 2021

Ljubljana, Slovenia

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
