DOI: 10.1145/3442381.3449991
research-article

Multilingual COVID-QA: Learning towards Global Information Sharing via Web Question Answering in Multiple Languages

Published: 03 June 2021

ABSTRACT

Since late December 2019, an outbreak of atypical pneumonia, now known as COVID-19 and caused by a novel coronavirus, has been reported. Cases have spread to more than 200 countries and regions. The World Health Organization (WHO) has officially declared the coronavirus outbreak a pandemic, and the public health emergency has affected daily life worldwide: people are advised to keep social distance, in-person events have moved online, and some functional facilities have been locked down. As a result, the Web has become an active venue for people to share information. People continuously post questions about the ongoing pandemic online and seek answers. Yet sharing global information conveyed in different languages is challenging, because the language barrier is intrinsically unfriendly to monolingual speakers. In this paper, we propose a multilingual COVID-QA model that answers people’s questions in their own languages while absorbing knowledge from other languages. Another challenge is that, in most cases, the information to be shared has no parallel data in multiple languages. To this end, we propose a novel framework that incorporates (unsupervised) translation alignment to learn pseudo-parallel data. We then train multilingual question-answer mapping and generation. We demonstrate the effectiveness of our proposed approach against a series of competitive baselines. In this way, we make it easier to share global information across language barriers, and we hope to contribute to the battle against COVID-19.

    • Published in

      WWW '21: Proceedings of the Web Conference 2021
      April 2021
      4054 pages
      ISBN:9781450383127
      DOI:10.1145/3442381

      Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 3 June 2021

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
