ABSTRACT
Since late December 2019, it has been reported an outbreak of atypical pneumonia, now known as COVID-19 caused by the novel coronavirus. Cases have spread to more than 200 countries and regions internationally. World Health Organization (WHO) officially declares the coronavirus outbreak a pandemic and the public health emergency has caused world-wide impact to daily lives: people are advised to keep social distance, in-person events have been moved online, and some function facilitates have been locked-down. Alternatively, the Web becomes an active venue for people to share information. With respect to the on-going topic, people continuously post questions online and seek for answers. Yet, sharing global information conveyed in different languages is challenging because the language barrier is intrinsically unfriendly to monolingual speakers. In this paper, we propose a multilingual COVID-QA model to answer people’s questions in their own languages while the model is able to absorb knowledge from other languages. Another challenge is that in most cases, the information to share does not have parallel data in multiple languages. To this end, we propose a novel framework which incorporates (unsupervised) translation alignment to learn as pseudo-parallel data. Then we train multilingual question-answering mapping and generation. We demonstrate the effectiveness of our proposed approach compared against a series of competitive baselines. In this way, we make it easier to share global information across the language barriers, and hopefully we contribute to the battle against COVID-19.
- Muhammad Abdul-Mageed, AbdelRahim Elmadany, Dinesh Pabbi, Kunal Verma, and Rannie Lin. 2020. Mega-COV: A Billion-Scale Dataset of 65 Languages For COVID-19. arXiv preprint arXiv:2005.06012.Google Scholar
- Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018. Unsupervised Statistical Machine Translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 3632–3642.Google ScholarCross Ref
- Akari Asai, Jungo Kasai, Jonathan H Clark, Kenton Lee, Eunsol Choi, and Hannaneh Hajishirzi. 2020. XOR QA: Cross-lingual open-retrieval question answering. arXiv preprint arXiv:2010.11856(2020).Google Scholar
- Dzmitry Bahdanau, Kyung Hyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015.Google Scholar
- Christopher JC Burges. 2013. Towards the machine comprehension of text: An essay. TechReport: MSR-TR-2013-125(2013).Google Scholar
- Chen Chen, Lisong Qiu, Zhenxin Fu, Junfei Liu, and Rui Yan. 2019. Multilingual Dialogue Generation with Shared-Private Memory. In NLPCC’19. 42–54.Google Scholar
- Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading Wikipedia to Answer Open-Domain Questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1870–1879.Google ScholarCross Ref
- Alexis Conneau, Guillaume Lample, Marc’Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2018. Word translation without parallel data. In Proceedings of the International Conference on Learning Representations (ICLR).Google Scholar
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171–4186.Google Scholar
- Daxiang Dong, Hua Wu, Wei He, Dianhai Yu, and Haifeng Wang. 2015. Multi-task learning for multiple language translation. In ACL-IJCNLP’15. 1723–1732.Google Scholar
- Andre Esteva, Anuprit Kale, Romain Paulus, Kazuma Hashimoto, Wenpeng Yin, Dragomir Radev, and Richard Socher. 2020. Co-search: Covid-19 information retrieval with semantic search, question answering, and abstractive summarization. arXiv preprint arXiv:2006.09595(2020).Google Scholar
- Anthony Ferritto, Sara Rosenthal, Mihaela Bornea, Kazi Hasan, Rishav Chakravarti, Salim Roukos, Radu Florian, and Avirup Sil. 2020. A Multilingual Reading Comprehension System for more than 100 Languages. In Proceedings of the 28th International Conference on Computational Linguistics: System Demonstrations. 41–47.Google Scholar
- Zhenxin Fu, Yu Wu, Hailei Zhang, Yichuan Hu, Dongyan Zhao, and Rui Yan. 2020. Be Aware of the Hot Zone: A Warning System of Hazard Area Prediction to Intervene Novel Coronavirus COVID-19 Outbreak. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2241–2250.Google ScholarDigital Library
- Yanchao Hao, Yuanzhe Zhang, Kang Liu, Shizhu He, Zhanyi Liu, Hua Wu, and Jun Zhao. 2017. An end-to-end model for question answering over knowledge base with cross-attention combining global knowledge. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 221–231.Google ScholarCross Ref
- Di He, Yingce Xia, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, and Wei-Ying Ma. 2016. Dual learning for machine translation. In Advances in neural information processing systems. 820–828.Google Scholar
- Karl Moritz Hermann, Tomáš Kočiskỳ, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Proceedings of the 28th International Conference on Neural Information Processing Systems-Volume 1. 1693–1701.Google Scholar
- Mohit Iyyer, Jordan Boyd-Graber, Leonardo Claudino, Richard Socher, and Hal Daumé III. 2014. A neural network for factoid question answering over paragraphs. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 633–644.Google ScholarCross Ref
- Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980(2014).Google Scholar
- Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, and Eduard Hovy. 2017. RACE: Large-scale ReAding Comprehension Dataset From Examinations. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 785–794.Google ScholarCross Ref
- Guillaume Lample, Alexis Conneau, Ludovic Denoyer, and Marc’Aurelio Ranzato. 2018. Unsupervised Machine Translation Using Monolingual Corpora Only. In International Conference on Learning Representations.Google Scholar
- Jinhyuk Lee, Sean S Yi, Minbyul Jeong, Mujeen Sung, Wonjin Yoon, Yonghwa Choi, Miyoung Ko, and Jaewoo Kang. 2020. Answering questions on covid-19 in real-time. arXiv preprint arXiv:2006.15830(2020).Google Scholar
- Juntao Li, Chang Liu, Jian Wang, Lidong Bing, Hongsong Li, Xiaozhong Liu, Dongyan Zhao, and Rui Yan. 2020. Cross-Lingual Low-Resource Set-to-Description Retrieval for Global E-Commerce. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 8212–8219.Google ScholarCross Ref
- Chia-Wei Liu, Ryan Lowe, Iulian Vlad Serban, Mike Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2122–2132.Google ScholarCross Ref
- Jiahua Liu, Yankai Lin, Zhiyuan Liu, and Maosong Sun. 2019. XQA: A Cross-lingual Open-domain Question Answering Dataset. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2358–2368.Google ScholarCross Ref
- Timo Möller, Anthony Reina, Raghavan Jayakumar, and Lawrence Livermore. 2020. COVID-QA: A Question & Answering Dataset for COVID-19. (2020).Google Scholar
- Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A human generated machine reading comprehension dataset. In CoCo@ NIPS.Google Scholar
- David Oniani and Yanshan Wang. 2020. A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19. arXiv preprint arXiv:2006.10964(2020).Google Scholar
- Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, 311–318.Google Scholar
- Telmo Pires, Eva Schlinger, and Dan Garrette. 2019. How Multilingual is Multilingual BERT?. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 4996–5001.Google ScholarCross Ref
- Boyu Qiu, Xu Chen, Jungang Xu, and Yingfei Sun. 2019. A survey on neural machine reading comprehension. arXiv preprint arXiv:1906.03824(2019).Google Scholar
- Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2383–2392.Google ScholarCross Ref
- Kirk Roberts, Tasmeer Alam, Steven Bedrick, Dina Demner-Fushman, Kyle Lo, Ian Soboroff, Ellen Voorhees, Lucy Lu Wang, and William R Hersh. 2020. TREC-COVID: Rationale and Structure of an Information Retrieval Shared Task for COVID-19. Journal of the American Medical Informatics Association (2020).Google Scholar
- Muhammad Saad, Muhammad Hassan, and Fareed Zaffar. 2020. Towards Characterizing the COVID-19 Awareness on Twitter. arXiv preprint arXiv:2005.08379(2020).Google Scholar
- Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Improving Neural Machine Translation Models with Monolingual Data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 86–96.Google ScholarCross Ref
- Hao Sha, Mohammad Al Hasan, George Mohler, and P Jeffrey Brantingham. 2020. Dynamic topic modeling of the COVID-19 Twitter narrative among US governors and cabinet executives. arXiv preprint arXiv:2004.11692(2020).Google Scholar
- Karishma Sharma, Sungyong Seo, Chuizheng Meng, Sirisha Rambhatla, and Yan Liu. 2020. COVID-19 on Social Media: Analyzing Misinformation in Twitter Conversations. arXiv preprint arXiv:2003.12309(2020).Google Scholar
- Dan Su, Yan Xu, Tiezheng Yu, Farhad Bin Siddique, Elham J Barezi, and Pascale Fung. 2020. CAiRE-COVID: A Question Answering and Multi-Document Summarization System for COVID-19 Research. arXiv preprint arXiv:2005.03975(2020).Google Scholar
- Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in neural information processing systems.Google Scholar
- Raphael Tang, Rodrigo Nogueira, Edwin Zhang, Nikhil Gupta, Phuong Cam, Kyunghyun Cho, and Jimmy Lin. 2020. Rapidly Bootstrapping a Question Answering Dataset for COVID-19. arXiv preprint arXiv:2004.11339(2020).Google Scholar
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998–6008.Google Scholar
- Akhila Sri Manasa Venigalla, Dheeraj Vagavolu, and Sridhar Chimalakonda. 2020. Mood of India During Covid-19–An Interactive Web Portal Based on Emotion Analysis of Twitter Data. arXiv preprint arXiv:2005.02955(2020).Google Scholar
- Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol, and Léon Bottou. 2010. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion.Journal of machine learning research 11, 12 (2010).Google Scholar
- Lucy Lu Wang, Kyle Lo, Yoganand Chandrasekhar, Russell Reas, Jiangjiang Yang, Darrin Eide, Kathryn Funk, Rodney Kinney, Ziyang Liu, William Merrill, 2020. CORD-19: The Covid-19 Open Research Dataset. arXiv preprint arXiv:2004.10706(2020).Google Scholar
- Xuan Wang, Weili Liu, Aabhas Chauhan, Yingjun Guan, and Jiawei Han. 2020. Automatic Textual Evidence Mining in COVID-19 Literature. arXiv preprint arXiv:2004.12563(2020).Google Scholar
- Jerry Wei, Chengyu Huang, Soroush Vosoughi, and Jason Wei. 2020. What Are People Asking About COVID-19? A Question Classification Dataset. arXiv preprint arXiv:2005.12522(2020).Google Scholar
- Rui Yan. 2018. “Chitty-Chitty-Chat Bot”: Deep Learning for Conversational AI. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. 5520–5526.Google ScholarCross Ref
- Rui Yan, Yiping Song, and Hua Wu. 2016. Learning to respond with deep neural networks for retrieval-based human-computer conversation system. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. 55–64.Google ScholarDigital Library
- Kai-Cheng Yang, Christopher Torres-Lugo, and Filippo Menczer. 2020. Prevalence of low-credibility information on twitter during the covid-19 outbreak. arXiv preprint arXiv:2004.14484(2020).Google Scholar
- Edwin Zhang, Nikhil Gupta, Rodrigo Nogueira, Kyunghyun Cho, and Jimmy Lin. 2020. Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned. arXiv preprint arXiv:2004.05125(2020).Google Scholar
- Yuan Zhang, Xiaoqing Zhang, Yichuan Hu, Guanchun Wang, and Rui Yan. 2021. WULAI-QA: Web Understanding and Learning with AI towards Document-based Question Answering against COVID-19. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining.Google ScholarDigital Library
- Zhirui Zhang, Shujie Liu, Mu Li, Ming Zhou, and Enhong Chen. 2018. Joint training for neural machine translation models with monolingual data. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.Google ScholarCross Ref
- Multilingual COVID-QA: Learning towards Global Information Sharing via Web Question Answering in Multiple Languages
Recommendations
Bilingual Question Answering Using CINDI_QA at QA@CLEF 2007
Advances in Multilingual and Multimodal Information RetrievalThis article presents the first participation of the CINDI group in the Multiple Language Question Answering Cross Language Evaluation Forum (QA@CLEF). We participated in a track using French as source language and English as target language. CINDI_QA ...
Probabilistic models for answer-ranking in multilingual question-answering
This article presents two probabilistic models for answering ranking in the multilingual question-answering (QA) task, which finds exact answers to a natural language question written in different languages. Although some probabilistic methods have been ...
Priberam's question answering system in QA@CLEF 2008
CLEF'08: Proceedings of the 9th Cross-language evaluation forum conference on Evaluating systems for multilingual and multimodal information accessThis paper describes the changes implemented in Priberam's question answering (QA) system, followed by the discussion of the results obtained in Portuguese and Spanish monolingual runs at QA@CLEF 2008. We enhanced the syntactic analysis of the question ...
Comments