DOI: 10.1145/3366423.3380060
research-article

Distant Supervision for Multi-Stage Fine-Tuning in Retrieval-Based Question Answering

Published: 20 April 2020

Abstract

We tackle the problem of question answering directly on a large document collection, combining simple “bag of words” passage retrieval with a BERT-based reader for extracting answer spans. In the context of this architecture, we present a data augmentation technique that uses distant supervision to automatically annotate paragraphs as either positive or negative examples. These examples supplement the existing training data, and the two are then used together to fine-tune BERT. We explore a number of details that are critical to achieving high accuracy in this setup: the proper sequencing of different datasets during fine-tuning, the balance between “difficult” and “easy” examples, and different approaches to gathering negative examples. Experimental results show that, with the appropriate settings, we can achieve large gains in effectiveness on two English and two Chinese QA datasets. We achieve results at or near the state of the art without any modeling advances, which once again affirms the cliché “there’s no data like more data”.
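
The retrieve-then-label step behind the distant supervision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes BM25 as the “bag of words” scoring function (the abstract does not name one) with assumed parameters k1 = 0.9 and b = 0.4, a naive whitespace tokenizer, and a case-insensitive answer-string match as the labeling criterion. The names `BM25`, `Example`, and `distant_supervision` are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace split (assumption); a real system would use
    # a proper analyzer with stemming and punctuation handling.
    return text.lower().split()


class BM25:
    """Tiny in-memory BM25 index over paragraphs (hypothetical helper)."""

    def __init__(self, paragraphs: list[str], k1: float = 0.9, b: float = 0.4):
        self.paragraphs = paragraphs
        self.docs = [Counter(tokenize(p)) for p in paragraphs]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avgdl = sum(self.lengths) / len(self.lengths)
        self.k1, self.b = k1, b
        n = len(paragraphs)
        df = Counter(term for d in self.docs for term in d)
        # Non-negative IDF variant, as used by Lucene's BM25.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        doc, dl = self.docs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            tf = doc.get(term, 0)
            if tf:
                norm = tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf[term] * tf * (self.k1 + 1) / norm
        return total

    def search(self, query: str, k: int) -> list[int]:
        # Rank every paragraph by BM25 score and keep the top-k indices.
        order = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


@dataclass(frozen=True)
class Example:
    question: str
    paragraph: str
    label: int  # 1 = positive (contains an answer string), 0 = negative


def distant_supervision(index: BM25, question: str, answers: list[str], k: int) -> list[Example]:
    """Label top-k retrieved paragraphs: positive iff an answer string occurs in it (assumed rule)."""
    examples = []
    for i in index.search(question, k):
        text = index.paragraphs[i]
        hit = any(a.lower() in text.lower() for a in answers)
        examples.append(Example(question, text, int(hit)))
    return examples


if __name__ == "__main__":
    corpus = [
        "Hamlet is a tragedy written by William Shakespeare.",
        "Hamlet is a small village in the countryside.",
        "The Eiffel Tower is in Paris.",
        "Shakespeare was born in Stratford-upon-Avon.",
    ]
    index = BM25(corpus)
    labeled = distant_supervision(index, "who wrote the tragedy hamlet", ["Shakespeare"], k=2)
    print([e.label for e in labeled])  # [1, 0]
```

Under this heuristic, any top-k paragraph that lacks the answer becomes a “difficult” negative (the retriever judged it relevant-looking), while a paragraph that merely mentions the answer string counts as positive even if it does not support the answer; that label noise is the usual price of distant supervision.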

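The abstract names two further knobs: the mix of “difficult” and “easy” examples, and the order of datasets across fine-tuning stages. The data plumbing below illustrates both. It is a hypothetical sketch: `mix_negatives`, `fine_tune_in_stages`, the `neg_per_pos` and `hard_fraction` parameters, and the idea of drawing “easy” negatives from random paragraphs are assumptions for illustration, and the `train` callback stands in for an actual BERT fine-tuning pass. It does not encode which ratio or stage order the paper found best.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence, TypeVar

M = TypeVar("M")  # whatever object represents the model being fine-tuned


@dataclass(frozen=True)
class Example:
    question: str
    paragraph: str
    label: int  # 1 = contains an answer, 0 = negative


def mix_negatives(
    positives: Sequence[Example],
    hard: Sequence[Example],
    easy: Sequence[Example],
    neg_per_pos: int,
    hard_fraction: float,
    seed: int = 0,
) -> list[Example]:
    """Assemble one stage's data at a fixed negative:positive ratio.

    `hard` are retrieved paragraphs lacking the answer; `easy` might be random
    paragraphs from the collection. Both ratio knobs are illustrative assumptions,
    and hard_fraction is expected to lie in [0, 1].
    """
    rng = random.Random(seed)
    n_neg = neg_per_pos * len(positives)
    n_hard = min(round(n_neg * hard_fraction), len(hard))
    n_easy = min(n_neg - n_hard, len(easy))
    data = list(positives) + rng.sample(list(hard), n_hard) + rng.sample(list(easy), n_easy)
    rng.shuffle(data)
    return data


def fine_tune_in_stages(
    model: M,
    stages: Sequence[tuple[str, Sequence[Example]]],
    train: Callable[[M, Sequence[Example]], M],
) -> M:
    """Fine-tune sequentially: the model produced by stage i initializes stage i + 1."""
    for _name, dataset in stages:
        model = train(model, dataset)
    return model


if __name__ == "__main__":
    pos = [Example("q", f"pos{i}", 1) for i in range(2)]
    hard = [Example("q", f"hard{i}", 0) for i in range(5)]
    easy = [Example("q", f"easy{i}", 0) for i in range(5)]
    ds_data = mix_negatives(pos, hard, easy, neg_per_pos=2, hard_fraction=0.5)
    # Placeholder "training" that only counts the examples each stage sees.
    seen = fine_tune_in_stages(0, [("distant", ds_data), ("gold", pos)], lambda m, d: m + len(d))
    print(len(ds_data), seen)  # 6 8
```

Keeping the stage loop separate from the data mixing makes it cheap to try different orderings (for example, distantly supervised data first and the original training data second, or the reverse) without changing how each stage's examples are assembled.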

Published In

WWW '20: Proceedings of The Web Conference 2020
April 2020
3143 pages
ISBN: 9781450370233
DOI: 10.1145/3366423

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. BERT
2. data augmentation
3. reranking

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

WWW '20: The Web Conference 2020
April 20–24, 2020
Taipei, Taiwan

Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions (23%)

Bibliometrics

• Downloads (last 12 months): 21
• Downloads (last 6 weeks): 2
Reflects downloads up to 28 Feb 2025

Cited By

• (2024) Does Noise Really Matter? Investigation into the Influence of Noisy Labels on BERT-Based Question Answering System. International Journal of Semantic Computing 18:01, pp. 77-96. DOI: 10.1142/S1793351X24410046. Online publication date: 30-Jan-2024.
• (2024) The power and potentials of Flexible Query Answering Systems. Data & Knowledge Engineering 149:C. DOI: 10.1016/j.datak.2023.102246. Online publication date: 1-Jan-2024.
• (2023) Toward Best Practices for Training Multilingual Dense Retrieval Models. ACM Transactions on Information Systems 42:2, pp. 1-33. DOI: 10.1145/3613447. Online publication date: 27-Sep-2023.
• (2023) Does Noise Really Matter? Investigation into the Influence of Noisy Labels on Bert-Based Question Answering System. 2023 IEEE 17th International Conference on Semantic Computing (ICSC), pp. 33-40. DOI: 10.1109/ICSC56153.2023.00012. Online publication date: Feb-2023.
• (2022) Efficient Open Domain Question Answering With Delayed Attention in Transformer-Based Models. International Journal of Data Warehousing and Mining 18:2, pp. 1-16. DOI: 10.4018/IJDWM.298005. Online publication date: 1-Apr-2022.
• (2022) A question answering model for electrical equipment standards based on subject object attention. 2022 8th International Conference on Systems and Informatics (ICSAI), pp. 1-8. DOI: 10.1109/ICSAI57119.2022.10005504. Online publication date: 10-Dec-2022.
• (2022) Another Look at DPR: Reproduction of Training and Replication of Retrieval. Advances in Information Retrieval, pp. 613-626. DOI: 10.1007/978-3-030-99736-6_41. Online publication date: 5-Apr-2022.
• (2021) Towards a Toolbox for Mining QA-pairs and QAT-triplets from Conversational Data of Public Chats. 2021 29th Conference of Open Innovations Association (FRUCT), pp. 94-101. DOI: 10.23919/FRUCT52173.2021.9435511. Online publication date: 12-May-2021.
• (2021) The Weak Supervision Approach for Question Answering over Text Using Triplets Recovering with QA-Based Rankers. 16th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2021), pp. 167-177. DOI: 10.1007/978-3-030-87869-6_16. Online publication date: 23-Sep-2021.
