research-article

Distant Supervision for Multi-Stage Fine-Tuning in Retrieval-Based Question Answering

Authors:

Nicholas Jing Yuan,

Jimmy Lin

WWW '20: Proceedings of The Web Conference 2020

Pages 2934 - 2940

https://doi.org/10.1145/3366423.3380060

Published: 20 April 2020

Abstract

We tackle the problem of question answering directly on a large document collection, combining simple “bag of words” passage retrieval with a BERT-based reader for extracting answer spans. In the context of this architecture, we present a data augmentation technique using distant supervision to automatically annotate paragraphs as either positive or negative examples to supplement existing training data, which are then used together to fine-tune BERT. We explore a number of details that are critical to achieving high accuracy in this setup: the proper sequencing of different datasets during fine-tuning, the balance between “difficult” vs. “easy” examples, and different approaches to gathering negative examples. Experimental results show that, with the appropriate settings, we can achieve large gains in effectiveness on two English and two Chinese QA datasets. We are able to achieve results at or near the state of the art without any modeling advances, which once again affirms the cliché “there’s no data like more data”.
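As a rough illustration of the distant supervision idea described in the abstract, the sketch below labels retrieved paragraphs as positive or negative examples for a given question's answer. The abstract does not state the labeling rule, so the case-insensitive answer-string match used here is an assumption, and the function name `label_paragraphs` and the sample paragraphs are hypothetical.

```python
from typing import Dict, List


def label_paragraphs(answer: str, paragraphs: List[str]) -> Dict[str, List[str]]:
    """Split retrieved paragraphs into distantly supervised positives and negatives.

    Assumption (not from the abstract): a paragraph counts as positive when it
    contains the answer string, ignoring case; everything else is negative.
    """
    needle = answer.strip().lower()
    labeled: Dict[str, List[str]] = {"positive": [], "negative": []}
    if not needle:
        # Without an answer string there is nothing to match against.
        return labeled
    for paragraph in paragraphs:
        key = "positive" if needle in paragraph.lower() else "negative"
        labeled[key].append(paragraph)
    return labeled


# Hypothetical paragraphs, standing in for the output of a passage retriever.
retrieved = [
    "Ottawa is the capital of Canada.",
    "Toronto is the largest city in Canada.",
]
examples = label_paragraphs("Ottawa", retrieved)
```

In a setup like the one the abstract outlines, the two lists would supplement the existing training data before fine-tuning; how positives and negatives are balanced and sequenced is one of the details the paper says it explores.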




Published In


WWW '20: Proceedings of The Web Conference 2020

April 2020

3143 pages

ISBN:9781450370233

DOI:10.1145/3366423

Editors:

Yennun Huang (Academia Sinica, Taiwan)
Irwin King (The Chinese University of Hong Kong, Hong Kong)
Tie-Yan Liu (Microsoft Research Asia, China)
Maarten van Steen (University of Twente, Netherlands)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '20

Sponsor:

SIGWEB

WWW '20: The Web Conference 2020

April 20 - 24, 2020

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
