Research Article
DOI: 10.1145/3336191.3371835

Label Distribution Augmented Maximum Likelihood Estimation for Reading Comprehension

Published: 22 January 2020

Abstract

Reading comprehension (RC) aims to locate a text span in a context passage that answers a given question. Despite the effectiveness of modern neural RC models, most existing work relies on maximum likelihood estimation (MLE) and ignores the structure of the output space. That is, during training, all text spans that do not match the ground truth are treated as equally poor, leading to overconfident predictions on the ground-truth labels and reduced generalization at test time. One way to bridge the gap between training and test is to take into account the task reward of alternative outputs using reinforcement learning (RL) algorithms, which, however, are often less efficient to optimize than MLE. In this paper, we propose a new learning criterion for the RC task that combines the merits of both MLE- and RL-based methods. Specifically, we show that we can derive a distribution over the outputs, i.e., a label distribution, from their corresponding task rewards based on the decomposition property of the RC problem. We then optimize the RC model by learning directly towards this auxiliary label distribution, instead of the ground-truth label, within the MLE framework. In this way, we can exploit the structure of the output space for better generalization (as in RL) via efficient optimization (as in MLE). We name our approach Label Distribution augmented MLE (LD-MLE); it is a general learning criterion that can be adopted by almost all existing RC models. Experiments on three representative benchmark datasets demonstrate that RC models learned with the LD-MLE criterion achieve consistently better results than those based on traditional MLE and RL-based criteria.
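
To make the abstract's idea concrete, below is a minimal sketch of how a reward-derived label distribution can replace the one-hot target inside the usual MLE loss. This is an illustration under stated assumptions, not the paper's implementation: token-level F1 as the task reward, a temperature-scaled softmax over span rewards as the label distribution, and start-plus-end span scoring as the "decomposition property" are all illustrative choices, and every function name here is hypothetical.

```python
# Sketch of the LD-MLE idea, with illustrative (not paper-specified) choices:
# token-level F1 as the task reward, a temperature-scaled softmax over span
# rewards as the auxiliary label distribution, and a span score that
# decomposes into start-position and end-position scores.
import torch
import torch.nn.functional as F


def span_f1(pred_span, gold_span):
    """Token-overlap F1 between two inclusive (start, end) spans; the reward."""
    pred = set(range(pred_span[0], pred_span[1] + 1))
    gold = set(range(gold_span[0], gold_span[1] + 1))
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def label_distribution(spans, gold_span, temperature=1.0):
    """Map per-span task rewards to a distribution over candidate spans.

    Spans closer to the gold answer receive higher probability, unlike the
    one-hot target of plain MLE, which treats all non-gold spans as equally
    poor.
    """
    rewards = torch.tensor([span_f1(s, gold_span) for s in spans])
    return F.softmax(rewards / temperature, dim=0)


def ld_mle_loss(start_logits, end_logits, spans, gold_span, temperature=1.0):
    """Cross-entropy of the model's span distribution against the label
    distribution: the same MLE machinery, but with a soft target."""
    # Decomposed span scoring: score(span) = start score + end score.
    span_logits = torch.stack(
        [start_logits[s] + end_logits[e] for (s, e) in spans]
    )
    log_probs = F.log_softmax(span_logits, dim=0)
    target = label_distribution(spans, gold_span, temperature)
    return -(target * log_probs).sum()
```

As the temperature approaches zero, the label distribution collapses to one-hot on the gold span and the loss reduces to standard MLE; a larger temperature spreads probability mass over near-miss spans, which is how the structure of the output space enters training. In practice the candidate set `spans` would be restricted, e.g. to spans under a maximum answer length, rather than enumerating all O(n^2) spans.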


Cited By

  • (2023) Rethinking Label Smoothing on Multi-Hop Question Answering. Chinese Computational Linguistics, pp. 72-87. https://doi.org/10.1007/978-981-99-6207-5_5. Online publication date: 20-Sep-2023.

Published In

WSDM '20: Proceedings of the 13th International Conference on Web Search and Data Mining
January 2020, 950 pages
ISBN: 9781450368223
DOI: 10.1145/3336191

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. label smoothing
2. question answering
3. reading comprehension

Funding Sources

• National Key R&D Program of China
• Youth Innovation Promotion Association CAS
• Foundation and Frontier Research Key Program of Chongqing Science and Technology Commission
• Beijing Academy of Artificial Intelligence (BAAI)
• National Natural Science Foundation of China (NSFC)

Conference

WSDM '20

Acceptance Rates

Overall acceptance rate: 498 of 2,863 submissions, 17%
