Rethinking Label Smoothing on Multi-Hop Question Answering

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14232)

Included in the following conference series: Chinese Computational Linguistics (CCL 2023)

Abstract

Multi-Hop Question Answering (MHQA) is a significant area in question answering, requiring multiple reasoning components, including document retrieval, supporting sentence prediction, and answer span extraction. In this work, we present the first application of label smoothing to the MHQA task, aiming to enhance generalization in MHQA systems while mitigating overfitting to answer spans and reasoning paths in the training set. We introduce a novel label smoothing technique, F1 Smoothing, which incorporates uncertainty into the learning process and is specifically tailored for Machine Reading Comprehension (MRC) tasks. Moreover, we employ a Linear Decay Label Smoothing Algorithm (LDLA) in conjunction with curriculum learning to progressively reduce uncertainty throughout training. Experiments on the HotpotQA dataset confirm the effectiveness of our approach in improving generalization, yielding significant gains and new state-of-the-art performance on the HotpotQA leaderboard.
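
At its core, the LDLA schedule mentioned above is label smoothing whose smoothing coefficient is annealed toward zero as training progresses. The sketch below is a minimal, hedged illustration of that general idea, not the authors' released implementation; the PyTorch framing, function names, and schedule endpoints are assumptions made for illustration.

import torch
import torch.nn.functional as F

def label_smoothed_nll(logits, target, epsilon):
    # Cross-entropy against the smoothed target (1 - epsilon) * one_hot + epsilon * uniform.
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    uniform = -log_probs.mean(dim=-1)
    return ((1.0 - epsilon) * nll + epsilon * uniform).mean()

def linear_decay_epsilon(step, total_steps, eps_start=0.1, eps_end=0.0):
    # Linearly anneal the smoothing coefficient so the supervision becomes
    # sharper (less uncertain) as training proceeds, in a curriculum-like fashion.
    frac = min(step / max(total_steps, 1), 1.0)
    return eps_start + frac * (eps_end - eps_start)

# Toy usage: 4 examples, 10 classes.
logits = torch.randn(4, 10)
target = torch.randint(0, 10, (4,))
for step in (0, 500, 1000):
    eps = linear_decay_epsilon(step, total_steps=1000)
    print(step, round(eps, 3), label_smoothed_nll(logits, target, eps).item())

When the coefficient reaches zero, the loss reduces to ordinary cross-entropy, which is the end point of the decay.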

Y. Wang—Equal contribution.


Acknowledgement

We would like to express our heartfelt thanks to the students and teachers of Fudan Natural Language Processing Lab. Their thoughtful suggestions, viewpoints, and enlightening discussions have made significant contributions to this work. We also greatly appreciate the strong support from Huawei Poisson Lab for our work, and their invaluable advice. We are sincerely grateful to the anonymous reviewers and the domain chairs, whose constructive feedback played a crucial role in enhancing the quality of our research. This work was supported by the National Key Research and Development Program of China (No.2022CSJGG0801), National Natural Science Foundation of China (No.62022027) and CAAI-Huawei MindSpore Open Fund.

Author information

Corresponding author

Correspondence to Xipeng Qiu.

7 Appendix A

To reduce the complexity introduced by the nested loops in the F1 Smoothing method, we optimize Eq. (12) and Eq. (13). We use \(L_a=e^{*}-s^{*}+1\) and \(L_p=e-s+1\) to denote the lengths of the gold answer and the predicted answer, respectively.

$$\begin{aligned} q_s(t|x)=\sum _{\xi =t}^{L-1} \text {F1}\left( (t,\xi ),a_{\text {gold}}\right) . \end{aligned}$$
(16)

If \(t < s^{*}\), the distribution is

$$\begin{aligned} q_s(t|x)= \sum _{\xi =s^{*}}^{e^{*}} \frac{2(\xi -s^{*}+1)}{L_p+L_a} + \sum _{\xi =e^{*}+1}^{L-1} \frac{2L_a}{L_p+L_a}, \end{aligned}$$
(17)

else if \(s^{*} \le t \le e^{*}\), we have the following distribution

$$\begin{aligned} q_s(t|x)=\sum _{\xi =t}^{e^{*}} \frac{2L_p}{L_p+L_a} + \sum _{\xi =e^{*}+1}^{L-1} \frac{2(e^{*}-t+1)}{L_p+L_a}. \end{aligned}$$
(18)

In Eqs. 17 and 18, \(L_p=\xi -t+1\). (For \(t > e^{*}\), every candidate span starting at \(t\) is disjoint from the gold answer, so \(q_s(t|x)=0\).)

We can get \(q_e(t|x)\) similarly. If \(t > e^{*}\),

$$\begin{aligned} q_e(t|x)= \sum _{\xi =s^{*}}^{e^{*}} \frac{2(e^{*}-\xi +1)}{L_p+L_a} + \sum _{\xi =0}^{s^{*}-1} \frac{2L_a}{L_p+L_a}, \end{aligned}$$
(19)

else if \(s^{*} \le t \le e^{*}\),

$$\begin{aligned} q_e(t|x)= \sum _{\xi =s^{*}}^{t} \frac{2L_p}{L_p+L_a} + \sum _{\xi =0}^{s^{*}-1} \frac{2(t-s^{*}+1)}{L_p+L_a}. \end{aligned}$$
(20)

In Eqs. 19 and 20, \(L_p=t-\xi +1\). (Symmetrically, \(q_e(t|x)=0\) for \(t < s^{*}\).)
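
For reference, the quantity that Eqs. (16)-(20) evaluate in closed form can also be written as a direct (quadratic-time) enumeration over candidate spans. The sketch below is an illustrative reference implementation under our own assumptions (NumPy, hypothetical function names, and an explicit normalization step); the closed-form expressions above exist precisely to avoid the inner loop.

import numpy as np

def span_f1(pred_start, pred_end, gold_start, gold_end):
    # Token-level F1 between two inclusive spans: 2 * overlap / (L_p + L_a).
    overlap = max(0, min(pred_end, gold_end) - max(pred_start, gold_start) + 1)
    if overlap == 0:
        return 0.0
    l_pred = pred_end - pred_start + 1
    l_gold = gold_end - gold_start + 1
    return 2.0 * overlap / (l_pred + l_gold)

def f1_smoothed_targets(seq_len, gold_start, gold_end):
    # q_s(t|x): for each candidate start t, sum F1((t, xi), gold) over ends xi >= t (Eq. 16).
    # q_e(t|x): symmetrically, for each candidate end t, sum F1 over starts xi <= t.
    q_s = np.zeros(seq_len)
    q_e = np.zeros(seq_len)
    for t in range(seq_len):
        q_s[t] = sum(span_f1(t, xi, gold_start, gold_end) for xi in range(t, seq_len))
        q_e[t] = sum(span_f1(xi, t, gold_start, gold_end) for xi in range(t + 1))
    # Normalize so each vector is a proper target distribution (assumed step).
    return q_s / q_s.sum(), q_e / q_e.sum()

# Toy check: an 8-token context with gold span [3, 5].
q_s, q_e = f1_smoothed_targets(seq_len=8, gold_start=3, gold_end=5)
print(np.round(q_s, 3))
print(np.round(q_e, 3))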


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Yin, Z. et al. (2023). Rethinking Label Smoothing on Multi-Hop Question Answering. In: Sun, M., et al. Chinese Computational Linguistics. CCL 2023. Lecture Notes in Computer Science, vol. 14232. Springer, Singapore. https://doi.org/10.1007/978-981-99-6207-5_5

  • DOI: https://doi.org/10.1007/978-981-99-6207-5_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-6206-8

  • Online ISBN: 978-981-99-6207-5

  • eBook Packages: Computer Science, Computer Science (R0)
