Abstract
Multi-Hop Question Answering (MHQA) is a significant area of question answering that requires multiple reasoning components, including document retrieval, supporting-sentence prediction, and answer span extraction. In this work, we present the first application of label smoothing to the MHQA task, aiming to enhance the generalization capabilities of MHQA systems while mitigating overfitting to the answer spans and reasoning paths in the training set. We introduce a novel label smoothing technique, F1 Smoothing, which incorporates uncertainty into the learning process and is specifically tailored for Machine Reading Comprehension (MRC) tasks. Moreover, we employ a Linear Decay Label Smoothing Algorithm (LDLA) in conjunction with curriculum learning to progressively reduce uncertainty throughout training. Experiments on the HotpotQA dataset confirm the effectiveness of our approach, improving generalization and achieving significant gains that lead to new state-of-the-art performance on the HotpotQA leaderboard.
Y. Wang—Equal contribution.
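To make the LDLA schedule concrete, the sketch below is our own minimal illustration, not the paper's exact formulation: the function names (`ldla_epsilon`, `smoothed_ce_loss`), the uniform smoothing target, and the decay endpoints are all assumptions. It decays the label-smoothing weight linearly over training, so the model sees soft targets early and near one-hot labels late, matching the curriculum-learning intuition.

```python
import torch
import torch.nn.functional as F

def ldla_epsilon(step: int, total_steps: int, eps0: float = 0.1) -> float:
    """Linearly decay the label-smoothing weight from eps0 to 0 over training.

    High uncertainty early, hard labels late. (Hypothetical schedule; the
    paper's exact initial value and decay endpoints may differ.)
    """
    return eps0 * max(0.0, 1.0 - step / total_steps)

def smoothed_ce_loss(logits: torch.Tensor, target: torch.Tensor, eps: float) -> torch.Tensor:
    """Cross-entropy against labels smoothed with weight eps (uniform smoothing)."""
    n_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(target, n_classes).float()
    soft = (1.0 - eps) * one_hot + eps / n_classes  # mix one-hot with uniform
    return -(soft * log_probs).sum(dim=-1).mean()

# Example: the smoothing weight shrinks as training proceeds.
logits = torch.randn(4, 10)
target = torch.tensor([1, 3, 5, 7])
for step in (0, 500, 1000):
    eps = ldla_epsilon(step, total_steps=1000)
    print(step, eps, smoothed_ce_loss(logits, target, eps).item())
```

The same schedule applies unchanged whether the smoothed targets are uniform, as in this sketch, or span-level distributions such as the F1 Smoothing introduced in this paper.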
Acknowledgement
We would like to express our heartfelt thanks to the students and teachers of the Fudan Natural Language Processing Lab; their thoughtful suggestions, viewpoints, and enlightening discussions contributed significantly to this work. We also greatly appreciate the strong support and invaluable advice from the Huawei Poisson Lab. We are sincerely grateful to the anonymous reviewers and the domain chairs, whose constructive feedback played a crucial role in enhancing the quality of our research. This work was supported by the National Key Research and Development Program of China (No. 2022CSJGG0801), the National Natural Science Foundation of China (No. 62022027), and the CAAI-Huawei MindSpore Open Fund.
Appendix A
To alleviate the complexity introduced by the multiple for-loops in the F1 Smoothing method, we optimize Eq. (12) and Eq. (13). We use \(L_a = e^{*} - s^{*} + 1\) and \(L_p = e - s + 1\) to denote the lengths of the gold answer and the predicted answer, respectively.
If \(t < s^{*}\), the distribution is
\[
q_s(t \mid x) \propto \sum_{e=s^{*}}^{e^{*}} \frac{2\,(e - s^{*} + 1)}{L_p + L_a} \;+\; \sum_{e=e^{*}+1}^{n} \frac{2\,L_a}{L_p + L_a},
\]
else if \(s^{*} \le t \le e^{*}\), we have the following distribution
\[
q_s(t \mid x) \propto \sum_{e=t}^{e^{*}} \frac{2\,(e - t + 1)}{L_p + L_a} \;+\; \sum_{e=e^{*}+1}^{n} \frac{2\,(e^{*} - t + 1)}{L_p + L_a}.
\]
In both equations, \(e\) ranges over candidate end positions in a context of \(n\) tokens and \(L_p = e - t + 1\); for \(t > e^{*}\), no span starting at \(t\) overlaps the gold answer, so \(q_s(t \mid x) = 0\).
We can obtain \(q_e(t \mid x)\) similarly. If \(t > e^{*}\),
\[
q_e(t \mid x) \propto \sum_{s=s^{*}+1}^{e^{*}} \frac{2\,(e^{*} - s + 1)}{L_p + L_a} \;+\; \sum_{s=1}^{s^{*}} \frac{2\,L_a}{L_p + L_a},
\]
else if \(s^{*} \le t \le e^{*}\),
\[
q_e(t \mid x) \propto \sum_{s=s^{*}}^{t} \frac{2\,(t - s + 1)}{L_p + L_a} \;+\; \sum_{s=1}^{s^{*}-1} \frac{2\,(t - s^{*} + 1)}{L_p + L_a},
\]
where \(s\) ranges over candidate start positions and \(L_p = t - s + 1\); for \(t < s^{*}\), \(q_e(t \mid x) = 0\).
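These closed forms can be sanity-checked against a direct, loop-based construction. The sketch below is our own code, not the paper's (function names are ours, and positions are 0-indexed): it enumerates every candidate span, scores it with token-level F1 against the gold span, and accumulates that mass onto the span's start and end positions, which is exactly the quadratic-time computation the expressions above avoid.

```python
import numpy as np

def f1_overlap(s: int, e: int, s_star: int, e_star: int) -> float:
    """Token-level F1 between a predicted span [s, e] and the gold span [s*, e*]."""
    overlap = max(0, min(e, e_star) - max(s, s_star) + 1)
    if overlap == 0:
        return 0.0
    l_p = e - s + 1            # predicted answer length L_p
    l_a = e_star - s_star + 1  # gold answer length L_a
    return 2.0 * overlap / (l_p + l_a)  # 2PR/(P+R) = 2*overlap/(L_p + L_a)

def f1_smoothing(n: int, s_star: int, e_star: int):
    """Brute-force smoothed start/end label distributions over n context tokens."""
    q_s, q_e = np.zeros(n), np.zeros(n)
    for s in range(n):
        for e in range(s, n):
            f1 = f1_overlap(s, e, s_star, e_star)
            q_s[s] += f1
            q_e[e] += f1
    return q_s / q_s.sum(), q_e / q_e.sum()

# Example: a 10-token context whose gold answer spans tokens 3..5.
q_s, q_e = f1_smoothing(10, 3, 5)
print(q_s.round(3))  # zero for all t > e*, maximal at t = s*
print(q_e.round(3))  # zero for all t < s*, maximal at t = e*
```

Up to normalization, the \(q_s\) and \(q_e\) produced by this brute-force routine should coincide with the case-by-case sums above.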
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Yin, Z., et al. (2023). Rethinking Label Smoothing on Multi-Hop Question Answering. In: Sun, M., et al. (eds.) Chinese Computational Linguistics. CCL 2023. Lecture Notes in Computer Science, vol. 14232. Springer, Singapore. https://doi.org/10.1007/978-981-99-6207-5_5