Abstract
This study investigates the task of Multi-span Question Answering (MSQA). Currently, MSQA is primarily modeled as a sequence tagging problem, predicting whether each word is part of an answer. However, this approach predicts words independently and does not fully exploit a comprehensive understanding of the complexities of MSQA. In this paper, we propose a novel model, the Contrastive Span Selector (CSS). Our model uses a multi-head biaffine attention mechanism to generate span representations and employs a CNN block for span-wise interaction. Additionally, we incorporate the question and a global token into the encoding process, projecting all vectors into a shared representation space. To train the model, we employ contrastive learning with a dynamic threshold that controls the similarity boundary between answer spans and non-answer spans. Our model outperforms the tagger model by 6.32 exact-match F1 on the MultiSpanQA multi-span setting and by 5.69 on the expand setting, establishing it as the state-of-the-art model for MSQA. The code is available at: https://github.com/phzh24/Contrastive-Span-Selector.
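The abstract's two core ideas (biaffine span representations, then selecting spans whose similarity to the question exceeds a dynamic threshold) can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: all shapes, variable names, and the cosine-similarity scoring are assumptions; the bilinear-plus-linear scoring form follows standard biaffine attention (Dozat and Manning, 2017), and the CNN span-interaction block and the contrastive training loss are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, r = 6, 8, 4          # tokens, hidden size, span-representation size (illustrative)

def biaffine_span_reps(h_s, h_e, U, W, b):
    """r-dimensional biaffine representation for every candidate span (i, j).

    h_s, h_e : (L, d) start/end token features (e.g. two projection heads over an encoder).
    U        : (d, r, d) bilinear tensor; W : (2*d, r) linear term; b : (r,) bias.
    Returns  : (L, L, r), where entry [i, j] represents the span from token i to token j.
    """
    # Bilinear term: h_s[i]^T U h_e[j] for every (i, j) pair.
    bilinear = np.einsum('ia,arb,jb->ijr', h_s, U, h_e)
    # Linear term over the concatenation [h_s[i]; h_e[j]], split across W.
    linear = (h_s @ W[:d])[:, None, :] + (h_e @ W[d:])[None, :, :]
    return bilinear + linear + b

h_s = rng.standard_normal((L, d))
h_e = rng.standard_normal((L, d))
U = rng.standard_normal((d, r, d)) * 0.1
W = rng.standard_normal((2 * d, r)) * 0.1
b = np.zeros(r)

spans = biaffine_span_reps(h_s, h_e, U, W, b)   # (L, L, r)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Inference-time selection (sketch): score each span against the question
# vector in the shared space, and keep spans whose similarity exceeds a
# boundary derived from the global token -- a "dynamic threshold" in the
# sense that it depends on the input rather than being a fixed constant.
q = rng.standard_normal(r)          # question representation (assumed projected to r dims)
g = rng.standard_normal(r)          # global-token representation
threshold = cosine(q, g)
picked = [(i, j) for i in range(L) for j in range(i, L)
          if cosine(spans[i, j], q) > threshold]
```

Treating every (i, j) pair as a candidate gives O(L^2) spans, which is why span representations are computed in one batched einsum rather than in a Python loop.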
Acknowledgments
The authors would like to thank the three anonymous reviewers for their comments on this paper. This work is supported by the National Key Research and Development Program of China (No. 2020YFC0833300).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Zhang, P., Xiong, G., Zhao, W. (2024). CSS: Contrastive Span Selector for Multi-span Question Answering. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science(), vol 14325. Springer, Singapore. https://doi.org/10.1007/978-981-99-7019-3_22
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7018-6
Online ISBN: 978-981-99-7019-3