Abstract
Semantic sentence matching is the task of predicting the relationship between a pair of natural language sentences. Recently, many interaction-based methods have been proposed, usually comprising encoder, matching, and aggregation components. Although some of them obtain impressive results, a simple encoder trained from scratch cannot effectively extract the global features of sentences, and transmitting information through the stacked network causes a certain loss. In this paper, we propose a Densely-connected Inference-Attention network (DCIA) that maximizes the use of the features from each layer of the network through a dense connection mechanism, and that obtains a robust encoder via contrastive self-supervised learning (SSL), which maximizes the mutual information between the global and local features of the input data. We conduct experiments on the Quora, MRPC, and SICK datasets; the results show that our method achieves competitive accuracies of 89.13%, 78.1%, and 87.7%, respectively. In addition, DCIA with SSL surpasses DCIA without SSL by about 2% in accuracy.
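The abstract combines two mechanisms that a short sketch can make concrete: dense connections, where each layer consumes the concatenation of all earlier features, and a contrastive SSL objective tying a sentence's global feature to its token-level local features. The PyTorch sketch below is a minimal illustration of those two ideas, not the authors' implementation; the layer dimensions, pooling choices, and the InfoNCE-style loss with temperature `tau` are all illustrative assumptions rather than details from the paper.

```python
# Minimal sketch (assumed details, not the paper's released code) of
# dense connections plus a local-global contrastive SSL objective.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenselyConnectedEncoder(nn.Module):
    """Each block sees the concatenation of the input and all earlier outputs."""

    def __init__(self, input_dim: int, hidden_dim: int, num_blocks: int):
        super().__init__()
        self.blocks = nn.ModuleList()
        dim = input_dim
        for _ in range(num_blocks):
            self.blocks.append(nn.Sequential(nn.Linear(dim, hidden_dim), nn.ReLU()))
            dim += hidden_dim  # dense connection: feature width grows by concatenation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]  # x: (batch, seq_len, input_dim)
        for block in self.blocks:
            feats.append(block(torch.cat(feats, dim=-1)))
        return torch.cat(feats, dim=-1)  # every layer's features reach the output


def local_global_infonce(local: torch.Tensor, glob: torch.Tensor, tau: float = 0.1):
    """InfoNCE-style loss: each sentence's global vector should score its own
    local (token-level) features higher than those of other sentences in the
    batch, which lower-bounds the local-global mutual information.
    local: (batch, seq_len, dim); glob: (batch, dim).
    """
    local = F.normalize(local.mean(dim=1), dim=-1)  # summarize locals per sentence
    glob = F.normalize(glob, dim=-1)
    logits = glob @ local.t() / tau                 # (batch, batch) similarities
    targets = torch.arange(glob.size(0))            # positives on the diagonal
    return F.cross_entropy(logits, targets)


# Toy usage: pretrain the encoder with the SSL objective before matching.
enc = DenselyConnectedEncoder(input_dim=300, hidden_dim=128, num_blocks=3)
tokens = torch.randn(8, 20, 300)                    # e.g. GloVe word embeddings
local_feats = enc(tokens)
global_feats = local_feats.max(dim=1).values        # simple max-pooled global feature
loss = local_global_infonce(local_feats, global_feats)
loss.backward()
```

In this sketch, the encoder pretrained with the contrastive loss would then be reused inside a matching network; the pooling and loss details above stand in for whatever the paper's DCIA actually uses.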
Acknowledgements
This paper is supported by the National Key Research and Development Program of China under grants No. 2018YFB0204403, No. 2017YFB1401202, and No. 2018YFB1003500. The corresponding author is Jianzong Wang from Ping An Technology (Shenzhen) Co., Ltd.