Abstract
For natural language processing tasks with unlabeled or partially labeled datasets, it is vital to learn sentence representations in an unsupervised manner. However, unsupervised methods still lag behind supervised ones on many tasks. Recently, several unsupervised methods have proposed learning sentence representations by maximizing the mutual information (MI) between text representations at different levels: global MI maximization operates between two global representations, while local MI maximization operates between local and global representations. Among these methods, local MI maximization encourages the global representation to capture useful information that is shared across local contexts. Despite this advantage, it suffers from the inherent gap in semantic information between the global and the local representations, and consequently performs worse than both models using global MI maximization and supervised ones. In this paper, we propose an unsupervised sentence embedding method that maximizes the mutual information between augmented text representations. Experimental results show that our model achieves an average of 73.36% Spearman's correlation on a series of semantic textual similarity (STS) tasks, a 7-point improvement over the previous best model using local MI maximization. Furthermore, our model outperforms models using global MI maximization and closes the gap to supervised methods to 1.5 points.
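The abstract does not spell out the training objective. As a rough illustration of the general idea, the sketch below implements an InfoNCE-style contrastive loss, a standard lower bound on the mutual information between two augmented views of the same sentence; the function name `info_nce_loss`, the temperature value, and the use of cosine similarity are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style contrastive loss between two batches of augmented
    sentence representations z1, z2, each of shape (batch, dim).

    Row i of z1 and row i of z2 come from two augmentations of the same
    sentence (a positive pair); all other rows in the batch act as
    negatives. Minimizing this loss maximizes a lower bound on the
    mutual information between the two augmented representations.
    """
    z1 = F.normalize(z1, dim=-1)                          # unit-length embeddings
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                    # pairwise cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)   # positives lie on the diagonal
    return F.cross_entropy(logits, labels)
```

In such a setup, `z1` and `z2` would typically be produced by passing two differently augmented copies of each input sentence through the same encoder, e.g. a BERT-based sentence encoder.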
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sheng, T., Wang, L., He, Z., Sun, M., Jiang, G. (2022). An Unsupervised Sentence Embedding Method by Maximizing the Mutual Information of Augmented Text Representations. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13530. Springer, Cham. https://doi.org/10.1007/978-3-031-15931-2_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15930-5
Online ISBN: 978-3-031-15931-2
eBook Packages: Computer Science (R0)