An Unsupervised Sentence Embedding Method by Maximizing the Mutual Information of Augmented Text Representations

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2022 (ICANN 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13530)

Abstract

For natural language processing tasks with unlabeled or partially labeled datasets, it is vital to learn sentence representations in an unsupervised manner. However, unsupervised methods pale in comparison to supervised ones on many tasks. Recently, some unsupervised methods have proposed to learn sentence representations by maximizing the mutual information (MI) between text representations at different levels, such as global MI maximization (between two global representations) and local MI maximization (between local and global representations). Among these methods, local MI maximization encourages the global representation to capture information that is shared across the local contexts. Despite this advantage, the approach suffers from the inherent gap between the semantic information contained in the global representations and that contained in the local representations. Consequently, its performance is inferior both to models using global MI maximization and to supervised models. In this paper, we propose an unsupervised sentence embedding method that maximizes the mutual information of augmented text representations. Experimental results show that our model achieves an average Spearman's correlation of 73.36% on a series of semantic textual similarity tasks, a 7-point improvement over the previous best model using local MI maximization. Furthermore, our model outperforms models using global MI maximization and closes the gap to supervised methods to 1.5 points.
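
The abstract does not specify the paper's MI estimator, encoder, or augmentation strategy, so the sketch below only illustrates the general idea it describes: maximizing a contrastive (InfoNCE-style) lower bound on the mutual information between two augmented views of each sentence. All names and values in it (info_nce_loss, the temperature, the dropout-style augmentation) are assumptions made for illustration, not the authors' implementation.

import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.07):
    # Hypothetical sketch, not the paper's method: z1 and z2 are two augmented
    # views (batch, dim) of the same batch of sentence embeddings.
    z1 = F.normalize(z1, dim=-1)          # cosine-normalize both views
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature    # (batch, batch) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)  # positives sit on the diagonal
    # Minimizing this cross-entropy maximizes an InfoNCE lower bound on the
    # mutual information between the two augmented views.
    return F.cross_entropy(logits, labels)

# Placeholder usage: random tensors stand in for two encoder passes over the
# same sentences under different augmentations.
batch, dim = 32, 768
view1 = torch.randn(batch, dim, requires_grad=True)
view2 = torch.randn(batch, dim, requires_grad=True)
loss = info_nce_loss(view1, view2)
loss.backward()

In a real training loop the two views would come from an encoder (for example, a BERT-style model run twice with different dropout masks or other augmentations), and the gradient would update the encoder's parameters; the temperature of 0.07 is a common contrastive-learning default, not a value reported by the paper.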

Author information

Corresponding author

Correspondence to Tianye Sheng.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sheng, T., Wang, L., He, Z., Sun, M., Jiang, G. (2022). An Unsupervised Sentence Embedding Method by Maximizing the Mutual Information of Augmented Text Representations. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13530. Springer, Cham. https://doi.org/10.1007/978-3-031-15931-2_15

  • DOI: https://doi.org/10.1007/978-3-031-15931-2_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15930-5

  • Online ISBN: 978-3-031-15931-2

  • eBook Packages: Computer Science, Computer Science (R0)
