
USR-MTL: an unsupervised sentence representation learning framework with multi-task learning

Published in Applied Intelligence

Abstract

Learning effective text representations, and in particular sentence-level features, is increasingly important for building intelligent systems. Numerous previous studies have addressed sentence representation learning with deep learning approaches. However, most existing approaches are trained on a single task or rely on labeled corpora when learning sentence embeddings. In this paper, we examine the factors that shape sentence representation learning and propose an efficient unsupervised learning framework with multi-task learning (USR-MTL), in which several text learning tasks are merged into a unified framework. Three factors, reflecting the syntactic and semantic features of sentences, bear on sentence representation learning: the wording of a sentence, the ordering of its words, and the ordering of its neighboring sentences. Accordingly, we integrate a word prediction task, a word-order learning task, and a sentence-order learning task into the proposed framework to obtain meaningful sentence embeddings. Sentence embedding learning is thereby reformulated as multi-task learning over one sentence-level task and two word-level tasks. Moreover, the framework is trained with an unsupervised algorithm on an unlabeled corpus. Experimental results show that our approach achieves state-of-the-art performance on downstream natural language processing tasks compared with popular unsupervised representation learning techniques. Experiments on representation visualization and task analysis further demonstrate that the tasks in the framework yield reasonable sentence representations, confirming the capacity of the proposed unsupervised multi-task framework for sentence representation learning.
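To make the multi-task setup concrete, the PyTorch sketch below shows one way the three tasks described above could share a single sentence encoder, each with its own lightweight head and a summed loss. This is a minimal illustration under stated assumptions, not the authors' released implementation: the module names, layer sizes, choice of token states for the word-order head, and the equal loss weights are all assumptions made for the sketch.

    import torch
    import torch.nn as nn

    class SharedSentenceEncoder(nn.Module):
        """A GRU encoder shared by all three unsupervised tasks (sketch)."""
        def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)

        def forward(self, token_ids):
            states, last = self.gru(self.embed(token_ids))  # states: (B, T, H)
            return states, last.squeeze(0)                  # sentence vector: (B, H)

    class USRMTLSketch(nn.Module):
        """Hypothetical three-head model: word prediction, word order, sentence order."""
        def __init__(self, vocab_size, hid_dim=256):
            super().__init__()
            self.encoder = SharedSentenceEncoder(vocab_size, hid_dim=hid_dim)
            self.word_pred = nn.Linear(hid_dim, vocab_size)  # word-level: predict each word
            self.word_order = nn.Linear(2 * hid_dim, 2)      # word-level: token pair in order?
            self.sent_order = nn.Linear(2 * hid_dim, 2)      # sentence-level: neighbors in order?

        def forward(self, sent, neighbor):
            states, vec = self.encoder(sent)
            _, vec_n = self.encoder(neighbor)
            word_logits = self.word_pred(states)                        # (B, T, V)
            pair = torch.cat([states[:, 0], states[:, 1]], dim=-1)      # toy pair of token states
            order_logits = self.word_order(pair)                        # (B, 2)
            sent_logits = self.sent_order(torch.cat([vec, vec_n], -1))  # (B, 2)
            return word_logits, order_logits, sent_logits

    # Toy usage: sum the three task losses into one unsupervised objective.
    vocab = 1000
    model = USRMTLSketch(vocab)
    sent = torch.randint(0, vocab, (4, 12))      # batch of 4 sentences, 12 tokens each
    neighbor = torch.randint(0, vocab, (4, 12))  # their (here random) neighboring sentences
    w, o, s = model(sent, neighbor)
    ce = nn.CrossEntropyLoss()
    loss = (ce(w.reshape(-1, vocab), sent.reshape(-1))  # word prediction
            + ce(o, torch.ones(4, dtype=torch.long))    # word-order (label 1 = in order)
            + ce(s, torch.ones(4, dtype=torch.long)))   # sentence-order (label 1 = in order)
    loss.backward()

In a real training loop each task would draw its own targets from the unlabeled corpus (for example, genuinely shuffled word pairs, and genuine versus shuffled neighboring sentences); the sketch only shows the shared-encoder, summed-loss structure that the multi-task formulation implies.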


Notes

  1. http://www.shuangyin.li/usrmtl/


Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 62006083) and the Basic and Applied Basic Research Fund of Guangdong Province (No. 2019B1515120085).

Author information


Corresponding author

Correspondence to Shuangyin Li.

Ethics declarations

Conflict of interest

We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Xu, W., Li, S. & Lu, Y. USR-MTL: an unsupervised sentence representation learning framework with multi-task learning. Appl Intell 51, 3506–3521 (2021). https://doi.org/10.1007/s10489-020-02042-2

