Abstract
Learning effective text representations, particularly sentence-level features, is increasingly important for intelligent systems. Numerous previous studies have addressed sentence representation learning with deep learning approaches. However, most existing approaches either target a single task or rely on labeled corpora when learning sentence embeddings. In this paper, we analyze the factors involved in learning sentence representations and propose an efficient unsupervised sentence representation learning framework with multi-task learning (USR-MTL), in which several text learning tasks are merged into a unified framework. Considering the syntactic and semantic features of sentences, three factors bear on sentence representation learning: the wording of a sentence, the order of its words, and the order of its neighboring sentences. Accordingly, we integrate a word-prediction task, a word-order learning task, and a sentence-order learning task into the proposed framework to obtain meaningful sentence embeddings. Sentence embedding learning is thus reformulated as multi-task learning over one sentence-level task and two word-level tasks. Moreover, the framework is trained with an unsupervised algorithm on unlabeled corpora. Experimental results show that our approach achieves state-of-the-art performance on downstream natural language processing tasks compared with popular unsupervised representation learning techniques. Experiments on representation visualization and task analysis demonstrate that the tasks in the proposed framework yield reasonable sentence representations, confirming the capacity of the unsupervised multi-task framework for sentence representation learning.
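The abstract describes a multi-task objective in which one sentence-level task and two word-level tasks share a single sentence encoder. The following is a minimal, stdlib-only sketch of that structure, not the paper's implementation: the encoder, the vectors, and the margin-based sentence-order loss are all toy stand-ins (the paper uses a learned neural encoder), and every name here (`encode`, `sentence_order_loss`, `multi_task_loss`) is an illustrative assumption.

```python
import math
import random

DIM = 8

def word_vec(word):
    """Deterministic toy word vector (a stand-in for a learned embedding table)."""
    rng = random.Random(word)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

def encode(sentence):
    """Shared toy sentence encoder: mean of word vectors."""
    vecs = [word_vec(w) for w in sentence.lower().split()]
    return [sum(xs) / len(vecs) for xs in zip(*vecs)]

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def sentence_order_loss(target, true_next, distractor, margin=1.0):
    """Sentence-level task: the true next sentence should outscore a distractor."""
    t = encode(target)
    return max(0.0, margin - cosine(t, encode(true_next)) + cosine(t, encode(distractor)))

def multi_task_loss(task_losses, weights):
    """Multi-task objective: weighted sum of per-task losses over the shared encoder."""
    return sum(w * l for w, l in zip(weights, task_losses))

# One (hypothetical) training step would combine the three per-task losses,
# e.g. word-prediction, word-order, and sentence-order, into a single objective:
l_order = sentence_order_loss("the cat sat", "on the mat", "stock prices rose")
total = multi_task_loss([0.4, 0.2, l_order], weights=[1.0, 1.0, 1.0])
```

Because all three losses are computed from the same encoder, minimizing the weighted sum pushes its gradients through every task at once, which is the core of the multi-task formulation the abstract refers to.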
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 62006083) and the Basic and Applied Basic Research Fund of Guangdong Province (No. 2019B1515120085).
Ethics declarations
Conflict of interest
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Xu, W., Li, S. & Lu, Y. USR-MTL: an unsupervised sentence representation learning framework with multi-task learning. Appl Intell 51, 3506–3521 (2021). https://doi.org/10.1007/s10489-020-02042-2