DOI: 10.1145/3686490.3686492
Research article

An Optimization Method TSDAE-based for Unsupervised Sentence Embedding Learning

Published: 11 October 2024

Abstract

Most existing methods for learning sentence embeddings are designed for labeled data in general domains, and evaluation is typically limited to a single task, Semantic Textual Similarity (STS). Real-world applications, however, involve many domain-specific tasks that pose challenges absent from general domains: the data are largely unlabeled and demand specialized knowledge, so general-domain learning methods adapt poorly to specific domains or tasks. To address the problems of unlabeled datasets and specialized domain knowledge, we propose AOM, a TSDAE-based optimization method for unsupervised sentence embedding learning, which optimizes and improves the existing approach of unsupervised sentence embedding learning with pre-trained Transformers and a sequential denoising auto-encoder (TSDAE) [9]. We argue that increasing the training difficulty of the denoising auto-encoder in unsupervised sentence representation learning is the key to further breakthroughs on STS tasks, and this study innovates on the existing technique accordingly. Our results on downstream STS evaluation show that the proposed improvement needs only one-tenth of the corpus used by the original method to exceed the performance of the original model.
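The central idea, making the denoising auto-encoder's reconstruction task harder, can be illustrated with the token-deletion noise that TSDAE [9] applies to its input sentences before the encoder-decoder reconstructs the original. The sketch below is illustrative only: the function name `delete_noise` and the default deletion ratio are assumptions for exposition, not the authors' implementation.

```python
import random

def delete_noise(tokens, del_ratio=0.6, rng=None):
    """Corrupt a token sequence by randomly deleting a fraction of tokens.

    A denoising auto-encoder is then trained to reconstruct the original
    sentence from this corrupted input; raising del_ratio increases the
    training difficulty of the reconstruction task.
    """
    rng = rng or random.Random()
    kept = [t for t in tokens if rng.random() > del_ratio]
    # Keep at least one token so the encoder never receives empty input.
    return kept if kept else [rng.choice(tokens)]

tokens = "the quick brown fox jumps over the lazy dog".split()
corrupted = delete_noise(tokens, del_ratio=0.6, rng=random.Random(0))
print(corrupted)  # a shortened, order-preserving subsequence of tokens
```

Tuning `del_ratio` upward is one simple way to realize the "harder reconstruction" idea the abstract describes; the paper's actual modification to TSDAE may differ.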

References

[1]
Tianyu Gao, Xingcheng Yao, and Danqi Chen. 2021. SimCSE: Simple Contrastive Learning of Sentence Embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6894–6910, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics. https://aclanthology.org/2021.emnlp-main.552
[2]
Bar-Hillel, Y., & Carnap, R. (1953). Semantic information. British Journal for the Philosophy of Science, 4(14), 147-157.
[3]
Wang, H., Li, Y., Huang, Z., Dou, Y., Kong, L., & Shao, J. (2022). SNCSE: Contrastive learning for unsupervised sentence embedding with soft negative samples. https://doi.org/10.48550/arXiv.2201.05979
[4]
Goldberg, Y., & Levy, O. (2014). word2vec explained: Deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv. https://doi.org/10.48550/arXiv.1402.3722
[5]
Lee, S., Jin, X., & Kim, W. (2016). Sentiment classification for unlabeled dataset using doc2vec with JST. ACM (August 2016). https://doi.org/10.1145/2971603.2971631
[6]
Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. North American Chapter of the Association for Computational Linguistics. https://doi.org/10.48550/arXiv.1810.04805
[7]
Carneiro, G., Chan, A. B., Moreno, P. J., & Vasconcelos, N. (2007). Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis & Machine Intelligence, 29(3), 394-410.
[8]
Barlow, H. B. (1989). Unsupervised learning. Neural Computation, 1(3), 295-311.
[9]
Wang, K., Reimers, N., & Gurevych, I. (2021). TSDAE: Using transformer-based sequential denoising auto-encoder for unsupervised sentence embedding learning. https://doi.org/10.48550/arXiv.2104.06979
[10]
Lamsiyah, S., Mahdaouy, A. E., Ouatik, S. E. A., & Espinasse, B. (2023). Unsupervised extractive multi-document summarization method based on transfer learning from BERT multi-task fine-tuning. Journal of Information Science, 49(1), 164-182. https://doi.org/10.1177/0165551521990616
[11]
Rosenthal, S., Ritter, A., Nakov, P., & Stoyanov, V. (2014). SemEval-2014 Task 9: Sentiment analysis in Twitter. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014).
[12]
Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, and Lucia Specia. 2017. SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation. https://doi.org/10.48550/arXiv.1708.00055
[13]
Ranasinghe, T., Orasan, C., & Mitkov, R. . (2019). Semantic Textual Similarity with Siamese Neural Networks. Recent Advances in Natural Language Processing 2019.
[14]
Khosla, N., & Venkataraman, V. Learning Sentence Vector Representations to Summarize Yelp Reviews. https://api.semanticscholar.org/CorpusID:16309514
[15]
Whitelaw, C., Garg, N., & Argamon, S. (2005). Using appraisal groups for sentiment analysis. ACM International Conference on Information & Knowledge Management, 625. https://doi.org/10.1145/1099554.1099714
[16]
Zhou, D., Huang, J., & Schölkopf, B. (2007). Learning with Hypergraphs: Clustering, Classification, and Embedding. In Advances in Neural Information Processing Systems 19: Proceedings of the 2006 Conference, MIT Press, pp. 1601-1608. https://www.researchgate.net/publication/22162045
[17]
Li, L., Li, J., Xu, Y., Zhu, H., & Zhang, X. (2023). Enhancing code summarization with graph embedding and pre-trained model. International Journal of Software Engineering and Knowledge Engineering, 33(11n12), 1765-1786. https://doi.org/10.1142/S0218194023410024
[18]
Hu, H., Richardson, K., Xu, L., Li, L., Kübler, S., & Moss, L.S. (2020). OCNLI: Original Chinese Natural Language Inference. ArXiv, abs/2010.05444.https://api.semanticscholar.org/CorpusID:222291723
[19]
Locke, W. N., & Booth, A. D. . (1956). Machine translation. Journal of the Iee, 2(2), 109-116. https://api.semanticscholar.org/CorpusID:244005239
[20]
Conneau, A., Kiela, D., Schwenk, H., Barrault, L., & Bordes, A. (2017). Supervised Learning of Universal Sentence Representations from Natural Language Inference Data. ArXiv, abs/1705.02364.https://api.semanticscholar.org/CorpusID:28971531
[21]
Liao, D. (2021). Sentence Embeddings using Supervised Contrastive Learning.
[22]
Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, China. Association for Computational Linguistics. https://aclanthology.org/D19-1410
[23]
Dorado, A., & Izquierdo, E. . An approach for supervised semantic annotation. https://doi.org/10.1142/9789812704337_0022
[24]
Ding, Y., & Xu, L. . (2018). Learning Sentence Embeddings Based on Weighted Contexts from Unlabelled Data. 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS). IEEE.
[25]
Xue, B., Fu, C., & Shaobin, Z. . (2014). A study on sentiment computing and classification of sina weibo with word2vec. IEEE.
[26]
Jain, V. (2020). GloVeInit at SemEval-2020 Task 1: Using GloVe vector initialization for unsupervised lexical semantic change detection. https://doi.org/10.48550/arXiv.2007.05618
[27]
Kiros, R., Zhu, Y., Salakhutdinov, R., Zemel, R.S., Urtasun, R., Torralba, A., & Fidler, S. (2015). Skip-Thought Vectors. Neural Information Processing Systems. https://doi.org/10.48550/arXiv.1506.06726
[28]
John Giorgi, Osvald Nitski, Bo Wang, and Gary Bader. 2021. DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 879–895, Online. Association for Computational Linguistics. https://aclanthology.org/2021.acl-long.72
[29]
Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, and Lei Li. 2020. On the Sentence Embeddings from Pre-trained Language Models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 9119–9130, Online. Association for Computational Linguistics.https://aclanthology.org/2020.emnlp-main.733


    Published In

    SPML '24: Proceedings of the 2024 7th International Conference on Signal Processing and Machine Learning
    July 2024
    353 pages
    ISBN:9798400717192
    DOI:10.1145/3686490

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Learning Sentence Embeddings
    2. Sequential Denoising Auto-Encoder
    3. Unsupervised Sentence Embedding Learning


