DOI: 10.1145/3478905.3478958
Article

Short text similarity computation method based on feature expansion and Siamese network

Published: 28 September 2021

Abstract

Text similarity computation is a widely studied problem in natural language processing (NLP). Short text similarity computation is a newer and more challenging problem that cannot be solved effectively by approaches designed for regular-length text. The main reason is that a short text generally contains a limited number of words, so fewer features can be extracted. In this paper, we propose a short text similarity computation method based on feature expansion and a Siamese neural network. First, a latent Dirichlet allocation (LDA) based model is constructed to expand the features of a short text. Then, deep features are extracted by a Siamese neural network model that contains both convolutional neural networks (CNN) and bi-directional long short-term memory (BiLSTM). Finally, the similarity of two short texts is obtained by computing the Manhattan distance between the generated feature vectors of the two texts. Experimental results on the data set of the Ant Financial NLP Challenge show that our method achieves higher accuracy and F1 score.
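The first and last steps of the pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the Manhattan distance is turned into a similarity score, so the exp(-distance) mapping used here is an assumption borrowed from common Siamese-network practice. The function names, the `top_k` parameter, and the toy topic table are hypothetical. The topic weights are supplied by hand rather than inferred by a trained LDA model, and the CNN and BiLSTM encoder is omitted entirely.

```python
import math


def expand_with_topic_words(tokens, topic_weights, topic_words, top_k=2):
    """Append up to top_k words from the most probable topic.

    topic_weights: one weight per topic for this text (hand-supplied here;
                   in practice it would come from a trained LDA model).
    topic_words:   for each topic, its words ordered by importance.
    Words already present in the text are skipped.
    """
    best_topic = max(range(len(topic_weights)), key=topic_weights.__getitem__)
    extra = [w for w in topic_words[best_topic] if w not in tokens][:top_k]
    return tokens + extra


def manhattan_similarity(u, v):
    """Map the L1 (Manhattan) distance between two vectors into (0, 1].

    Identical vectors give 1.0; the score decays towards 0 as the
    distance grows. The exp(-d) form is an assumption, not taken
    from the abstract.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    distance = sum(abs(a - b) for a, b in zip(u, v))
    return math.exp(-distance)


# Toy usage with a hypothetical two-topic table.
topics = [["sports", "game"], ["loan", "interest", "bank"]]
expanded = expand_with_topic_words(["loan", "rate"], [0.1, 0.9], topics)
score = manhattan_similarity([0.2, 0.5], [0.2, 0.5])
```

In a full system the two feature vectors passed to `manhattan_similarity` would be the outputs of the shared-weight encoder applied to each expanded text. Because both branches share weights, identical inputs always receive a score of 1.0.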

Cited By

  • (2024) Automated Matchmaking of Researcher Biosketches and Funder Requests for Proposals Using Deep Neural Networks. IEEE Access 12, 98096-98106. DOI: 10.1109/ACCESS.2024.3427631. Online publication date: 2024

Information

Published In

DSIT 2021: 2021 4th International Conference on Data Science and Information Technology
July 2021
481 pages
ISBN: 9781450390248
DOI: 10.1145/3478905

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. BiLSTM
  2. CNN
  3. LDA
  4. attention mechanism
  5. short text similarity

Qualifiers

  • Article
  • Research
  • Refereed limited

Conference

DSIT 2021

Acceptance Rates

Overall Acceptance Rate 114 of 277 submissions, 41%
