DOI: 10.1145/3532213.3532262

Lightweight Text Matching Method with Rich Features

Published: 13 July 2022

Abstract

Text matching is one of the research hotspots in Natural Language Processing (NLP). It is of great practical importance for applications such as text de-duplication, web retrieval, and question answering systems. To address the large parameter counts and low efficiency of existing text matching models, a lightweight text matching method with rich features is proposed. The overall architecture is a Siamese neural network with shared parameters. The method uses an improved residual network together with an attention mechanism to extract and align vector representations, and retains only three key features for the alignment operations. In addition, an averaging operation is added to the fusion layer so that the prediction layer receives information-rich vector representations. Experimental results on a paraphrase identification dataset and two natural language inference datasets show that, compared with existing models, the proposed approach effectively reduces the number of parameters while maintaining good text matching performance. The experiments demonstrate that the method is applicable to general text matching tasks.
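
To make the described architecture concrete, the following is a minimal PyTorch sketch of a parameter-shared Siamese matcher with soft attention alignment, a fusion step that keeps three features per token, and an added averaging (mean-pooling) operation before prediction. The layer sizes, the exact choice of the three fused features, and the pooling scheme are illustrative assumptions, not the authors' exact configuration or their improved residual network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SiameseMatcher(nn.Module):
    """Sketch of a lightweight Siamese text matcher (assumed configuration)."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=150, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Shared encoder; a simple feed-forward stack stands in for the
        # paper's improved residual network.
        self.encoder = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # Fusion keeps three features per token (aligned, difference, product).
        self.fusion = nn.Linear(hidden_dim * 3, hidden_dim)
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim * 4, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def encode(self, tokens):
        # Both sentences pass through the same (shared) encoder.
        return self.encoder(self.embedding(tokens))  # (batch, len, hidden)

    @staticmethod
    def align(a, b):
        # Soft attention alignment: tokens of `a` attend over `b` and vice versa.
        scores = torch.matmul(a, b.transpose(1, 2))             # (batch, len_a, len_b)
        a_aligned = torch.matmul(F.softmax(scores, dim=2), b)   # b summarized per a-token
        b_aligned = torch.matmul(F.softmax(scores, dim=1).transpose(1, 2), a)
        return a_aligned, b_aligned

    def fuse(self, x, x_aligned):
        feats = torch.cat([x_aligned, x - x_aligned, x * x_aligned], dim=-1)
        fused = torch.relu(self.fusion(feats))
        # Added averaging: mean-pool alongside max-pool for a richer sentence vector.
        return torch.cat([fused.max(dim=1).values, fused.mean(dim=1)], dim=-1)

    def forward(self, tokens_a, tokens_b):
        a, b = self.encode(tokens_a), self.encode(tokens_b)
        a_aligned, b_aligned = self.align(a, b)
        va, vb = self.fuse(a, a_aligned), self.fuse(b, b_aligned)
        return self.classifier(torch.cat([va, vb], dim=-1))
```

The Siamese weight sharing and the small number of fused features are what keep the parameter count low relative to heavier interaction-based matchers; the mean-pooled term in the fusion output is the "averaging operation" that supplies the prediction layer with additional information.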



Published In

ICCAI '22: Proceedings of the 8th International Conference on Computing and Artificial Intelligence
March 2022
809 pages
ISBN:9781450396110
DOI:10.1145/3532213
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 July 2022


Author Tags

  1. attention mechanism
  2. average
  3. improved residual network
  4. lightweight
  5. text matching

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICCAI '22
