DOI: 10.1145/3573942.3573966
research-article

Multi-granularity Text Semantic Matching Model Based on Knowledge Enhancement

Published: 16 May 2023

Abstract

Text matching is a fundamental task in natural language processing, widely used in information retrieval, text mining, and related fields. Text often presents problems such as polysemy (one word with multiple meanings) and improper word segmentation. These problems prevent the contextual and implied semantic information of sentences from being extracted effectively. Therefore, we propose a Multi-granularity Knowledge Enhancement (MGKE) model. First, we embed the text at both character and word granularity, and use the HowNet external knowledge base to resolve word-sense ambiguity. Second, we introduce an attention mechanism to capture hidden information at both character and word granularity, which enriches the semantic information of the sentences. Finally, we use pooling to extract global and critical information from the sentences and predict the semantic similarity of each sentence pair. Experiments on the LCQMC and BQ datasets show that the model improves the accuracy of short-text semantic matching.
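
The three stages named in the abstract (multi-granularity embedding, attention, pooling) can be illustrated with a minimal sketch. This is not the MGKE implementation: random vectors stand in for trained character and word embeddings, the HowNet sense lookup is omitted, a plain dot-product attention between the two granularities is assumed, and cosine similarity replaces the model's learned prediction layer. All function names (`cross_attention`, `encode`, `similarity`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys):
    # Each query row attends over all key rows (scaled dot-product attention)
    # and returns a weighted sum of the key vectors.
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ keys

def encode(char_emb, word_emb):
    # Assumed interaction: each granularity is enriched with the other one.
    char_enh = char_emb + cross_attention(char_emb, word_emb)
    word_enh = word_emb + cross_attention(word_emb, char_emb)
    # Mean pooling keeps global information, max pooling keeps salient features.
    pooled = [char_enh.mean(axis=0), char_enh.max(axis=0),
              word_enh.mean(axis=0), word_enh.max(axis=0)]
    return np.concatenate(pooled)

def similarity(vec_a, vec_b):
    # Cosine similarity as a stand-in for a trained classifier.
    denom = np.linalg.norm(vec_a) * np.linalg.norm(vec_b) + 1e-12
    return float(vec_a @ vec_b / denom)

# Toy input: random embeddings of dimension 8 (assumed, not trained).
rng = np.random.default_rng(0)
dim = 8
sent_a_chars, sent_a_words = rng.normal(size=(6, dim)), rng.normal(size=(3, dim))
sent_b_chars, sent_b_words = rng.normal(size=(5, dim)), rng.normal(size=(2, dim))

vec_a = encode(sent_a_chars, sent_a_words)
vec_b = encode(sent_b_chars, sent_b_words)
score = similarity(vec_a, vec_b)
print(f"similarity score: {score:.3f}")
```

Sentences of different lengths map to vectors of the same size (here 4 × 8 = 32) because pooling collapses the sequence axis, which is what makes the pairwise comparison possible.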

    Published In

    AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition
    September 2022
    1221 pages
    ISBN:9781450396899
    DOI:10.1145/3573942

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Knowledge enhancement
    2. Multi-granularity
    3. Semantic matching

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Key Research and Development Program of Shaanxi Province

    Conference

    AIPR 2022
