Sentence Semantic Matching Based on 3D CNN for Human–Robot Language Interaction

Published: 16 July 2021

Abstract

The development of cognitive robotics opens up an attractive scenario in which humans and robots cooperate to accomplish specific tasks. To support this scenario, cognitive robots are expected to interact with humans in natural language, which depends on natural language understanding (NLU) technologies. Sentence semantic matching (SSM), a core NLU task, arises in a wide range of interaction scenarios. Recently, deep learning-based methods for SSM have become predominant because of their outstanding performance. However, because a sentence is a sequence of words, it is usually treated as one-dimensional (1D) text, which restricts existing neural models to 1D sequential networks. A few studies have explored the potential of 2D or 3D neural models for text representation, but they struggle to capture the complex features in text, so the performance improvement they achieve is quite limited. To tackle this challenge, we devise a novel 3D CNN-based SSM (3DSSM) method for human–robot language interaction. First, an architecture called the feature cube network transforms a 1D sentence into a multi-dimensional representation called the semantic feature cube. Then, a 3D CNN module learns a semantic representation of the semantic feature cube by capturing both the local features embedded in word representations and the sequential information among successive words in a sentence. Given a pair of sentences, their representations are concatenated and fed into another 3D CNN, which captures the interactive features between them and generates the final matching representation. Finally, the semantic matching degree is computed by a sigmoid function that takes the learned matching representation as input. Extensive experiments on two real-world datasets demonstrate that 3DSSM achieves performance comparable to, or even better than, state-of-the-art competing methods.
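The data flow described in the abstract can be illustrated with a small sketch in plain Python. It is not the authors' implementation. The abstract does not specify how the feature cube is built, which axis the sentence representations are concatenated along, or any layer sizes. The sketch therefore assumes that each word vector is folded into a 4 x 4 slice, that sentences are padded to a fixed length, that concatenation happens along the word axis, and that a single random, untrained kernel is used per stage. All function and variable names are hypothetical.

```python
import math
import random

random.seed(0)

SENT_LEN, VOCAB, HEIGHT, WIDTH = 6, 10, 4, 4  # assumed toy sizes

def rand_tensor(*shape):
    """Nested lists of small random weights with the given shape."""
    if len(shape) == 1:
        return [random.uniform(-0.5, 0.5) for _ in range(shape[0])]
    return [rand_tensor(*shape[1:]) for _ in range(shape[0])]

def feature_cube(token_ids, embeddings):
    """Assumed cube layout: each word vector is folded into one
    HEIGHT x WIDTH slice, and the slices are stacked in word order."""
    cube = []
    for t in token_ids:
        vec = embeddings[t]
        cube.append([vec[r * WIDTH:(r + 1) * WIDTH] for r in range(HEIGHT)])
    return cube

def conv3d_relu(cube, kernel):
    """'Valid' 3D convolution (cross-correlation) with one kernel, then ReLU."""
    D, H, W = len(cube), len(cube[0]), len(cube[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for d in range(D - kd + 1):
        plane = []
        for h in range(H - kh + 1):
            row = []
            for w in range(W - kw + 1):
                s = sum(cube[d + i][h + j][w + k] * kernel[i][j][k]
                        for i in range(kd)
                        for j in range(kh)
                        for k in range(kw))
                row.append(max(0.0, s))
            plane.append(row)
        out.append(plane)
    return out

def init_params():
    """Random, untrained parameters (illustration only)."""
    emb = rand_tensor(VOCAB, HEIGHT * WIDTH)
    k_enc = rand_tensor(2, 2, 2)    # sentence-encoding kernel
    k_match = rand_tensor(2, 2, 2)  # interaction kernel
    # Encoder output per sentence: (SENT_LEN-1, HEIGHT-1, WIDTH-1) = (5, 3, 3).
    # Concatenated depth is 10, so the matching conv yields (9, 2, 2) = 36 values.
    n_feats = (2 * (SENT_LEN - 1) - 1) * (HEIGHT - 2) * (WIDTH - 2)
    return emb, k_enc, k_match, rand_tensor(n_feats), 0.0

def match_probability(ids_a, ids_b, params):
    """Encode both sentences, concatenate, convolve again, apply a sigmoid."""
    emb, k_enc, k_match, w_out, b_out = params
    rep_a = conv3d_relu(feature_cube(ids_a, emb), k_enc)
    rep_b = conv3d_relu(feature_cube(ids_b, emb), k_enc)
    joint = rep_a + rep_b  # assumed: concatenation along the word (depth) axis
    feats = [x for plane in conv3d_relu(joint, k_match)
             for row in plane for x in row]
    logit = sum(w * f for w, f in zip(w_out, feats)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))

params = init_params()
# Token id 0 plays the role of padding in this toy vocabulary.
score = match_probability([1, 2, 3, 4, 5, 0], [1, 2, 3, 4, 6, 0], params)
print(f"matching degree: {score:.3f}")
```

Because the weights are random, the printed score carries no meaning. The sketch only shows how a sentence becomes a cube, how a 3D kernel slides jointly over neighbouring words and neighbouring embedding dimensions, and how the final sigmoid maps the matching representation to a value between 0 and 1.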

Information

Published In

ACM Transactions on Internet Technology, Volume 21, Issue 4
November 2021
520 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/3472282
Editor: Ling Lu


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 July 2021
Accepted: 01 February 2021
Received: 01 September 2020
Revised: 01 January 2020
Published in TOIT Volume 21, Issue 4

Author Tags

  1. Sentence semantic matching
  2. 3D CNN
  3. semantic feature cube
  4. human–robot interaction
  5. representation learning

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Key R&D Program of China
  • National Natural Science Foundation of China
  • Key Program of Science and Technology of Shandong Province
