
Sentence Semantic Matching Based on 3D CNN for Human–Robot Language Interaction

Published: 16 July 2021

Abstract

The development of cognitive robotics brings an attractive scenario in which humans and robots cooperate to accomplish specific tasks. To facilitate this scenario, cognitive robots are expected to interact with humans in natural language, which depends on natural language understanding (NLU) technologies. As one core task in NLU, sentence semantic matching (SSM) arises in a wide range of interaction scenarios. Recently, deep learning–based methods for SSM have become predominant due to their outstanding performance. However, each sentence consists of a sequence of words and is usually viewed as one-dimensional (1D) text, which restricts the available neural models to 1D sequential networks. A few studies have attempted to explore the potential of 2D or 3D neural models for text representation; however, these approaches struggle to capture the complex features in texts, so the performance gains they achieve are quite limited. To tackle this challenge, we devise a novel 3D CNN-based SSM (3DSSM) method for human–robot language interaction. Specifically, a dedicated architecture called the feature cube network first transforms a 1D sentence into a multi-dimensional representation called the semantic feature cube. A 3D CNN module then learns a semantic representation of the feature cube by capturing both the local features embedded in word representations and the sequential information among successive words in a sentence. Given a pair of sentences, their representations are concatenated and fed into another 3D CNN that captures the interactive features between them and produces the final matching representation. Finally, the semantic matching degree is computed by applying the sigmoid function to the learned matching representation. Extensive experiments on two real-world datasets demonstrate that 3DSSM achieves comparable or even better performance than state-of-the-art competing methods.
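The pipeline described above (word embeddings reshaped into 2D maps, stacked into a cube, convolved in 3D per sentence, concatenated across the pair, convolved again, then squashed with a sigmoid) can be sketched in miniature. This is a toy numpy illustration of the idea only, not the authors' implementation: the dimensions, the single-channel naive convolution, the mean pooling, and the scalar weight `w` are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes; the paper's actual dimensions are not given in the abstract.
VOCAB, EMB_DIM, SIDE = 50, 16, 4          # each 16-d embedding becomes a 4x4 feature map
embeddings = rng.normal(size=(VOCAB, EMB_DIM))

def feature_cube(token_ids):
    """Turn a 1D token sequence into a 3D 'semantic feature cube':
    each word embedding is reshaped into a 2D map, and the maps are
    stacked along the sentence axis -> shape (len, SIDE, SIDE)."""
    maps = [embeddings[t].reshape(SIDE, SIDE) for t in token_ids]
    return np.stack(maps, axis=0)

def conv3d(cube, kernel):
    """Naive single-channel, valid-mode 3D convolution followed by ReLU."""
    kl, kh, kw = kernel.shape
    L, H, W = cube.shape
    out = np.empty((L - kl + 1, H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(cube[i:i+kl, j:j+kh, k:k+kw] * kernel)
    return np.maximum(out, 0.0)

def match_score(s1_ids, s2_ids, k_sent, k_inter, w):
    """Score a sentence pair: per-sentence 3D CNN, concatenation,
    interaction 3D CNN, pooling, then a sigmoid matching degree."""
    c1 = conv3d(feature_cube(s1_ids), k_sent)
    c2 = conv3d(feature_cube(s2_ids), k_sent)
    pair = np.concatenate([c1, c2], axis=0)     # join along the sentence axis
    inter = conv3d(pair, k_inter)               # interactive features of the pair
    z = float(inter.mean() * w)                 # crude pooling + scalar weight (assumption)
    return 1.0 / (1.0 + np.exp(-z))             # sigmoid matching degree in (0, 1)
```

A usage sketch: with random 2x2x2 kernels, `match_score([1, 2, 3, 4], [1, 2, 3, 5], k_sent, k_inter, 0.5)` returns a value strictly between 0 and 1 that can be thresholded as the match/no-match decision.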



• Published in: ACM Transactions on Internet Technology, Volume 21, Issue 4 (November 2021), 520 pages
• ISSN: 1533-5399, EISSN: 1557-6051
• DOI: 10.1145/3472282
• Editor: Ling Lu

          Copyright © 2021 Association for Computing Machinery.

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

Publication History

• Received: 1 September 2020
• Revised: 1 January 2020
• Accepted: 1 February 2021
• Published: 16 July 2021
Published in ACM Transactions on Internet Technology (TOIT), Volume 21, Issue 4

          Qualifiers

          • research-article
          • Refereed
