Hierarchical matching network for multi-turn response selection in retrieval-based chatbots

  • Application of soft computing
Soft Computing

Abstract

Selecting a proper response is a crucial challenge in retrieval-based chatbots. State-of-the-art methods either match a response with the word sequence of a context, or match the response with each utterance in the context and then accumulate the matching information. The former architecture can lose important local matching information in utterance–response pairs and does not explicitly capture the relationships and dependencies among utterances. The latter architecture ignores important global matching information because the response is never matched with the context as a whole at word level. Both approaches therefore overlook the fact that matching a response with different levels of a context captures different information for multi-turn response selection. In this work, we propose a hierarchical matching network that matches a response with a context at both the word level and the utterance level. At the word level, we concatenate the multi-turn context into one long word sequence and adopt a text matching model to match the response with that sequence, which captures important word-level matching information. At the utterance level, we employ the identical text matching model to match the response with each utterance in the context, capturing important matching information for each utterance–response pair, and then accumulate the matching information with a recurrent neural network to model the relationships among utterances. Finally, the matching information from the two levels is fused to obtain the final matching information. Experiments on two large-scale public multi-turn response selection datasets show that the proposed model significantly outperforms the state-of-the-art baseline models.
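The two-level data flow described in the abstract can be illustrated with a small sketch. This is not the authors' model: the function names (`match`, `accumulate`, `hierarchical_match`, `score`) are hypothetical, the bag-of-words `match` function is only a toy stand-in for the neural text matching model, the fixed `decay` recurrence stands in for a trained recurrent neural network, and fusion by concatenation followed by averaging is an assumption made for illustration. Only the overall structure (the same matcher applied at word level and at utterance level, recurrent accumulation over utterances, then fusion) follows the abstract.

```python
import math
from collections import Counter


def match(text, response):
    """Toy stand-in for the text matching model (assumption, not the paper's).

    Returns a two-element feature vector: the bag-of-words cosine similarity
    and the fraction of response tokens that also occur in `text`.
    """
    a, b = Counter(text), Counter(response)
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    cosine = dot / norm if norm else 0.0
    overlap = sum(1 for w in response if w in a) / len(response) if response else 0.0
    return [cosine, overlap]


def accumulate(vectors, decay=0.5):
    """Toy recurrence h_t = tanh(decay * h_{t-1} + v_t), applied in turn order.

    Stands in for the recurrent neural network that accumulates
    utterance-response matching vectors; `decay` is a hypothetical fixed weight.
    """
    h = [0.0] * len(vectors[0])
    for v in vectors:
        h = [math.tanh(decay * h_i + v_i) for h_i, v_i in zip(h, v)]
    return h


def hierarchical_match(context, response):
    """Match `response` with `context` at word level and at utterance level."""
    # Word level: concatenate all utterances into one long word sequence.
    word_sequence = [w for utterance in context for w in utterance]
    word_level = match(word_sequence, response)
    # Utterance level: the identical matcher on each utterance, then accumulation.
    utterance_level = accumulate([match(u, response) for u in context])
    # Fusion: simple concatenation of the two levels (an assumption).
    return word_level + utterance_level


def score(context, response):
    """Collapse the fused features into one score by averaging (an assumption)."""
    fused = hierarchical_match(context, response)
    return sum(fused) / len(fused)


context = [
    ["how", "do", "i", "install", "python"],
    ["which", "ubuntu", "version"],
    ["ubuntu", "20"],
]
candidates = [
    ["install", "python", "on", "ubuntu", "with", "apt"],
    ["i", "like", "pizza"],
]
best = max(candidates, key=lambda r: score(context, r))
print(" ".join(best))
```

Response selection then amounts to scoring every candidate against the same context and returning the highest-scoring one, as the last lines do for two toy candidates.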

Notes

  1. https://www.dropbox.com/s/2fdn26rj6h9bpvl/ubuntu_data.zip?dl=0.

  2. https://drive.google.com/file/d/154J-neBo20ABtSmJDvm7DK0eTuieAuvw/view?usp=sharing.

  3. https://www.taobao.com.

  4. http://lucene.apache.org/.

  5. https://github.com/google-research/bert.

Acknowledgements

This work is partially supported by the Natural Science Foundation of China (No. 61632011).

Author information

Corresponding author

Correspondence to Jian Wang.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Ma, H., Wang, J., Lin, H. et al. Hierarchical matching network for multi-turn response selection in retrieval-based chatbots. Soft Comput 25, 9609–9624 (2021). https://doi.org/10.1007/s00500-021-05699-0

  • DOI: https://doi.org/10.1007/s00500-021-05699-0
