Non-goal oriented dialogue agents: state of the art, dataset, and evaluation

Published in: Artificial Intelligence Review (2021)

Abstract

A dialogue agent, a derivative of the intelligent agent in the field of computational linguistics, is a computer program capable of generating responses and carrying out a conversation in natural language. The field of computational linguistics is flourishing owing to the rapid growth of dialogue agents, the most promising application being voice-controlled smart personal assistant services for handsets and homes. These agents are usable and accessible but are limited to short, task-related conversations. Non-goal-oriented dialogue agents, by contrast, are designed to imitate extended human–human conversation, also called chit-chat, and to provide the consumer with a satisfying experience of conversation quality. The design of such agents is defined primarily by a language model, unlike goal-oriented dialogue agents, which employ slot-based or ontology-based frameworks; hence most of the methods are data-driven. This paper surveys the current state of the art of non-goal-oriented dialogue systems, focusing on data-driven methods, the most prevalent of which is deep learning. The paper aims at (a) providing insight into recent methods and architectures proposed for building context and modeling responses, along with a comprehensive review of the state of the art, (b) examining the types of datasets and evaluation methods available, and (c) presenting the challenges and limitations posed by recent models, datasets, and evaluation methods.
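
The sketch below makes the data-driven framing concrete: a minimal encoder-decoder ("seq2seq") response generator of the kind the surveyed deep-learning models build on, where the dialogue context is encoded into a hidden state and a response is decoded from it under maximum likelihood training. This is an illustrative sketch under stated assumptions, not the authors' implementation; the class name `Seq2SeqResponder`, the toy vocabulary size, and the random token ids are invented here for demonstration (PyTorch).

```python
# Minimal sketch of a data-driven response model (illustration only; the class
# name, sizes, and toy data are assumptions, not the survey's reference code).
import torch
import torch.nn as nn

class Seq2SeqResponder(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)  # reads the dialogue context
        self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)  # generates the response
        self.out = nn.Linear(hidden_dim, vocab_size)                  # next-token logits

    def forward(self, context_ids: torch.Tensor, response_ids: torch.Tensor) -> torch.Tensor:
        # Encode the context utterance(s) into a fixed summary state.
        _, state = self.encoder(self.embed(context_ids))
        # Decode the response conditioned on that state (teacher forcing).
        dec_out, _ = self.decoder(self.embed(response_ids), state)
        return self.out(dec_out)

# Toy usage: random token ids stand in for a tokenized (context, response) pair.
model = Seq2SeqResponder(vocab_size=1000)
context = torch.randint(0, 1000, (2, 12))   # batch of 2 contexts, 12 tokens each
response = torch.randint(0, 1000, (2, 8))   # gold responses for teacher forcing
logits = model(context, response)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), response.reshape(-1))
loss.backward()  # trained end to end with MLE, as the surveyed generative models are
```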

Figures 1–3 (source: Ji et al. 2014) are available in the full article.

Abbreviations

cDSSM: Convolutional deep structured semantic model
CNN: Convolutional neural network
CNN-LM: Convolutional neural network language model
CBOW: Continuous bag of words
GAN: Generative adversarial network
GRU: Gated recurrent unit
IR: Information retrieval
MAP: Mean average precision
MDP: Markov decision process
MLE: Maximum likelihood estimation
MRR: Mean reciprocal rank
MT: Machine translation
NLP: Natural language processing
NLU: Natural language understanding
NLG: Natural language generation
NNLM: Neural network based language model
POMDP: Partially observable Markov decision process
PNN: Probabilistic neural network
PPL: Perplexity
RNN: Recurrent neural network
RNN-LM: Recurrent neural network based language model
SMT: Statistical machine translation
SVM: Support vector machine
LDA: Latent Dirichlet allocation
LSTM: Long short-term memory
LSTM-LM: Long short-term memory based language model
WER: Word error rate

References

  • Abdul-Kader SA, Woods JC (2015) Survey on chatbot design techniques in speech conversation systems. Int J Adv Comput Sci Appl 6:7

  • Ameixa D, Coheur L, Redol RA (2013) From subtitles to human interactions: introducing the SubTle corpus. Technical report, INESC-ID (November 2014)

  • Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473

  • Banchs RE (2012) Movie-DiC: a movie dialogue corpus for research and development. In: Proceedings of the 50th annual meeting of the association for computational linguistics (Volume 2: Short Papers), pp 203–207

  • Banerjee S, Lavie A (2005) METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, pp 65–72

  • Bengio Y, Ducharme R, Vincent P, Jauvin C (2003) A neural probabilistic language model. J Mach Learn Res 3(Feb):1137–1155

  • Bengio Y, Simard P, Frasconi P et al (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5(2):157–166

  • Bobrow DG, Kaplan RM, Kay M, Norman DA, Thompson H, Winograd T (1977) GUS, a frame-driven dialog system. Artif Intell 8(2):155–173

  • Bowman SR, Vilnis L, Vinyals O, Dai AM, Jozefowicz R, Bengio S (2015) Generating sentences from a continuous space. arXiv preprint arXiv:1511.06349

  • Chang F, Dell GS, Bock K (2006) Becoming syntactic. Psychol Rev 113(2):234

  • Chen H, Liu X, Yin D, Tang J (2017) A survey on dialogue systems: recent advances and new frontiers. ACM Sigkdd Explor Newsl 19(2):25–35

  • Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv preprint arXiv:1406.1078

  • Choudhary S, Srivastava P, Ungar L, Sedoc J (2017) Domain aware neural dialog system. arXiv preprint arXiv:1708.00897

  • Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555

  • Chung J, Kastner K, Dinh L, Goel K, Courville AC, Bengio Y (2015) A recurrent latent variable model for sequential data. In: Proceedings of NIPS, pp 2980–2988

  • Colby KM, Weber S, Hilf FD (1971) Artificial paranoia. Artif Intell 2(1):1–25

  • Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12(Aug):2493–2537

  • Elman JL (1993) Learning and development in neural networks: the importance of starting small. Cognition 48(1):71–99

  • Fraser B (1999) What are discourse markers? J Pragmat 31(7):931–952

  • Gašić M, Mrkšić N, Rojas-Barahona LM, Su P-H, Ultes S, Vandyke D, Wen T-H, Young S (2017) Dialogue manager domain adaptation using Gaussian process reinforcement learning. Comput Speech Lang 45:552–569

  • Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Proceedings of NIPS, pp 2672–2680

  • Grosz BJ, Sidner CL (1986) Attention, intentions, and the structure of discourse. Comput Linguist 12(3):175–204

  • Hakkani-Tür D, Tür G, Celikyilmaz A, Chen Y-N, Gao J, Deng L, Wang Y-Y (2016) Multi-domain joint semantic frame parsing using bi-directional RNN-LSTM. In: Interspeech, pp 715–719

  • Herbrich R (2000) Large margin rank boundaries for ordinal regression. In: Advances in large margin classifiers, pp 115–132

  • Hermann KM, Kocisky T, Grefenstette E, Espeholt L, Kay W, Suleyman M, Blunsom P (2015) Teaching machines to read and comprehend. In: Proceedings of NIPS, pp 1693–1701

  • Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  • Hu B, Lu Z, Li H, Chen Q (2014) Convolutional neural network architectures for matching natural language sentences. In: Proceedings of NIPS, pp 2042–2050

  • Jafarpour S, Burges CJ, Ritter A (2010) Filter, rank, and transfer the knowledge: learning to chat. Adv Rank 10:2329–9290

  • Ji Z, Lu Z, Li H (2014) An information retrieval approach to short text conversation. arXiv preprint arXiv:1408.6988

  • Jurafsky D (2000) Speech & language processing. Pearson Education India

  • Kalchbrenner N, Grefenstette E, Blunsom P (2014) A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188

  • Kannan A, Vinyals O (2017) Adversarial evaluation of dialogue models. arXiv preprint arXiv:1701.08198

  • Kim Y, Jernite Y, Sontag D, Rush AM (2016) Character-aware neural language models. In: Thirtieth AAAI conference on artificial intelligence

  • Koehn P (2009) Statistical machine translation. Cambridge University Press, Cambridge

  • Kumar A, Irsoy O, Ondruska P, Iyyer M, Bradbury J, Gulrajani I, Zhong V, Paulus R, Socher R (2016) Ask me anything: dynamic memory networks for natural language processing. In: International conference on machine learning, pp 1378–1387

  • LeCun Y, Bengio Y et al (1995) Convolutional networks for images, speech, and time series. Handb Brain Theory Neural Netw 3361(10):1995

  • LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436

  • Li J, Luong M-T, Jurafsky D (2015) A hierarchical neural autoencoder for paragraphs and documents. arXiv preprint arXiv:1506.01057

  • Li J, Monroe W, Ritter A, Galley M, Gao J, Jurafsky D (2016) Deep reinforcement learning for dialogue generation. arXiv preprint arXiv:1606.01541

  • Li J, Monroe W, Shi T, Jean S, Ritter A, Jurafsky D (2017) Adversarial learning for neural dialogue generation. arXiv preprint arXiv:1701.06547

  • Litman DJ, Silliman S (2004) ITSPOKE: an intelligent tutoring spoken dialogue system. In: Demonstration papers at HLT-NAACL 2004. Association for Computational Linguistics, pp 5–8

  • Liu C-W, Lowe R, Serban IV, Noseworthy M, Charlin L, Pineau J (2016) How not to evaluate your dialogue system: an empirical study of unsupervised evaluation metrics for dialogue response generation. arXiv preprint arXiv:1603.08023

  • Lowe R, Noseworthy M, Serban IV, Angelard-Gontier N, Bengio Y, Pineau J (2017a) Towards an automatic Turing test: learning to evaluate dialogue responses. arXiv preprint arXiv:1708.07149

  • Lowe R, Pow N, Serban I, Pineau J (2015) The Ubuntu dialogue corpus: a large dataset for research in unstructured multi-turn dialogue systems. arXiv preprint arXiv:1506.08909

  • Lowe RT, Pow N, Serban IV, Charlin L, Liu C-W, Pineau J (2017b) Training end-to-end dialogue systems with the Ubuntu dialogue corpus. Dialogue Discourse 8(1):31–65

  • Luan Y, Ji Y, Ostendorf M (2016) LSTM based conversation models. arXiv preprint arXiv:1603.09457

  • Mei H, Bansal M, Walter MR (2017) Coherent dialogue with attention-based language models. In: Thirty-first AAAI conference on artificial intelligence

  • Meng F, Lu Z, Wang M, Li H, Jiang W, Liu Q (2015) Encoding source language with convolutional neural network for machine translation. arXiv preprint arXiv:1503.01838

  • Mesnil G, Dauphin Y, Yao K, Bengio Y, Deng L, Hakkani-Tur D, He X, Heck L, Tur G, Yu D et al (2015) Using recurrent neural networks for slot filling in spoken language understanding. IEEE/ACM Trans Audio Speech Lang Process 23(3):530–539

  • Microsoft (2014)

  • Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781

  • Mikolov T, Karafiát M, Burget L, Černockỳ J, Khudanpur S (2010) Recurrent neural network based language model. In: Eleventh annual conference of the international speech communication association

  • Miyamoto Y, Cho K (2016) Gated word-character recurrent language model. arXiv preprint arXiv:1606.01700

  • Papineni K, Roukos S, Ward T, Zhu W-J (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, pp 311–318

  • Pennington J, Socher R, Manning C (2014) GloVe: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543

  • Pierre JM, Butler M, Portnoff J, Aguilar L (2016) Neural discourse modeling of conversations. arXiv preprint arXiv:1607.04576

  • Prakash A, Brockett C, Agrawal P (2016) Emulating human conversations using convolutional neural network-based IR. arXiv preprint arXiv:1606.07056

  • Rajeswar S, Subramanian S, Dutil F, Pal C, Courville A (2017) Adversarial generation of natural language. arXiv preprint arXiv:1705.10929

  • Ritter A, Cherry C, Dolan WB (2011) Data-driven response generation in social media. In: Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, pp 583–593

  • Serban IV, Klinger T, Tesauro G, Talamadupula K, Zhou B, Bengio Y, Courville A (2017a) Multiresolution recurrent neural networks: an application to dialogue response generation. In: Thirty-first AAAI conference on artificial intelligence

  • Serban IV, Lowe R, Henderson P, Charlin L, Pineau J (2015) A survey of available corpora for building data-driven dialogue systems. arXiv preprint arXiv:1512.05742

  • Serban IV, Lowe R, Henderson P, Charlin L, Pineau J (2018) A survey of available corpora for building data-driven dialogue systems: the journal version. Dialogue Discourse 9(1):1–49

  • Serban IV, Sankar C, Germain M, Zhang S, Lin Z, Subramanian S, Kim T, Pieper M, Chandar S, Ke NR, et al. (2017b) A deep reinforcement learning chatbot. arXiv preprint arXiv:1709.02349

  • Serban IV, Sordoni A, Bengio Y, Courville A, Pineau J (2016) Building end-to-end dialogue systems using generative hierarchical neural network models. In: Thirtieth AAAI conference on artificial intelligence

  • Serban IV, Sordoni A, Lowe R, Charlin L, Pineau J, Courville A, Bengio Y (2017c) A hierarchical latent variable encoder–decoder model for generating dialogues. In: Thirty-first AAAI conference on artificial intelligence

  • Shang L, Lu Z, Li H (2015) Neural responding machine for short-text conversation. arXiv preprint arXiv:1503.02364

  • Shao L, Gouws S, Britz D, Goldie A, Strope B, Kurzweil R (2017) Generating high-quality and informative conversation responses with sequence-to-sequence models. arXiv preprint arXiv:1701.03185

  • Sordoni A, Galley M, Auli M, Brockett C, Ji Y, Mitchell M, Nie J-Y, Gao J, Dolan B (2015) A neural network approach to context-sensitive generation of conversational responses. arXiv preprint arXiv:1506.06714

  • Specht DF (1990) Probabilistic neural networks. Neural Netw 3(1):109–118

  • Su P-H, Gašić M, Young S (2018) Reward estimation for dialogue policy optimisation. Comput Speech Lang 51:24–43

  • Suendermann D, Evanini K, Liscombe J, Hunter P, Dayanidhi K, Pieraccini R (2009) From rule-based to statistical grammars: continuous improvement of large-scale spoken dialog systems. In: 2009 IEEE international conference on acoustics, speech and signal processing. IEEE, pp 4713–4716

  • Sundermeyer M, Schlüter R, Ney H (2012) LSTM neural networks for language modeling. In: Thirteenth annual conference of the international speech communication association

  • Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Proceedings of NIPS, pp 3104–3112

  • Swanson R, Gordon AS (2008) Say anything: a massively collaborative open domain story writing companion. In: Joint international conference on interactive digital storytelling. Springer, pp 32–40

  • Tian Z, Yan R, Mou L, Song Y, Feng Y, Zhao D (2017) How to make context more useful? An empirical study on context-aware neural conversational models. In: Proceedings of the 55th annual meeting of the association for computational linguistics (Vol 2: Short Papers), pp 231–236

  • Tiedemann J (2009) News from OPUS: a collection of multilingual parallel corpora with tools and interfaces. Recent Adv Nat Lang Process 5:237–248

  • Vinyals O, Le Q (2015) A neural conversational model. arXiv preprint arXiv:1506.05869

  • Walker M, Whittaker S (1990) Mixed initiative in dialogue: an investigation into discourse segmentation. In: Proceedings of the 28th annual meeting on Association for Computational Linguistics. Association for Computational Linguistics, pp 70–78

  • Wang H, Lu Z, Li H, Chen E (2013) A dataset for research on short-text conversations. In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 935–945

  • Ward W, Issar S (1994) Recent improvements in the CMU spoken language understanding system. In: Proceedings of the workshop on human language technology. Association for Computational Linguistics, pp 213–216

  • Weizenbaum J (1966) ELIZA—a computer program for the study of natural language communication between man and machine. Commun ACM 9(1):36–45

  • Wen T-H, Gasic M, Mrksic N, Su P-H, Vandyke D, Young S (2015) Semantically conditioned LSTM-based natural language generation for spoken dialogue systems. arXiv preprint arXiv:1508.01745

  • Weston J, Chopra S, Bordes A (2014) Memory networks. arXiv preprint arXiv:1410.3916

  • Wikipedia (2018) Turn-taking

  • Wu B, Wang B, Xue H (2016a) Ranking responses oriented to conversational relevance in chat-bots. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics: technical papers, pp 652–662

  • Wu Y, Wu W, Xing C, Zhou M, Li Z (2016b) Sequential matching network: a new architecture for multi-turn response selection in retrieval-based chatbots. arXiv preprint arXiv:1612.01627

  • Xing C, Wu W, Wu Y, Liu J, Huang Y, Zhou M, Ma W-Y (2017) Topic aware neural response generation. In: Thirty-first AAAI conference on artificial intelligence

  • Xing C, Wu Y, Wu W, Huang Y, Zhou M (2018) Hierarchical recurrent attention network for response generation. In: Thirty-second AAAI conference on artificial intelligence

  • Yan Z, Duan N, Chen P, Zhou M, Zhou J, Li Z (2017) Building task-oriented dialogue systems for online shopping. In: Thirty-first AAAI conference on artificial intelligence

  • Yang Y, Yih W-t, Meek C (2015) WikiQA: a challenge dataset for open-domain question answering. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp 2013–2018

  • Yin W, Schütze H, Xiang B, Zhou B (2016) ABCNN: attention-based convolutional neural network for modeling sentence pairs. Trans Assoc Comput Linguist 4:259–272

  • Zens R, Och FJ, Ney H (2002) Phrase-based statistical machine translation. In: Annual conference on artificial intelligence. Springer, pp 18–32

  • Zhang X, LeCun Y (2015) Text understanding from scratch. arXiv preprint arXiv:1502.01710

  • Zhang Z, Li J, Zhu P, Zhao H, Liu G (2018) Modeling multi-turn conversation with deep utterance aggregation. arXiv preprint arXiv:1806.09102

  • Zhao WX, Jiang J, Weng J, He J, Lim E-P, Yan H, Li X (2011) Comparing Twitter and traditional media using topic models. In: European conference on information retrieval. Springer, pp 338–349

  • Zhou G, Luo P, Cao R, Lin F, Chen B, He Q (2017) Mechanism-aware neural machine for dialogue response generation. In: Thirty-first AAAI conference on artificial intelligence

  • Zhou G, Luo P, Xiao Y, Lin F, Chen B, He Q (2018) Elastic responding machine for dialog generation with dynamically mechanism selecting. In: Thirty-second AAAI conference on artificial intelligence

Author information

Corresponding author

Correspondence to Akanksha Mehndiratta.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mehndiratta, A., Asawa, K. Non-goal oriented dialogue agents: state of the art, dataset, and evaluation. Artif Intell Rev 54, 329–357 (2021). https://doi.org/10.1007/s10462-020-09848-z
