Abstract:
This paper addresses policy optimization for a dialogue management (DM) scheme based on partially observable Markov decision processes (POMDPs), designed to handle out-of-domain (OOD) utterances in spoken dialogue systems. First, a POMDP-based DM model for OOD utterances is proposed, together with details of its principal elements. Then, joint state transition exploration and dialogue policy optimization are performed in batch, with the value iteration method from the reinforcement learning framework employed to optimize the dialogue policy. Our approach is tested through interaction with users in a Chinese restricted-domain dialogue system that acts as a mobile phone recommendation assistant. Evaluation results show that a usable policy can be learnt in just a few hundred dialogues, and that the optimized policy converges to a good dialogue reward.
Date of Conference: 21-23 November 2016
Date Added to IEEE Xplore: 13 March 2017