Prediction of Turn-taking Using Multitask Learning with Prediction of Backchannels and Fillers

Hara, Kohei; Inoue, Koji; Takanashi, Katsuya; Kawahara, Tatsuya

doi:10.21437/Interspeech.2018-1442

We address the prediction of turn-taking while taking into account related behaviors such as backchannels and fillers. Listeners use backchannels to signal that the current speaker may keep the turn, whereas prospective speakers use fillers to signal an intention to take the turn. We propose a turn-taking model based on multitask learning in conjunction with the prediction of backchannels and fillers. Multitask learning with LSTM neural networks shared across these tasks enables more efficient and generalized learning, and thus improves prediction accuracy. Evaluations on two kinds of human-robot interaction dialogue corpora demonstrate that the proposed multitask learning scheme outperforms conventional single-task learning.
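The core idea of the abstract, one LSTM encoder whose hidden state feeds separate turn-taking, backchannel, and filler predictors trained with a joint loss, can be sketched in NumPy as below. This is an illustrative outline under stated assumptions, not the authors' implementation: the names `SharedLSTMMultitask` and `multitask_loss`, the feature and hidden sizes, the use of the final hidden state, the binary per-task labels, and the equal loss weights are all hypothetical choices.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class SharedLSTMMultitask:
    """Hypothetical sketch: one LSTM encoder shared by three binary prediction heads."""

    TASKS = ("turn_taking", "backchannel", "filler")

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        # Stacked gate weights [input, forget, cell, output] applied to [x_t; h_{t-1}].
        self.W = rng.uniform(-scale, scale, (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim
        # One logistic output layer per task; all read the same shared hidden state.
        self.heads = {t: (rng.uniform(-scale, scale, hidden_dim), 0.0) for t in self.TASKS}

    def encode(self, frames):
        """Run the shared LSTM over a (T, input_dim) sequence; return the final hidden state."""
        H = self.hidden_dim
        h = np.zeros(H)
        c = np.zeros(H)
        for x in frames:
            z = self.W @ np.concatenate([x, h]) + self.b
            i = sigmoid(z[:H])           # input gate
            f = sigmoid(z[H:2 * H])      # forget gate
            g = np.tanh(z[2 * H:3 * H])  # candidate cell state
            o = sigmoid(z[3 * H:])       # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
        return h

    def predict(self, frames):
        """Return a probability per task, all computed from one shared encoding."""
        h = self.encode(frames)
        return {t: float(sigmoid(w @ h + b)) for t, (w, b) in self.heads.items()}


def multitask_loss(probs, labels, weights=None, eps=1e-7):
    """Weighted sum of per-task binary cross-entropies (equal weights by default, an assumption)."""
    weights = weights or {t: 1.0 for t in probs}
    total = 0.0
    for t, p in probs.items():
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        y = labels[t]
        total += weights[t] * -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return float(total)


# Usage with random stand-in features (20 frames x 8 hypothetical acoustic features).
model = SharedLSTMMultitask(input_dim=8, hidden_dim=16)
frames = np.random.default_rng(1).normal(size=(20, 8))
probs = model.predict(frames)
loss = multitask_loss(probs, {"turn_taking": 1, "backchannel": 0, "filler": 1})
```

Because every head reads the same hidden state, in such a setup the backchannel and filler losses would also update the shared LSTM weights during training, which is how the auxiliary tasks can act as extra supervision for turn-taking prediction. Training itself (backpropagation through time) is omitted from this sketch.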

Cite as: Hara, K., Inoue, K., Takanashi, K., Kawahara, T. (2018) Prediction of Turn-taking Using Multitask Learning with Prediction of Backchannels and Fillers. Proc. Interspeech 2018, 991-995, doi: 10.21437/Interspeech.2018-1442

@inproceedings{hara18_interspeech,
  author={Kohei Hara and Koji Inoue and Katsuya Takanashi and Tatsuya Kawahara},
  title={{Prediction of Turn-taking Using Multitask Learning with Prediction of Backchannels and Fillers}},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={991--995},
  doi={10.21437/Interspeech.2018-1442}
}