ISCA Archive Interspeech 2021

Neural Spoken-Response Generation Using Prosodic and Linguistic Context for Conversational Systems

Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito

Spoken dialogue systems have become widely used in daily life. To truly operate as a partner to humans, such a system must interact with the user socially. In recent dialogue systems, neural response generation has enabled natural responses. However, these studies have not considered the acoustic aspects of conversational phenomena, such as the adaptation of prosody. We propose a spoken-response generation model that extends a neural conversational model to handle pitch control signals. The proposed model is trained on multimodal human-human dialogue. The generated pitch control signals are fed to a speech synthesis system to control the pitch of the synthesized speech. Our experiment shows that, compared with the output of a system without pitch control, the proposed system can generate synthesized speech whose F0 contour is more appropriate for the utterance in context, although language generation remains an issue.
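To make the idea of a response generator that emits pitch control signals alongside text more concrete, the following is a minimal, hypothetical sketch and not the authors' implementation: it assumes a GRU-based decoder with two output heads, one producing next-token logits and one producing a scalar pitch-control value that a TTS front end could consume. All class names, dimensions, and the interpretation of the pitch value are illustrative assumptions.

# Hypothetical sketch of a decoder step with joint text and pitch-control outputs.
import torch
import torch.nn as nn

class PitchAwareDecoderStep(nn.Module):
    def __init__(self, vocab_size=8000, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRUCell(emb_dim, hidden_dim)
        self.token_head = nn.Linear(hidden_dim, vocab_size)  # next-word logits
        self.pitch_head = nn.Linear(hidden_dim, 1)            # scalar pitch-control signal (assumed, e.g. a log-F0 offset)

    def forward(self, prev_token, hidden):
        x = self.embed(prev_token)            # (batch, emb_dim)
        hidden = self.gru(x, hidden)          # (batch, hidden_dim)
        token_logits = self.token_head(hidden)
        pitch_ctrl = self.pitch_head(hidden)  # passed on to the speech synthesizer
        return token_logits, pitch_ctrl, hidden

# Usage: one decoding step for a batch of two partial responses.
step = PitchAwareDecoderStep()
prev = torch.tensor([5, 42])                  # previous token ids
h = torch.zeros(2, 512)                       # initial decoder state
logits, pitch, h = step(prev, h)
print(logits.shape, pitch.shape)              # torch.Size([2, 8000]) torch.Size([2, 1])

In this sketch, the text and pitch heads share the decoder state, so the pitch-control value can depend on the same linguistic and dialogue context that drives word selection; how the paper conditions on prosodic context and trains the pitch output is described in the full text.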


doi: 10.21437/Interspeech.2021-381

Cite as: Yamazaki, Y., Chiba, Y., Nose, T., Ito, A. (2021) Neural Spoken-Response Generation Using Prosodic and Linguistic Context for Conversational Systems. Proc. Interspeech 2021, 246-250, doi: 10.21437/Interspeech.2021-381

@inproceedings{yamazaki21_interspeech,
  author={Yoshihiro Yamazaki and Yuya Chiba and Takashi Nose and Akinori Ito},
  title={{Neural Spoken-Response Generation Using Prosodic and Linguistic Context for Conversational Systems}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={246--250},
  doi={10.21437/Interspeech.2021-381}
}