Leveraging ASR N-Best in Deep Entity Retrieval

Wang, Haoyu; Chen, John; Laali, Majid; Durda, Kevin; King, Jeff; Campbell, William; Liu, Yang

doi:10.21437/Interspeech.2021-1370

Entity Retrieval (ER) in spoken dialog systems is the task of retrieving, from a catalog, the entities that correspond to entity mentions in user utterances. ER systems are susceptible to upstream errors, and Automatic Speech Recognition (ASR) errors are particularly troublesome. In this work, we propose a robust deep-learning-based ER system that leverages ASR N-best hypotheses. Specifically, we evaluate different neural architectures that incorporate the ASR N-best hypotheses through an attention mechanism. On 750 hours of audio data taken from live traffic, our best model achieves an 11.07% relative error reduction while maintaining the same performance at rejecting out-of-domain ER requests.
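The abstract does not specify how the N-best hypotheses are combined, so the following is only a minimal sketch of one common option: scaled dot-product attention that pools one embedding per ASR hypothesis into a single vector. Using the 1-best embedding as the query, and the names `softmax` and `attend_over_nbest`, are assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_nbest(query, hypothesis_vectors):
    """Pool N-best hypothesis embeddings with scaled dot-product attention.

    query: embedding used to score the hypotheses (assumed here to be the
        1-best hypothesis embedding; the paper may choose differently).
    hypothesis_vectors: one embedding per ASR hypothesis, all of equal size.
    Returns (pooled_vector, attention_weights).
    """
    dim = len(query)
    # Score each hypothesis by its scaled dot product with the query.
    scores = [
        sum(q * h for q, h in zip(query, vec)) / math.sqrt(dim)
        for vec in hypothesis_vectors
    ]
    weights = softmax(scores)
    # Weighted sum of the hypothesis embeddings, one dimension at a time.
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, hypothesis_vectors))
        for i in range(dim)
    ]
    return pooled, weights


# Toy 3-best list with 4-dimensional embeddings (made-up values).
nbest = [
    [1.0, 0.0, 0.5, 0.0],
    [0.9, 0.1, 0.4, 0.0],
    [0.0, 1.0, 0.0, 0.5],
]
pooled, weights = attend_over_nbest(nbest[0], nbest)
```

In a trained system the embeddings and the query would come from learned encoders, and the pooled vector would feed the retrieval model; here they are hand-written so the mechanics stay visible.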

Cite as: Wang, H., Chen, J., Laali, M., Durda, K., King, J., Campbell, W., Liu, Y. (2021) Leveraging ASR N-Best in Deep Entity Retrieval. Proc. Interspeech 2021, 261-265, doi: 10.21437/Interspeech.2021-1370

@inproceedings{wang21b_interspeech,
  author={Haoyu Wang and John Chen and Majid Laali and Kevin Durda and Jeff King and William Campbell and Yang Liu},
  title={{Leveraging ASR N-Best in Deep Entity Retrieval}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={261--265},
  doi={10.21437/Interspeech.2021-1370}
}