ISCA Archive Interspeech 2021

Joint Retrieval-Extraction Training for Evidence-Aware Dialog Response Selection

Hongyin Luo, James Glass, Garima Lalwani, Yi Zhang, Shang-Wen Li

Neural dialog response selection models infer by scoring each candidate response given the dialog context, and the cross-encoder method yields state-of-the-art (SOTA) results for the task. In this method, the candidate score is computed by feeding the output embedding of the first token in the input sequence, a concatenation of response and context, to a linear layer for prediction. However, the embeddings of the other tokens in the sequence are not modeled explicitly, and inferring candidate scores from the first token alone makes the result uninterpretable. To address this challenge, we propose a Retrieval-EXtraction encoder (REX) for dialog response selection. We augment the existing first-token- or sequence-based retrieval approach with an extraction loss. This loss provides a gradient signal from every token during training, allowing the model to learn token-level evidence and to select responses based on important keywords. We show that REX achieves a new SOTA in the dialog response selection task. Moreover, our qualitative analysis suggests that REX highlights the evidence from which it infers selections, making the inference result interpretable.
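The sketch below illustrates the general idea described in the abstract: a cross-encoder scores a (response, context) concatenation from the first-token embedding, and a token-level extraction head adds a second loss term so every token contributes gradient signal. This is a minimal illustration only, assuming a BERT-style encoder via Hugging Face `transformers`, binary retrieval labels, hypothetical token-level evidence labels, and a made-up loss-mixing weight; the paper's exact architecture, supervision, and weighting are not specified here.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class RetrievalExtractionEncoder(nn.Module):
    """Cross-encoder with a joint retrieval + extraction objective (illustrative sketch)."""

    def __init__(self, model_name="bert-base-uncased", ext_weight=0.5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.retrieval_head = nn.Linear(hidden, 1)   # candidate score from the first ([CLS]) token
        self.extraction_head = nn.Linear(hidden, 1)  # per-token evidence score
        self.ext_weight = ext_weight                 # hypothetical weight mixing the two losses

    def forward(self, input_ids, attention_mask, labels=None, evidence_labels=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        hidden_states = out.last_hidden_state                      # (batch, seq_len, hidden)
        retrieval_logits = self.retrieval_head(hidden_states[:, 0]).squeeze(-1)
        extraction_logits = self.extraction_head(hidden_states).squeeze(-1)

        loss = None
        if labels is not None:
            bce = nn.BCEWithLogitsLoss()
            loss = bce(retrieval_logits, labels.float())            # retrieval loss (first token only)
            if evidence_labels is not None:
                # Extraction loss: gradient signal from every real (non-padding) token.
                mask = attention_mask.bool()
                ext_loss = bce(extraction_logits[mask], evidence_labels.float()[mask])
                loss = loss + self.ext_weight * ext_loss
        return retrieval_logits, extraction_logits, loss


# Usage: score one candidate response against a dialog context.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = RetrievalExtractionEncoder()
enc = tokenizer("there is a cafe next door", "where can I get coffee?",
                return_tensors="pt", truncation=True)
score, evidence, _ = model(enc["input_ids"], enc["attention_mask"])
```

At inference time, candidates would be ranked by `score`, while `evidence` gives per-token values that can be used to highlight the keywords the model treats as supporting evidence.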


doi: 10.21437/Interspeech.2021-1689

Cite as: Luo, H., Glass, J., Lalwani, G., Zhang, Y., Li, S.-W. (2021) Joint Retrieval-Extraction Training for Evidence-Aware Dialog Response Selection. Proc. Interspeech 2021, 3241-3245, doi: 10.21437/Interspeech.2021-1689

@inproceedings{luo21d_interspeech,
  author={Hongyin Luo and James Glass and Garima Lalwani and Yi Zhang and Shang-Wen Li},
  title={{Joint Retrieval-Extraction Training for Evidence-Aware Dialog Response Selection}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={3241--3245},
  doi={10.21437/Interspeech.2021-1689}
}