
Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification



Abstract:

With the successful application of deep feature learning algorithms, spoken language identification (LID) on long utterances achieves satisfactory performance. However, performance on short utterances degrades drastically, even when the LID system is trained on short utterances. The main reason is the large variation of the representations of short utterances, which results in high model confusion. To narrow the performance gap between long and short utterances, we propose a teacher-student representation learning framework based on knowledge distillation to improve LID performance on short utterances. In the proposed framework, in addition to training the student model on short utterances with their true labels, the internal representation from the output of a hidden layer of the student model is supervised with the representation of the corresponding longer utterances. By reducing the distance between the internal representations of short and long utterances, the student model can learn robust discriminative representations for short utterances, which is expected to reduce model confusion. We conducted experiments on our in-house LID dataset and the NIST LRE07 dataset, and the results show the effectiveness of the proposed method for short-utterance LID tasks.
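
The abstract does not give the exact loss formulation, but the described objective (true-label supervision on short utterances plus matching the student's hidden-layer representation to the teacher's representation of the corresponding long utterance) can be sketched as follows. This is a minimal, hypothetical PyTorch sketch: the function name distillation_loss, the weight alpha, and the choice of an MSE distance are assumptions for illustration, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, student_embed, teacher_embed, labels, alpha=0.5):
        # Cross-entropy on the short-utterance language labels (the student's usual objective).
        ce = F.cross_entropy(student_logits, labels)
        # Representation-matching term: pull the student's hidden-layer representation of the
        # short utterance toward the teacher's representation of the corresponding long
        # utterance. MSE is one plausible distance; the teacher is kept fixed via detach().
        kd = F.mse_loss(student_embed, teacher_embed.detach())
        return ce + alpha * kd

    # Illustrative call with dummy tensors (batch of 8, 14 target languages, 512-dim embeddings).
    logits = torch.randn(8, 14)
    s_emb = torch.randn(8, 512)
    t_emb = torch.randn(8, 512)
    labels = torch.randint(0, 14, (8,))
    loss = distillation_loss(logits, s_emb, t_emb, labels)

Detaching the teacher embedding keeps gradients from flowing into the teacher, so only the student is updated; the weight alpha balances label supervision against representation matching.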
Page(s): 2674 - 2683
Date of Publication: 14 September 2020

