Language identification (LID) systems that model high-level information, such as phonotactics, have exhibited superior performance. State-of-the-art systems capture this high-level information with sequential models, but these models are sensitive to utterance length and do not generalize equally well across variable-length utterances. Capturing this information effectively requires a feature that models long-term temporal context. This study aims to capture long-term temporal context by stacking successive shifted delta cepstral (SDC) feature vectors. Deep neural networks are explored for developing the LID systems, and experiments are performed on the AP17-OLR database. LID systems trained on stacked SDC features show a significant improvement over the system trained on conventional SDC features. With residual connections in the feed-forward network, the proposed feature reduced the equal error rate (EER) from 21.04% to 14.42%, from 18.02% to 11.14%, and from 16.45% to 10.11% on the 1-second, 3-second, and longer-than-3-second test utterances, respectively.
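The feature pipeline described above can be sketched as follows. The abstract does not give the SDC configuration or the number of successive SDC vectors that are stacked, so this sketch assumes the widely used N-d-P-k = 7-1-3-7 configuration (N cepstral coefficients, delta spread d, block shift P, k blocks) and a hypothetical context of 5 SDC vectors on each side of the current frame. The function names `sdc` and `stack_sdc` and the edge handling (repeating the first or last frame) are illustrative assumptions, not the authors' implementation.

```python
from typing import List

Frame = List[float]


def _clamp(idx: int, n: int) -> int:
    """Map out-of-range frame indices to the nearest valid frame (edge replication, an assumption)."""
    return min(max(idx, 0), n - 1)


def sdc(cep: List[Frame], d: int = 1, p: int = 3, k: int = 7) -> List[Frame]:
    """Shifted delta cepstra for a sequence of N-dimensional cepstral frames.

    For frame t and block i in range(k), the delta is
    cep[t + i*p + d] - cep[t + i*p - d]; the k deltas are concatenated,
    giving an N*k-dimensional vector per frame.
    """
    T = len(cep)
    out: List[Frame] = []
    for t in range(T):
        vec: Frame = []
        for i in range(k):
            plus = cep[_clamp(t + i * p + d, T)]
            minus = cep[_clamp(t + i * p - d, T)]
            vec.extend(a - b for a, b in zip(plus, minus))
        out.append(vec)
    return out


def stack_sdc(feats: List[Frame], left: int = 5, right: int = 5) -> List[Frame]:
    """Concatenate each frame's SDC vector with its `left` preceding and
    `right` following SDC vectors (hypothetical context sizes)."""
    T = len(feats)
    out: List[Frame] = []
    for t in range(T):
        vec: Frame = []
        for offset in range(-left, right + 1):
            vec.extend(feats[_clamp(t + offset, T)])
        out.append(vec)
    return out


if __name__ == "__main__":
    # Toy "cepstra": 20 frames of 7 coefficients (real inputs would be MFCCs or similar).
    cep = [[float(t + j) for j in range(7)] for t in range(20)]
    s = sdc(cep)        # 7-1-3-7 configuration: 49 dimensions per frame
    x = stack_sdc(s)    # 11 SDC vectors of context: 539 dimensions per frame
    print(len(s[0]), len(x[0]))
```

Stacking multiplies the input dimension by the context size (49 × 11 = 539 here), which is what lets a feed-forward network see a longer temporal span without a sequential model; the actual context used in the paper may differ.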
Cite as: Kumar Vuddagiri, R., Vydana, H.K., Kumar Vuppala, A. (2018) Improved Language Identification Using Stacked SDC Features and Residual Neural Network. Proc. 6th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2018), 210-214, doi: 10.21437/SLTU.2018-44
@inproceedings{kumarvuddagiri18_sltu,
  author={Ravi {Kumar Vuddagiri} and Hari Krishna Vydana and Anil {Kumar Vuppala}},
  title={{Improved Language Identification Using Stacked SDC Features and Residual Neural Network}},
  year=2018,
  booktitle={Proc. 6th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2018)},
  pages={210--214},
  doi={10.21437/SLTU.2018-44}
}