ISCA Archive Interspeech 2022

An Improved Transformer Transducer Architecture for Hindi-English Code Switched Speech Recognition

Ansen Antony, Sumanth Reddy Kota, Akhilesh Lade, Spoorthy V, Shashidhar G. Koolagudi

Due to the extensive usage of technology across many languages throughout the world, interest in Automatic Speech Recognition (ASR) systems for Code-Switched (CS) speech has grown in recent years. Several studies have shown that End-to-End (E2E) ASR is easier to adopt and performs much better in monolingual settings. E2E systems are also widely recognised as requiring massive quantities of labelled speech data. Because large amounts of CS speech are scarce, E2E ASR requires longer training time and does not offer promising results. In this work, an E2E ASR system using a transformer-transducer architecture is introduced for code-switched Hindi-English speech, and training data scarcity is addressed by leveraging the vastly available monolingual data. Specifically, the language-specific modules in the Transformer are pre-trained on the large single-language speech datasets that are readily available. The proposed method achieves a Word Error Rate (WER) of 29.63% and a Transliterated Word Error Rate (T-WER) of 27.42%, improving on the state-of-the-art by 2.19%.
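
For reference, WER is the word-level edit distance between the hypothesis and the reference transcript, divided by the number of reference words. The short Python sketch below is a generic illustration of that computation, not code from the paper; the paper's T-WER additionally transliterates words into a common script before scoring, which is not shown here.

# Minimal WER sketch (illustrative only, not the paper's implementation).
# WER = (substitutions + deletions + insertions) / number of reference words.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Example: one substitution out of four reference words -> WER = 0.25
print(wer("mujhe coffee chahiye abhi", "mujhe tea chahiye abhi"))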


doi: 10.21437/Interspeech.2022-10763

Cite as: Antony, A., Kota, S.R., Lade, A., V, S., Koolagudi, S.G. (2022) An Improved Transformer Transducer Architecture for Hindi-English Code Switched Speech Recognition. Proc. Interspeech 2022, 3123-3127, doi: 10.21437/Interspeech.2022-10763

@inproceedings{antony22_interspeech,
  author={Ansen Antony and Sumanth Reddy Kota and Akhilesh Lade and Spoorthy V and Shashidhar G. Koolagudi},
  title={{An Improved Transformer Transducer Architecture for Hindi-English Code Switched Speech Recognition}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={3123--3127},
  doi={10.21437/Interspeech.2022-10763}
}