Recently developed end-to-end (E2E) automatic speech recognition (ASR) systems demand an abundance of transcribed speech data, yet in many scenarios the labeling of speech data is cumbersome and expensive. For a fixed annotation budget, active learning allows the ASR model to be trained efficiently. In this work, we advance the most common approach to active learning, which relies on uncertainty sampling. In particular, we explore the use of the path probability of the decoded sequence as a confidence measure and select the least-confident samples for active learning. To reduce the sampling bias in active learning, we propose a regularized uncertainty sampling approach that incorporates an i-vector diversity measure. Active learning in the proposed framework thus uses a joint score of uncertainty and i-vector diversity. The benefits of the proposed approach are illustrated for an E2E ASR task performed on the CSJ and Librispeech datasets. In these experiments, we show that the proposed approach yields considerable improvements over a baseline model trained with random sampling.
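The joint scoring idea above can be illustrated with a greedy selection sketch. This is an assumption-laden illustration, not the paper's exact method: it takes decoded-path log-probabilities as the confidence measure, uses cosine distance between i-vectors as the diversity term, and introduces a hypothetical weight `lam` to combine the two.

```python
import numpy as np

def select_batch(log_probs, ivectors, k, lam=0.5):
    """Greedy active-learning selection combining uncertainty and
    i-vector diversity (illustrative sketch, not the paper's exact recipe).

    log_probs: (N,) path log-probabilities of the decoded sequences
               (higher = more confident)
    ivectors:  (N, D) per-utterance i-vectors
    k:         number of utterances to select for annotation
    lam:       hypothetical weight on the diversity term
    """
    N = len(log_probs)
    # Least-confidence uncertainty: negate log-probability so that the
    # least confident utterances get the highest uncertainty score.
    uncertainty = -np.asarray(log_probs, dtype=float)
    # Unit-normalize i-vectors so diversity is a cosine distance.
    iv = ivectors / np.linalg.norm(ivectors, axis=1, keepdims=True)
    selected = []
    for _ in range(k):
        if selected:
            sims = iv @ iv[selected].T          # cosine similarity to chosen set
            diversity = 1.0 - sims.max(axis=1)  # distance to nearest selected
        else:
            diversity = np.ones(N)              # first pick: diversity is uniform
        score = uncertainty + lam * diversity   # joint score
        score[selected] = -np.inf               # never pick an utterance twice
        selected.append(int(np.argmax(score)))
    return selected
```

Because the diversity term penalizes candidates close to already-selected i-vectors, the batch spreads across speakers instead of clustering on a few low-confidence ones, which is the sampling-bias reduction the abstract refers to.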
Cite as: Malhotra, K., Bansal, S., Ganapathy, S. (2019) Active Learning Methods for Low Resource End-to-End Speech Recognition. Proc. Interspeech 2019, 2215-2219, doi: 10.21437/Interspeech.2019-2316
@inproceedings{malhotra19_interspeech,
  author={Karan Malhotra and Shubham Bansal and Sriram Ganapathy},
  title={{Active Learning Methods for Low Resource End-to-End Speech Recognition}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2215--2219},
  doi={10.21437/Interspeech.2019-2316}
}