Low-resourced languages suffer from limited training data and resources. Data augmentation is a common approach to increasing the amount of training data: additional data is synthesized by manipulating the original data with a variety of methods. Unlike most previous work, which focuses on a single technique, we combine multiple, complementary augmentation approaches. In the first stage, additional copies of the original audio are created by adding noise and perturbing the speed. In the second stage, the data is further augmented by applying a novel fMLLR-based augmentation to bottleneck features, which improves performance again. A reduction in word error rate is demonstrated on four languages from the IARPA Babel program. We also present an analysis of why these techniques are beneficial.
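The two augmentation stages can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Speed perturbation is approximated by linear-interpolation resampling, which changes tempo and pitch together. Noise is mixed in at a fixed signal-to-noise ratio, and the perturbation factors (0.9, 1.0, 1.1) and the 10 dB SNR are hypothetical defaults. For the second stage, the sketch only applies an fMLLR-style affine transform y = Ax + b to each bottleneck feature frame. A and b are assumed to have been estimated elsewhere, and how the paper selects or estimates them is not shown. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def speed_perturb(signal: Sequence[float], factor: float) -> List[float]:
    """Resample by linear interpolation so the audio plays `factor` times faster.

    factor > 1 shortens the signal (raising pitch); factor < 1 lengthens it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n = len(signal)
    if n == 0:
        return []
    out_len = int((n - 1) / factor) + 1
    out = []
    for i in range(out_len):
        pos = i * factor                 # position in the original signal
        lo = min(int(pos), n - 1)        # clamp against float round-off
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out


def add_noise(signal: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Mix `noise` into `signal` at the requested SNR in dB.

    The noise is tiled or cropped to the signal length before scaling.
    """
    if not noise:
        raise ValueError("noise must be non-empty")
    n = len(signal)
    if n == 0:
        return []
    tiled = [noise[i % len(noise)] for i in range(n)]
    p_sig = sum(x * x for x in signal) / n
    p_noise = sum(x * x for x in tiled) / n
    if p_noise == 0.0:
        return list(signal)
    # Choose gain g so that 10*log10(p_sig / (g^2 * p_noise)) == snr_db.
    gain = math.sqrt(p_sig / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(signal, tiled)]


def first_stage_copies(signal: Sequence[float], noise: Sequence[float],
                       factors: Sequence[float] = (0.9, 1.0, 1.1),
                       snr_db: float = 10.0) -> List[List[float]]:
    """Stage 1 (sketch): one noisy, speed-perturbed copy per factor."""
    return [add_noise(speed_perturb(signal, f), noise, snr_db) for f in factors]


def apply_affine(frames: Sequence[Sequence[float]], A: Matrix,
                 b: Sequence[float]) -> List[List[float]]:
    """Apply y = A x + b to every feature frame (the form of an fMLLR transform)."""
    return [[sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]
            for x in frames]


def second_stage_copies(frames: Sequence[Sequence[float]],
                        transforms: Sequence[Tuple[Matrix, Sequence[float]]]) -> List[List[List[float]]]:
    """Stage 2 (sketch): one transformed copy of the bottleneck features per (A, b)."""
    return [apply_affine(frames, A, b) for A, b in transforms]


if __name__ == "__main__":
    rng = random.Random(0)
    speech = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(37)]
    copies = first_stage_copies(speech, noise)
    print([len(c) for c in copies])
```

Because each stage multiplies the number of copies, the two stages compound: three speed factors in stage 1 followed by k transforms in stage 2 would yield 3k augmented copies of the bottleneck features per utterance in this sketch.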
Cite as: Hartmann, W., Ng, T., Hsiao, R., Tsakalidis, S., Schwartz, R. (2016) Two-Stage Data Augmentation for Low-Resourced Speech Recognition. Proc. Interspeech 2016, 2378-2382, doi: 10.21437/Interspeech.2016-1386
@inproceedings{hartmann16b_interspeech, author={William Hartmann and Tim Ng and Roger Hsiao and Stavros Tsakalidis and Richard Schwartz}, title={{Two-Stage Data Augmentation for Low-Resourced Speech Recognition}}, year=2016, booktitle={Proc. Interspeech 2016}, pages={2378--2382}, doi={10.21437/Interspeech.2016-1386} }