Two-Stage Data Augmentation for Low-Resourced Speech Recognition

Hartmann, William; Ng, Tim; Hsiao, Roger; Tsakalidis, Stavros; Schwartz, Richard

doi:10.21437/Interspeech.2016-1386

Low-resourced languages suffer from limited training data and resources. Data augmentation is a common approach to increasing the amount of training data: additional data is synthesized by manipulating the original data with a variety of methods. Unlike most previous work, which focuses on a single technique, we combine multiple complementary augmentation approaches. The first stage adds noise to, and perturbs the speed of, additional copies of the original audio. In the second stage, a novel fMLLR-based augmentation is applied to bottleneck features to further improve performance. A reduction in word error rate is demonstrated on four languages from the IARPA Babel program. We present an analysis exploring why these techniques are beneficial.
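
The first-stage idea (extra copies of the audio with altered speed and added noise) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the function names, the speed factors 0.9 and 1.1, the 15 dB signal-to-noise ratio, the use of white Gaussian noise, and the choice to apply noise and speed changes to separate copies. Speed perturbation is approximated by linear-interpolation resampling, and the fMLLR-based second stage is not sketched.

```python
import math
import random


def perturb_speed(samples, factor):
    """Resample so the audio plays `factor` times faster (duration and pitch both change)."""
    n_in = len(samples)
    # Number of output samples whose read position still falls inside the input.
    n_out = int((n_in - 1) / factor) + 1
    out = []
    for i in range(n_out):
        pos = i * factor                      # fractional read position in the input
        lo = min(int(math.floor(pos)), n_in - 1)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        # Linear interpolation between the two neighbouring input samples.
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise scaled to the requested signal-to-noise ratio in dB."""
    noise = [rng.gauss(0.0, 1.0) for _ in samples]
    signal_power = sum(x * x for x in samples) / len(samples)
    noise_power = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(samples, noise)]


def first_stage(samples, speed_factors=(0.9, 1.1), snr_db=15.0, seed=0):
    """Return the original audio plus speed-perturbed and noisy copies (assumed recipe)."""
    rng = random.Random(seed)
    copies = [list(samples)]
    copies += [perturb_speed(samples, f) for f in speed_factors]
    copies.append(add_noise(samples, snr_db, rng))
    return copies
```

Calling `first_stage(audio)` on a list of float samples returns four versions of the utterance under these assumed defaults. In a real pipeline, the transcript of each copy would be reused unchanged, which is what makes the extra audio usable as additional training data.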

Cite as: Hartmann, W., Ng, T., Hsiao, R., Tsakalidis, S., Schwartz, R. (2016) Two-Stage Data Augmentation for Low-Resourced Speech Recognition. Proc. Interspeech 2016, 2378-2382, doi: 10.21437/Interspeech.2016-1386

@inproceedings{hartmann16b_interspeech,
  author={William Hartmann and Tim Ng and Roger Hsiao and Stavros Tsakalidis and Richard Schwartz},
  title={{Two-Stage Data Augmentation for Low-Resourced Speech Recognition}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={2378--2382},
  doi={10.21437/Interspeech.2016-1386}
}