Abstract:
Deep learning techniques, which require large amounts of training data, are currently the state of the art in automatic speech recognition (ASR). Corporate giants such as Google or IBM train English ASR systems on more than 100k hours of annotated speech, while research on under-resourced languages such as Romanian must make do with as little as 300 hours. In this context, automatic annotation of speech corpora and unsupervised acoustic model training are promising directions for mitigating this lack of data. This study describes the progress made by the SpeeD laboratory in this research direction: taking an already proven methodology, applying it at large scale (more than 700 hours of unlabeled speech), and analyzing the experimental results in depth to identify potential future directions. Moreover, we present novel results on Romanian ASR: the methodology leads to a relative Word Error Rate (WER) improvement of up to almost 10%.
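For context on the result above: WER counts substitutions (S), deletions (D), and insertions (I) against the number of words in the reference transcript (N), and the reported gain is relative to the baseline WER rather than an absolute difference. A minimal sketch of the standard definitions follows; the concrete WER values in the example are hypothetical, chosen only so that the relative gain lands near the roughly 10% figure reported in the abstract:

    \mathrm{WER} = \frac{S + D + I}{N},
    \qquad
    \Delta_{\mathrm{rel}} = \frac{\mathrm{WER}_{\mathrm{baseline}} - \mathrm{WER}_{\mathrm{new}}}{\mathrm{WER}_{\mathrm{baseline}}}

For example, a hypothetical drop from a baseline WER of 20.0% to 18.1% is an absolute improvement of only 1.9 percentage points, but a relative improvement of (20.0 - 18.1) / 20.0 ≈ 9.5%, i.e., "almost 10%".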
Date of Conference: 01-03 July 2019
Date Added to IEEE Xplore: 25 July 2019