Iterative training of a DPGMM-HMM acoustic unit recognizer in a zero resource scenario

Abstract:

In this paper we propose a framework for building a full-fledged acoustic unit recognizer in a zero resource setting, i.e., without any provided labels. To this end, we combine an iterative Dirichlet process Gaussian mixture model (DPGMM) clustering framework with a standard pipeline for supervised GMM-HMM acoustic model (AM) and n-gram language model (LM) training, enhanced by a scheme for iterative model re-training. We use the DPGMM to cluster feature vectors into a dynamically sized set of acoustic units. The frame-based class labels serve as transcriptions of the audio data and are used as input to the AM and LM training pipeline. We show that iterative unsupervised re-training of this DPGMM-HMM acoustic unit recognizer improves performance in an evaluation based on an ABX sound class discriminability task. Our results show that the learned models generalize well and that sound class discriminability benefits from the contextual information introduced by the language model. Our systems are competitive with phone recognizers trained with supervision and can beat the baseline set by DPGMM clustering.
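
As a rough illustration of how frame-based class labels could serve as transcriptions for n-gram LM training, the sketch below turns per-frame cluster ids into unit sequences and counts unit bigrams. It is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not say how frame labels are converted to transcriptions, so collapsing runs of identical labels is an assumption, and the function names `frames_to_units` and `bigram_counts` and the example cluster ids are hypothetical.

```python
from collections import Counter
from itertools import groupby


def frames_to_units(frame_labels):
    """Collapse each run of identical frame labels into one unit token.

    Assumption: one unit per run of repeated labels; the source does not
    specify this step.
    """
    return [label for label, _ in groupby(frame_labels)]


def bigram_counts(transcriptions):
    """Count unit bigrams over all utterances, with boundary markers."""
    counts = Counter()
    for units in transcriptions:
        padded = ["<s>"] + list(units) + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts


# Hypothetical cluster ids for two short utterances (one id per frame).
utterances = [
    [3, 3, 3, 7, 7, 1, 1, 1, 1],
    [7, 7, 3, 3, 1],
]
transcriptions = [frames_to_units(u) for u in utterances]
counts = bigram_counts(transcriptions)
```

In a full system, counts like these would feed a smoothed n-gram LM, and the unit sequences would stand in for phone transcriptions during GMM-HMM training.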

Published in: 2016 IEEE Spoken Language Technology Workshop (SLT)

Date of Conference: 13-16 December 2016

Date Added to IEEE Xplore: 09 February 2017

DOI: 10.1109/SLT.2016.7846245

Conference Location: San Diego, CA, USA