Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings

Abstract:

Measures of acoustic similarity between words or other units are critical for segmental exemplar-based acoustic models, spoken term discovery, and query-by-example search. Dynamic time warping (DTW) alignment cost has been the most commonly used measure, but it has well-known inadequacies. Some recently proposed alternatives require large amounts of training data. In the interest of finding more efficient, accurate, and low-resource alternatives, we consider the problem of embedding speech segments of arbitrary length into fixed-dimensional spaces in which simple distances (such as cosine or Euclidean) serve as a proxy for linguistically meaningful (phonetic, lexical, etc.) dissimilarities. Such embeddings would enable efficient audio indexing and permit application of standard distance learning techniques to segmental acoustic modeling. In this paper, we explore several supervised and unsupervised approaches to this problem and evaluate them on an acoustic word discrimination task. We identify several embedding algorithms that match or improve upon the DTW baseline in low-resource settings.
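To make the contrast between the DTW baseline and a fixed-dimensional embedding concrete, here is a minimal sketch. It assumes each segment is a list of equal-dimension feature frames (for example MFCC vectors). DTW must align every pair of segments frame by frame. The embedding route maps each segment once to a vector and then compares vectors with cosine distance. The embedding shown, uniform downsampling of frames followed by concatenation, is an illustrative assumption and is not presented as the exact configuration evaluated in the paper. The function names (`dtw_cost`, `downsample_embedding`, `cosine_distance`) are hypothetical.

```python
import math
from typing import List, Sequence

# Assumption: a segment is a variable-length sequence of feature frames,
# all frames having the same dimensionality.
Segment = Sequence[Sequence[float]]


def dtw_cost(x: Segment, y: Segment) -> float:
    """Length-normalized DTW alignment cost with Euclidean frame distance."""
    n, m = len(x), len(y)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(x[i - 1], y[j - 1])
            # Standard step pattern: advance in x, in y, or in both.
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Normalize by total length so short and long pairs are comparable.
    return acc[n][m] / (n + m)


def downsample_embedding(seg: Segment, k: int) -> List[float]:
    """Hypothetical embedding: k uniformly spaced frames, concatenated (k * dim values)."""
    n = len(seg)
    if n == 0 or k < 1:
        raise ValueError("need a non-empty segment and k >= 1")
    # Evenly spread frame indices; frames repeat when the segment has fewer than k frames.
    indices = [(i * n) // k for i in range(k)]
    return [value for idx in indices for value in seg[idx]]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity between two fixed-dimensional vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


if __name__ == "__main__":
    x = [[0.0], [1.0], [2.0], [3.0]]
    stretched = [f for f in x for _ in range(2)]  # same toy "word", spoken twice as slowly
    print(dtw_cost(x, stretched))
    print(cosine_distance(downsample_embedding(x, 4), downsample_embedding(stretched, 4)))
```

The design point is cost at comparison time. DTW costs O(nm) per pair of segments. After embedding, each comparison is a single O(k·d) vector distance, which is what makes standard indexing and distance-learning techniques applicable.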

Published in: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding

Date of Conference: 08-12 December 2013

Date Added to IEEE Xplore: 09 January 2014

Electronic ISBN: 978-1-4799-2756-2

DOI: 10.1109/ASRU.2013.6707765

Conference Location: Olomouc, Czech Republic