Word Segmentation From Phoneme Sequences Based On Pitman-Yor Semi-Markov Model Exploiting Subword Information

Abstract:

Word segmentation from phoneme sequences is essential to identify unknown (out-of-vocabulary; OOV) words in spoken dialogues. The Pitman-Yor semi-Markov model (PYSMM) is used for word segmentation that handles a dynamically growing vocabulary. The obtained vocabularies, however, still include meaningless entries due to insufficient cues for phoneme sequences. We focus here on using subword information to capture patterns as “words.” We propose 1) a model based on subword N-grams and subword estimation using a vocabulary set, and 2) posterior fusion of the results of a PYSMM and our model to take advantage of both. Our experiments showed 1) the potential of using subword information for OOV acquisition, and 2) that our method outperformed the PYSMM by 1.53 and 1.07 in terms of the F-measure of the obtained OOV set for English and Japanese corpora, respectively.

Published in: 2018 IEEE Spoken Language Technology Workshop (SLT)

Date of Conference: 18-21 December 2018

Date Added to IEEE Xplore: 14 February 2019

DOI: 10.1109/SLT.2018.8639607

Conference Location: Athens, Greece