Efficient out-of-vocabulary term detection by n-gram array indices with distance from a syllable lattice | IEEE Conference Publication | IEEE Xplore

Abstract:

For spoken document retrieval, it is very important to account for out-of-vocabulary (OOV) words and the misrecognition of spoken words. For this reason, sub-word-unit-based recognition and retrieval methods have been proposed. This paper describes a Japanese spoken document retrieval system that is robust to OOV words and to misrecognized sub-word units. We used individual syllables as the sub-word unit in continuous speech recognition, and used n-gram sequences of syllables from the recognized syllable lattice. We propose an n-gram indexing and retrieval method that incorporates distance in the syllable lattice to tackle OOV words and recognition errors while achieving high-speed retrieval. Applied to a 44-hour database of academic lecture presentations, the method detected OOV words with an F-value of 0.58 in less than 2.5 milliseconds.
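The abstract does not specify the index layout or the exact distance measure, so the following is only a minimal Python sketch of the general idea under stated assumptions. It assumes the syllable lattice has been flattened into confusion-network-style slots (each slot lists competing syllable hypotheses), indexes syllable bigrams (n = 2) by their start slot, and interprets "distance" as requiring that query n-gram k be found roughly k slots after a candidate start, within a tolerance `tol`. The function names `build_ngram_index` and `detect_term`, the voting scheme, the parameters `tol` and `min_ratio`, and the romanized example syllables are all hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def build_ngram_index(slots: Sequence[Sequence[str]], n: int = 2) -> Dict[NGram, List[int]]:
    """Index every syllable n-gram readable from the lattice by its start slot.

    Assumption (hypothetical simplification): the lattice is a list of slots,
    where slots[i] holds the competing syllable hypotheses at position i.
    """
    index: Dict[NGram, List[int]] = defaultdict(list)
    for start in range(len(slots) - n + 1):
        # Every combination of hypotheses across n consecutive slots is a path.
        for gram in set(product(*slots[start:start + n])):
            index[gram].append(start)
    return dict(index)


def detect_term(index: Dict[NGram, List[int]], query: Sequence[str], n: int = 2,
                tol: int = 1, min_ratio: float = 0.5) -> List[int]:
    """Return likely start slots of `query`, tolerating missing n-grams and small shifts."""
    grams = [tuple(query[k:k + n]) for k in range(len(query) - n + 1)]
    if not grams:
        return []
    # Distance constraint (assumed): query n-gram k found at slot p implies the term starts at p - k.
    matched: Dict[int, Set[int]] = defaultdict(set)
    for k, gram in enumerate(grams):
        for pos in index.get(gram, []):
            matched[pos - k].add(k)
    need = math.ceil(min_ratio * len(grams))
    scored: List[Tuple[int, int]] = []
    for anchor in matched:
        # Count distinct query n-grams that agree with this anchor within +/- tol slots.
        ks: Set[int] = set()
        for d in range(-tol, tol + 1):
            ks |= matched.get(anchor + d, set())
        if len(ks) >= need:
            scored.append((len(ks), anchor))
    # Keep the best-supported anchors; drop weaker ones within tol of an already kept hit.
    hits: List[int] = []
    for _, anchor in sorted(scored, key=lambda s: (-s[0], s[1])):
        if all(abs(anchor - h) > tol for h in hits):
            hits.append(anchor)
    return sorted(hits)


# Hypothetical example lattices (romanized syllables, for illustration only).
lattice = [["ko"], ["no"], ["to", "do"], ["ku"], ["chi", "shi"], ["ga"], ["ra"], ["wa"]]
noisy = [["ko"], ["no"], ["to"], ["ku"], ["u"], ["shi"], ["ga"]]  # extra syllable "u" inserted
query = ["to", "ku", "shi", "ga"]
print(detect_term(build_ngram_index(lattice), query))       # [2]
print(detect_term(build_ngram_index(noisy), query, tol=1))  # [2]
print(detect_term(build_ngram_index(noisy), query, tol=0))  # []
```

In the second lattice the inserted syllable `u` breaks the bigram ("ku", "shi"), but the two surviving bigrams still agree on a start slot within `tol=1`; with `tol=0` the same query is missed. Because each lookup is a dictionary access, query time depends on the number of query n-grams and their hit counts rather than on a scan of the whole lattice, which is the kind of property that makes n-gram indexing attractive for fast retrieval.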
Date of Conference: 22-27 May 2011
Date Added to IEEE Xplore: 11 July 2011
Conference Location: Prague, Czech Republic
