Abstract:
Field speech data pose great challenges to statistical modeling because the speech signal is often intermixed with extraneous sounds and other environmental noise that either are too difficult to compensate for dynamically or would require prohibitively expensive data collection for proper offline training. We propose a detection-based method in which the speech recognizer can tune sharply to only the "meaningful" speech and gracefully ignore the "unwanted" audio segments. The method is designed to be integrated with the frame-synchronous search for single-pass processing. In contrast to conventional keyword spotting techniques, this integration allows the language model to be used to better predict the detection targets during the search. To study its efficacy, we apply the framework to a spontaneous speech understanding application in which cohesive phrases congruent with the domain semantics and application context serve as the salient features for selective hearing. Experimental results on the system's effectiveness in dealing with out-of-domain phrases and other spontaneous speech effects are encouraging.
Date of Conference: 17-21 May 2004
Date Added to IEEE Xplore: 30 August 2004
Print ISBN: 0-7803-8484-9
Print ISSN: 1520-6149