Abstract:
The main motivation of this paper is to improve automatic speech recognition (ASR) hypotheses for the Malay language. Manual news transcription is expensive and time-consuming. Hence, without an ASR system, access to audio archives and searches within them would be restricted to the limited number of documents that have been manually transcribed or indexed with keywords. Multiple hypotheses are useful because the single best recognition output still contains numerous errors, even for state-of-the-art systems. In this paper, we propose an approach to reduce the word error rate (WER) of an ASR hypothesis: a three-pass combination method using parallel ASR systems. The three-pass combination system, based on grapheme rescoring and phone rescoring, re-evaluates all of the hypotheses produced by the ASR systems to produce a more accurate hypothesis. To evaluate the proposed approach, we use Malay broadcast news recorded from local Malaysian news channels, containing speech from newscasters, reporters, and interviewers in noisy environments. The approach reduced the WER by 4.4 percentage points (absolute), from 34.5% to 30.1%. Its performance was compared with six approaches that are frequently used for ASR rescoring and combination.
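For readers unfamiliar with the metric, the following is a minimal sketch of how WER is conventionally computed (word-level edit distance divided by the number of reference words) and of how the reported figures translate into absolute and relative reductions. It is not the paper's scoring code: the function names `word_error_rate` and `wer_reduction` and the sample sentences are hypothetical, chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def wer_reduction(baseline: float, improved: float) -> tuple[float, float]:
    """Return (absolute reduction in points, relative reduction as a fraction)."""
    absolute = baseline - improved
    return absolute, absolute / baseline


if __name__ == "__main__":
    # Illustrative sentences only; not taken from the paper's data.
    print(word_error_rate("saya suka makan nasi", "saya suka nasi"))

    # WER figures reported in the abstract, in percent.
    absolute, relative = wer_reduction(34.5, 30.1)
    print(f"absolute: {absolute:.1f} points, relative: {relative:.1%}")
```

Under this definition, the drop from 34.5% to 30.1% is a 4.4-point absolute reduction, which corresponds to a relative reduction of roughly 12.8% of the baseline error.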
Published in: 2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP)
Date of Conference: 12-15 July 2015
Date Added to IEEE Xplore: 03 September 2015