The development and analysis of a Malay broadcast news corpus

Abstract:

This paper presents our effort in collecting a Malay broadcast news (BN) speech corpus to support our research in Malay LVCSR. The 53-hour corpus was recorded from TV channels in both Singapore and Malaysia over a 9-month period. To facilitate a range of LVCSR research, the corpus provides, in addition to the orthographic transcription, other metadata such as speaking environment type, speaker identity information, language identity, and topic descriptions. In the orthographic transcription, we also tagged various linguistic phenomena such as disfluencies, code-switched words, and proper nouns. We trained an ASR system and achieved a word error rate of 8.5% for anchor speech and 17.1% overall (including speech from reporters and other speakers) on 27 hours of test data.

Published in: 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)

Date of Conference: 25-27 November 2013

Date Added to IEEE Xplore: 11 January 2014

DOI: 10.1109/ICSDA.2013.6709862

Conference Location: Gurgaon, India
