Development of Multi-lingual Spoken Corpora of Indian Languages

Samudravijaya, K.

doi:10.1007/11939993_79

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Abstract

This paper describes a recently initiated effort to collect and transcribe read as well as spontaneous speech data in four Indian languages. The completed preparatory work includes the design of phonetically rich sentences, a data acquisition setup for recording speech over a telephone channel, and a Wizard of Oz setup for acquiring speech data from a spoken dialogue between a caller and the machine in the context of a remote information retrieval task. The paper gives an account of the care taken to collect speech data that is as close to real-world conditions as possible, and reports the current status of the programme and the actions planned to achieve its goal.

Author information

Authors and Affiliations

Tata Institute of Fundamental Research, Homi Bhabha Road, Mumbai, 400005, India
K. Samudravijaya

Editor information

Editors and Affiliations

Department of Computer Science, The University of Hong Kong, Hong Kong
Qiang Huo
Human Language Technology Department, Institute for Infocomm Research (I2R), 119613, Singapore
Bin Ma
School of Computer Engineering, Nanyang Technological University (NTU), 639798, Singapore
Eng-Siong Chng
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Haizhou Li

About this paper

Cite this paper

Samudravijaya, K. (2006). Development of Multi-lingual Spoken Corpora of Indian Languages. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_79

DOI: https://doi.org/10.1007/11939993_79
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science, Computer Science (R0)
