Abstract
Speech recognition systems use algorithms based on statistical methods and therefore need many training samples to perform properly. Consequently, such systems require large databases for training and testing. The development of large speech corpora in Europe and in the USA was possible only through cooperation among research centers, universities, private companies and government. In these regions, the availability of such databases provided the resources behind the great improvements in speech technology over the last few years. In Brazil, such consortia are not even mentioned, and researchers have to work with small, locally developed databases. In this article we report an effort to develop a large speech corpus for Brazilian Portuguese to fill this crucial gap.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ynoguti, C.A., Barbosa, P.A., Violaro, F. (2003). A Large Speech Database for Brazilian Portuguese Spoken Language Research. In: Mamede, N.J., Trancoso, I., Baptista, J., das Graças Volpe Nunes, M. (eds) Computational Processing of the Portuguese Language. PROPOR 2003. Lecture Notes in Computer Science, vol 2721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45011-4_30
DOI: https://doi.org/10.1007/3-540-45011-4_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40436-1
Online ISBN: 978-3-540-45011-5
eBook Packages: Springer Book Archive