A Large Speech Database for Brazilian Portuguese Spoken Language Research

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2721)

Abstract

Speech recognition systems use algorithms based on statistical methods, and therefore need many training samples to perform properly. Consequently, such systems require huge databases for training and testing. The development of large speech corpora in Europe and in the USA was possible only through cooperation among research centers, universities, private companies and the government. In these countries, the availability of such databases provided the resources responsible for the great improvement in speech technologies in the last few years. In Brazil, such consortia are not even discussed, and researchers have to work with small, locally developed databases. In this article we report an effort to develop a large speech corpus for Brazilian Portuguese to fill this crucial gap.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ynoguti, C.A., Barbosa, P.A., Violaro, F. (2003). A Large Speech Database for Brazilian Portuguese Spoken Language Research. In: Mamede, N.J., Trancoso, I., Baptista, J., das Graças Volpe Nunes, M. (eds) Computational Processing of the Portuguese Language. PROPOR 2003. Lecture Notes in Computer Science, vol 2721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45011-4_30

  • DOI: https://doi.org/10.1007/3-540-45011-4_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40436-1

  • Online ISBN: 978-3-540-45011-5

  • eBook Packages: Springer Book Archive
