A Baseline System for Continuous Speech Recognition of Brazilian Portuguese Using the West Point Brazilian Portuguese Speech Corpus

dos Santos, Fabiano Weimar; Barone, Dante Augusto Couto; Adami, André Gustavo

doi:10.1007/978-3-642-12320-7_18

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6001)

Abstract

Although several speech corpora are available for building automatic speech recognition systems, only a few exist for Brazilian Portuguese (BP). This scarcity limits extensive, in-depth research on continuous speech recognition systems for BP. In this work, we present a baseline system for continuous speech recognition of BP and report its results on the West Point Brazilian Portuguese Speech Corpus. In addition to the results, the resources developed to build the system are made available to support further research on such systems for BP.

Author information

Authors and Affiliations

Universidade Federal do Rio Grande do Sul (UFRGS), Caixa Postal 15.064 - 91.501-970, Porto Alegre, RS, Brasil
Fabiano Weimar dos Santos & Dante Augusto Couto Barone
Universidade de Caxias do Sul (UCS), Rua Francisco Getúlio Vargas, 1130, CEP 95070-560, Caxias do Sul, RS, Brasil
André Gustavo Adami

Editor information

Editors and Affiliations

Núcleo Interinstitucional de Lingüística Computacional (NILC), Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, CP 668 13.560-970, São Carlos-SP, Brasil
Thiago Alexandre Salgueiro Pardo
Faculdade de Ciências de Lisboa, Departamento de Informática, Cidade Universitária, 1749-016, Lisboa, Portugal
António Branco
Signal Processing Laboratory, Universidade Federal do Pará, Rua Augusto Correa. 1, 660750110, Belém, PA, Brazil
Aldebaro Klautau
Pontifícia Universidade do Rio Grande do Sul, Porto Alegre, Brasil
Renata Vieira
Programa de Pós-Graduação em Ciência da Computação - PPGCC Avenida Ipiranga, 6681 - Prédio 32 - Partenon, CEP 90619-900, Porto Alegre, RS, Brasil
Vera Lúcia Strube de Lima

About this paper

Cite this paper

dos Santos, F.W., Barone, D.A.C., Adami, A.G. (2010). A Baseline System for Continuous Speech Recognition of Brazilian Portuguese Using the West Point Brazilian Portuguese Speech Corpus. In: Pardo, T.A.S., Branco, A., Klautau, A., Vieira, R., de Lima, V.L.S. (eds) Computational Processing of the Portuguese Language. PROPOR 2010. Lecture Notes in Computer Science, vol 6001. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12320-7_18

DOI: https://doi.org/10.1007/978-3-642-12320-7_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12319-1
Online ISBN: 978-3-642-12320-7
eBook Packages: Computer Science, Computer Science (R0)
