A Corpus of Neutral Voice Speech in Brazilian Portuguese

Leite, Pedro H. L.; Hoyle, Edmundo; Antelo, Álvaro; Kruszielski, Luiz F.; Biscainho, Luiz W. P.

doi:10.1007/978-3-030-98305-5_32

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13208)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language


Abstract

This work presents a new database containing high-sampling-rate recordings of a single male speaker reading sentences in Brazilian Portuguese with a neutral voice, along with the corresponding text corpus. Intended for synthesis and other speech-oriented applications, the dataset contains text scripts extracted from a popular Brazilian TV news program, read aloud by a trained individual in a controlled environment, resulting in roughly 20 h of audio data. The text was normalized during the recording process, and special textual occurrences (e.g. acronyms, numbers, foreign names) were replaced by a phonetic rendering in readable Portuguese text. There are no noticeable accidental sounds, and background noise has been kept to a minimum in all audio samples. To illustrate the potential benefits of having this data available, text-to-speech experiments were conducted using state-of-the-art models for speech synthesis (Tacotron 2 and WaveGlow). After training on our data, we obtained intelligible and natural-sounding voices from as few as 8 min of audio samples from an unseen target speaker; moreover, increasing the target recording time to 75 min noticeably improved pronunciation accuracy.
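
The replacement of special textual occurrences described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the lookup table `SPOKEN_FORMS`, its entries, and the function `normalize` are hypothetical and serve only to show the idea of swapping acronyms and numbers for a readable spoken form.

```python
import re

# Hypothetical lookup table: written token -> readable spoken form.
# The entries are illustrative and are not taken from the corpus.
SPOKEN_FORMS = {
    "IBGE": "i bê gê é",
    "FMI": "éfe eme i",
    "8": "oito",
    "75": "setenta e cinco",
}

# Matches runs of letters, digits and underscores (Unicode-aware in Python 3).
TOKEN = re.compile(r"\w+")


def normalize(sentence: str) -> str:
    """Replace every token found in SPOKEN_FORMS by its spoken form."""

    def replace(match: re.Match) -> str:
        token = match.group(0)
        # Tokens without an entry are left unchanged.
        return SPOKEN_FORMS.get(token, token)

    return TOKEN.sub(replace, sentence)


print(normalize("O FMI revisou 8 metas."))
# prints: O éfe eme i revisou oito metas.
```

A table lookup only covers the tokens listed in it; a real normalization pass would also need rules for arbitrary numbers, dates and foreign names, which this sketch does not attempt.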


Notes

1. Available from https://igormq.github.io/datasets/.
2. A new transcript of the sentences read, including punctuation and graphic accentuation, is available from www.smt.ufrj.br/gpa/propor2022/.
3. http://www.smt.ufrj.br/gpa/propor2022/audios.

References

The LJ Speech Dataset. https://keithito.com/LJ-Speech-Dataset/. Accessed 23 Oct 2021
Panayotov, V., Chen, G., Povey, D., Khudanpur, S.: LibriSpeech: an ASR corpus based on public domain audio books. In: 2015 International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, Australia, pp. 5206–5210. IEEE (2015)
Barker, J., Watanabe, S., Vincent, E., Trmal, J.: The fifth ‘CHiME’ speech separation and recognition challenge: dataset, task and baselines. In: 19th Annual Conference of the International Speech Communication Association (Interspeech 2018), Hyderabad, India, pp. 1561–1565. ISCA (2018)
Ardila, R., et al.: Common voice: a massively-multilingual speech corpus. In: 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, France, pp. 4211–4215. ELRA (2020)
VoxForge. http://www.voxforge.org/home. Accessed 23 Oct 2021
Pratap, V., Xu, Q., Sriram, A., Synnaeve, G., Collobert, R.: MLS: a large-scale multilingual dataset for speech research. In: 21st Annual Conference of the International Speech Communication Association (Interspeech 2020), Shanghai, China, pp. 2757–2761. ISCA (2020)
Salesky, E., et al.: The multilingual TEDx corpus for speech recognition and translation. In: 22nd Annual Conference of the International Speech Communication Association (Interspeech 2021), Brno, Czech Republic, pp. 3655–3659. ISCA (2021)
TED. https://www.ted.com/. Accessed 23 Oct 2021
Alencar, V., Alcaim, A.: LSF and LPC - derived features for large vocabulary distributed continuous speech recognition in Brazilian Portuguese. In: Asilomar Conference on Signals, Systems and Computers, California, U.S.A., pp. 1237–1241. IEEE (2008)
Casanova, E., et al.: TTS-Portuguese corpus: a corpus for speech synthesis in Brazilian Portuguese. arXiv preprint https://arxiv.org/abs/2005.05144
Python Programming Language. https://www.python.org/. Accessed 23 Oct 2021
Selenium Framework. https://www.selenium.dev/. Accessed 23 Oct 2021
Jornal Nacional Website. https://g1.globo.com/jornal-nacional/. Accessed 23 Oct 2021
Neumann TLM 102 Microphone. https://www.neumann.com/homestudio/en/tlm-102. Accessed 23 Oct 2021
Apogee Duet Interface. https://www.apogeedigital.com/products/duet. Accessed 23 Oct 2021
Audacity Software. https://www.audacityteam.org/. Accessed 23 Oct 2021
Sox Software. http://sox.sourceforge.net/. Accessed 25 Oct 2021
Shen, J., et al.: Natural TTS synthesis by conditioning WaveNet on MEL spectrogram predictions. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, pp. 4779–4783. IEEE (2018)
Prenger, R.J., Valle, R., Catanzaro, B.: WaveGlow: a flow-based generative network for speech synthesis. In: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, pp. 3617–3621 (2019)


Acknowledgment

This work is partially funded by the National Council for Scientific and Technological Development – CNPq.

Author information

Authors and Affiliations

Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil
Pedro H. L. Leite & Luiz W. P. Biscainho
Globo Comunicação e Participações S.A., Rio de Janeiro, Brazil
Pedro H. L. Leite, Edmundo Hoyle, Álvaro Antelo & Luiz F. Kruszielski


Corresponding author

Correspondence to Pedro H. L. Leite.

Editor information

Editors and Affiliations

Universidade de Fortaleza, Fortaleza, Brazil
Vládia Pinheiro
CiTIUS - Universidade de Santiago de Compostela, Santiago de Compostela, Spain
Pablo Gamallo
Universidade Nova de Lisboa, Lisbon, Portugal
Raquel Amaro
University of Sheffield, Sheffield, UK
Carolina Scarton
INESC-ID, Lisbon, Portugal
Fernando Batista
Federal University of São Carlos, São Carlos, Brazil
Diego Silva
University of Lisbon, Lisbon, Portugal
Catarina Magro
Sentimonitor, Porto Alegre, Brazil
Hugo Pinto


About this paper

Cite this paper

Leite, P.H.L., Hoyle, E., Antelo, Á., Kruszielski, L.F., Biscainho, L.W.P. (2022). A Corpus of Neutral Voice Speech in Brazilian Portuguese. In: Pinheiro, V., et al. Computational Processing of the Portuguese Language. PROPOR 2022. Lecture Notes in Computer Science, vol 13208. Springer, Cham. https://doi.org/10.1007/978-3-030-98305-5_32


DOI: https://doi.org/10.1007/978-3-030-98305-5_32
Published: 16 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98304-8
Online ISBN: 978-3-030-98305-5
eBook Packages: Computer Science; Computer Science (R0)
