
VoCMex: a voice corpus in Mexican Spanish for research in speaker recognition

International Journal of Speech Technology

Abstract

A voice corpus is an essential resource for automatic speaker recognition systems. To be useful for recognition tasks, a corpus must contain recordings of several speakers uttering phonetically balanced sentences, recorded over several sessions and through different recording media. This work presents the methodology, development, and evaluation of a Mexican Spanish corpus, referred to as VoCMex, which is intended to support research on speaker recognition. It contains telephone and microphone recordings of 20 male and 13 female speakers, collected over three sessions. To validate the usefulness of the corpus, a speaker identification system was developed; its recognition results were similar to those obtained with a well-known voice corpus.
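The abstract does not state which classifier the validation system used. As a purely illustrative sketch, assuming the classic approach of one Gaussian mixture model (GMM) per enrolled speaker trained with the EM algorithm, closed-set speaker identification picks the speaker whose model gives the highest average log-likelihood on the test frames. The names `DiagGMM` and `identify`, the speaker labels, and the synthetic 2-D features are hypothetical; a real system would score frames of acoustic features (for example MFCCs) extracted from the corpus recordings.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of vector x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class DiagGMM:
    """Hypothetical per-speaker GMM with diagonal covariances, trained by EM."""

    def __init__(self, n_components=2, n_iter=20, var_floor=1e-3, seed=0):
        self.k = n_components
        self.n_iter = n_iter
        self.var_floor = var_floor  # keeps variances from collapsing to zero
        self.seed = seed

    def _component_logs(self, x):
        # log(w_j) + log N(x | mu_j, Sigma_j) for every component j
        return [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.vars)]

    def fit(self, frames):
        n, dim = len(frames), len(frames[0])
        rng = random.Random(self.seed)
        # Initialise means from random frames, variances from global statistics.
        self.means = [list(f) for f in rng.sample(frames, self.k)]
        mu = [sum(f[d] for f in frames) / n for d in range(dim)]
        gvar = [max(sum((f[d] - mu[d]) ** 2 for f in frames) / n, self.var_floor)
                for d in range(dim)]
        self.vars = [list(gvar) for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(self.n_iter):
            # E-step: posterior responsibility of each component for each frame.
            resp = []
            for x in frames:
                logs = self._component_logs(x)
                total = logsumexp(logs)
                resp.append([math.exp(lg - total) for lg in logs])
            # M-step: re-estimate mixture weights, means and variances.
            for j in range(self.k):
                nj = sum(r[j] for r in resp) + 1e-10
                self.weights[j] = nj / n
                self.means[j] = [sum(r[j] * x[d] for r, x in zip(resp, frames)) / nj
                                 for d in range(dim)]
                self.vars[j] = [max(sum(r[j] * (x[d] - self.means[j][d]) ** 2
                                        for r, x in zip(resp, frames)) / nj,
                                    self.var_floor)
                                for d in range(dim)]
        return self

    def avg_log_likelihood(self, frames):
        return sum(logsumexp(self._component_logs(x)) for x in frames) / len(frames)


def identify(models, frames):
    """Closed-set identification: the speaker whose model scores the frames highest."""
    return max(models, key=lambda name: models[name].avg_log_likelihood(frames))


if __name__ == "__main__":
    rng = random.Random(42)

    def synthetic(center, count):
        # Stand-in for per-frame acoustic features of one speaker.
        return [[rng.gauss(c, 1.0) for c in center] for _ in range(count)]

    train = {"spk_A": synthetic((0.0, 0.0), 200), "spk_B": synthetic((5.0, 5.0), 200)}
    models = {name: DiagGMM().fit(frames) for name, frames in train.items()}
    print(identify(models, synthetic((5.0, 5.0), 50)))
```

Enrolling each speaker on one session or recording medium and testing on another is the kind of cross-session, cross-channel condition that a multi-session corpus recorded by both telephone and microphone makes possible.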

Fig. 1
Fig. 2



Acknowledgements

The authors would like to thank the Universidad Autónoma de Baja California (Autonomous University of Baja California), which financed this work through program 1899 of the 11th internal call for research funding.

Author information

Correspondence to José-Martín Olguín-Espinoza.

About this article

Cite this article

Olguín-Espinoza, JM., Mayorga-Ortiz, P., Hidalgo-Silva, H. et al. VoCMex: a voice corpus in Mexican Spanish for research in speaker recognition. Int J Speech Technol 16, 295–302 (2013). https://doi.org/10.1007/s10772-012-9183-z


  • DOI: https://doi.org/10.1007/s10772-012-9183-z
