Hidden Markov Models for Artificial Voice Production and Accent Modification

Coto-Jiménez, Marvin; Goddard-Close, John

doi:10.1007/978-3-319-47955-2_34

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10022)

Included in the following conference series:

Ibero-American Conference on Artificial Intelligence

Abstract

In this paper, we consider the problem of accent modification between Castilian Spanish and Mexican Spanish. This is an interesting application area for tasks such as the automatic dubbing of pictures and videos with different accents. We first apply statistical parametric speech synthesis based on Hidden Markov Models (HMMs) to produce two artificial voices, one for each accent. This type of speech synthesis can learn and reproduce certain essential parameters of the voice in question. We then propose a way to adapt these parameters between the two accents. The prosodic differences between the voices are modeled and transformed directly by this adaptation method. To produce the initial voices, we use a speech database developed by professional actors from Spain and Mexico. The results of subjective and objective tests are promising, and the method is, in principle, applicable to accent modification between other Spanish accents.

Acknowledgements

This work was supported by the SEP and CONACyT under the Program SEP-CONACyT, CB-2012-01, No. 182432, in Mexico, and by the University of Costa Rica in Costa Rica. We also thank ELRA for supplying the original Emotional speech synthesis database.

Author information

Authors and Affiliations

University of Costa Rica, San José, Costa Rica
Marvin Coto-Jiménez
Metropolitan Autonomous University, México, D.F., Mexico
Marvin Coto-Jiménez & John Goddard-Close

Corresponding author

Correspondence to Marvin Coto-Jiménez.

Editor information

Editors and Affiliations

INAOE, Tonantzintla, Mexico
Manuel Montes y Gómez
Astrofisica Optica y Electronica, INAOE, Puebla, Mexico
Hugo Jair Escalante
Universidad Nacional de Costa Rica, Heredia, Costa Rica
Alberto Segura
Universidad Nacional de Costa Rica, Heredia, Costa Rica
Juan de Dios Murillo

About this paper

Cite this paper

Coto-Jiménez, M., Goddard-Close, J. (2016). Hidden Markov Models for Artificial Voice Production and Accent Modification. In: Montes y Gómez, M., Escalante, H., Segura, A., Murillo, J. (eds) Advances in Artificial Intelligence - IBERAMIA 2016. IBERAMIA 2016. Lecture Notes in Computer Science, vol 10022. Springer, Cham. https://doi.org/10.1007/978-3-319-47955-2_34

DOI: https://doi.org/10.1007/978-3-319-47955-2_34
Published: 14 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47954-5
Online ISBN: 978-3-319-47955-2
eBook Packages: Computer Science, Computer Science (R0)
