
Hidden Markov Models for Artificial Voice Production and Accent Modification

  • Conference paper
Advances in Artificial Intelligence - IBERAMIA 2016 (IBERAMIA 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10022)

Abstract

In this paper, we consider the problem of accent modification between Castilian Spanish and Mexican Spanish. This is a useful capability for applications such as the automatic dubbing of films and videos into different accents. We first apply statistical parametric speech synthesis based on Hidden Markov Models (HMMs) to produce two artificial voices, one for each accent. This type of synthesis learns and reproduces key acoustic parameters of the voice being modeled (such as its spectral envelope, fundamental frequency, and durations). We then propose a way to adapt these parameters from one accent to the other, so that the prosodic differences between the voices are modeled and transformed directly. To build the initial voices, we use a speech database recorded by professional actors from Spain and Mexico. The results of subjective and objective tests are promising, and the method should in principle be applicable to accent modification between other pairs of Spanish accents.
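The abstract does not give the adaptation equations. As a hedged illustration of what transforming a prosodic parameter directly between two accents can look like, the sketch below applies a common baseline rather than the authors' HMM-based method: a global mean-and-variance mapping of log-F0 from one accent's statistics to the other's. The function names `logf0_stats` and `convert_f0` and the F0 values in the usage lines are hypothetical.

```python
import math
from statistics import mean, pstdev


def logf0_stats(f0_contour):
    """Return (mean, std) of log-F0 over voiced frames (F0 > 0 Hz)."""
    voiced = [math.log(f) for f in f0_contour if f > 0]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    return mean(voiced), pstdev(voiced)


def convert_f0(f0_contour, src_stats, tgt_stats):
    """Map an F0 contour from source-accent to target-accent log-F0 statistics.

    Baseline assumption (not the paper's method): each voiced frame is
    z-normalised in the log domain with the source (mean, std) and then
    rescaled with the target (mean, std). Unvoiced frames (F0 <= 0)
    are returned as 0.0.
    """
    src_mu, src_sigma = src_stats
    tgt_mu, tgt_sigma = tgt_stats
    if src_sigma <= 0:
        raise ValueError("source log-F0 standard deviation must be positive")
    converted = []
    for f in f0_contour:
        if f <= 0:
            converted.append(0.0)  # keep unvoiced frames unvoiced
            continue
        z = (math.log(f) - src_mu) / src_sigma
        converted.append(math.exp(tgt_mu + tgt_sigma * z))
    return converted


# Hypothetical usage: F0 values in Hz, 0.0 marks an unvoiced frame.
castilian = [110.0, 0.0, 130.0, 150.0]
mexican = [120.0, 140.0, 0.0, 180.0]
src = logf0_stats(castilian)
tgt = logf0_stats(mexican)
print(convert_f0(castilian, src, tgt))
```

In an HMM-based system, a mapping of this kind could be applied to the log-F0 stream parameters of the trained models instead of to a finished contour. The abstract does not say which representation the paper transforms, so treat this only as an illustration of the general idea.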

Acknowledgements

This work was supported by SEP and CONACyT under the SEP-CONACyT Program, CB-2012-01, No. 182432, in Mexico, and by the University of Costa Rica. We also thank ELRA for supplying the original Emotional Speech Synthesis Database.

Author information

Correspondence to Marvin Coto-Jiménez.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Coto-Jiménez, M., Goddard-Close, J. (2016). Hidden Markov Models for Artificial Voice Production and Accent Modification. In: Montes y Gómez, M., Escalante, H., Segura, A., Murillo, J. (eds) Advances in Artificial Intelligence - IBERAMIA 2016. IBERAMIA 2016. Lecture Notes in Computer Science, vol 10022. Springer, Cham. https://doi.org/10.1007/978-3-319-47955-2_34

  • DOI: https://doi.org/10.1007/978-3-319-47955-2_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47954-5

  • Online ISBN: 978-3-319-47955-2

  • eBook Packages: Computer Science, Computer Science (R0)
