
Articulation During Voice Disguise: A Pilot Study

  • Conference paper
Speech and Computer (SPECOM 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12997)


Abstract

Speakers can conceal their identity by deliberately changing their speech characteristics, that is, by disguising their voices. During voice disguise, speakers alter the normal movements of their articulators, such as tongue positions, according to a predetermined strategy. Although technology for accurate articulatory measurement has existed for years, few studies have investigated articulation during voice disguise. In this pilot study, we recorded the articulation of four speakers during regular and disguised speech using electromagnetic articulography. We analyzed imitation of foreign accents as a voice disguise strategy and used functional t-tests as a novel method for revealing articulatory differences between regular and disguised speech. In addition, we evaluated the discovered articulatory differences in light of the performance of an x-vector-based automatic speaker verification system.
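The abstract does not say how the functional t-tests were implemented. The sketch below is only an illustration of the general idea, under stated assumptions. It assumes that each articulatory trajectory has already been time-normalized and sampled on a shared grid. An example would be one tongue-sensor coordinate over one token. This skips the basis smoothing that full functional data analysis would apply first.

At every grid point it computes a Welch-style |t| statistic. It then builds a permutation distribution of the maximum |t| to obtain a single critical value that controls for testing many points at once. The approach is similar in spirit to `tperm.fd` in R's `fda` package, but it is not that function. The names `pointwise_t` and `functional_t_test` are hypothetical.

```python
import numpy as np


def pointwise_t(group_a: np.ndarray, group_b: np.ndarray) -> np.ndarray:
    """Welch-style |t| statistic at every grid point.

    group_a, group_b: arrays of shape (n_curves, n_points); each row is one
    time-normalized articulatory trajectory sampled on a shared grid
    (an assumption of this sketch).
    """
    mean_diff = group_a.mean(axis=0) - group_b.mean(axis=0)
    se = np.sqrt(group_a.var(axis=0, ddof=1) / len(group_a)
                 + group_b.var(axis=0, ddof=1) / len(group_b))
    return np.abs(mean_diff) / se


def functional_t_test(group_a, group_b, n_perm=1000, alpha=0.05, seed=0):
    """Permutation test on the maximum pointwise |t| statistic.

    Returns the observed |t| curve, the critical value (the 1 - alpha
    quantile of the permuted maxima) and a permutation p-value for the
    maximum. Grid points where the observed |t| exceeds the critical value
    are the regions where the two groups of curves differ.
    """
    rng = np.random.default_rng(seed)
    observed = pointwise_t(group_a, group_b)
    pooled = np.vstack([group_a, group_b])
    n_a = len(group_a)
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        # Randomly reassign curves to the two groups and record the max |t|.
        idx = rng.permutation(len(pooled))
        max_null[i] = pointwise_t(pooled[idx[:n_a]], pooled[idx[n_a:]]).max()
    critical = np.quantile(max_null, 1 - alpha)
    p_value = (np.sum(max_null >= observed.max()) + 1) / (n_perm + 1)
    return observed, critical, p_value
```

In this setting, `group_a` would hold trajectories from regular speech and `group_b` the same items produced in disguised speech. Using the maximum statistic gives one threshold for the whole curve, rather than a separate threshold at each of many correlated time points.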



Acknowledgments

This project was partly funded by the Academy of Finland (project 309629). Einar Meister’s work was supported by the European Regional Development Fund (the project “Centre of Excellence in Estonian Studies”). We thank Fabian Tomaschek for providing a set of R scripts for post-processing of raw EMA data.

Author information

Corresponding author

Correspondence to Lauri Tavi.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Tavi, L., Kinnunen, T., Meister, E., González-Hautamäki, R., Malmi, A. (2021). Articulation During Voice Disguise: A Pilot Study. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol. 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_61


  • DOI: https://doi.org/10.1007/978-3-030-87802-3_61


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87801-6

  • Online ISBN: 978-3-030-87802-3

  • eBook Packages: Computer Science (R0)
