Articulation During Voice Disguise: A Pilot Study

Tavi, Lauri; Kinnunen, Tomi; Meister, Einar; González-Hautamäki, Rosa; Malmi, Anton

doi:10.1007/978-3-030-87802-3_61

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12997)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

Speakers can conceal their identity by deliberately changing their speech characteristics, that is, by disguising their voices. During voice disguise, speakers alter the normal movements of their articulators, such as tongue positions, according to a predetermined strategy. Even though technology for accurate articulatory measurements has existed for years, few studies have investigated articulation during voice disguise. In this pilot study, we recorded the articulation of four speakers during regular and disguised speech using electromagnetic articulography. We analyzed imitation of foreign accents as a voice disguise strategy and used functional t-tests as a novel method for revealing articulatory differences between regular and disguised speech. In addition, we evaluated the discovered articulatory differences in light of the performance of an x-vector-based automatic speaker verification system.
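The abstract names functional t-tests but this preview does not describe the procedure. As a rough illustration only, the sketch below assumes articulatory trajectories sampled on a common time grid, a pointwise two-sample t-statistic, and a permutation-based critical value taken from the maximum statistic over the curve. The synthetic data, the function names (`mean_var`, `pointwise_t`, `functional_t_test`, `synthetic_curve`), and all parameter values are hypothetical and are not taken from the study.

```python
import math
import random


def mean_var(values):
    """Return the sample mean and unbiased sample variance of a sequence."""
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / (n - 1)


def pointwise_t(group_a, group_b):
    """Absolute two-sample t-statistic at every sample point of the curves."""
    stats = []
    for col_a, col_b in zip(zip(*group_a), zip(*group_b)):
        mean_a, var_a = mean_var(col_a)
        mean_b, var_b = mean_var(col_b)
        se = math.sqrt(var_a / len(col_a) + var_b / len(col_b))
        stats.append(abs(mean_a - mean_b) / se if se > 0 else 0.0)
    return stats


def functional_t_test(group_a, group_b, n_perm=200, q=0.05, seed=0):
    """Permutation test on the maximum pointwise t-statistic (assumed design)."""
    rng = random.Random(seed)
    observed = pointwise_t(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    max_null = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign whole curves to the two conditions
        max_null.append(max(pointwise_t(pooled[:n_a], pooled[n_a:])))
    max_null.sort()
    # Empirical (1 - q) quantile of the null distribution of the maximum.
    index = min(n_perm - 1, math.ceil((1 - q) * n_perm) - 1)
    critical = max_null[index]
    p_value = sum(m >= max(observed) for m in max_null) / n_perm
    return observed, critical, p_value


def synthetic_curve(rng, bump):
    """Hypothetical trajectory: a slow arc, an optional mid-curve bump, noise."""
    return [
        math.sin(math.pi * t / 39)
        + bump * math.exp(-(((t - 20) / 4) ** 2))
        + rng.gauss(0, 0.5)
        for t in range(40)
    ]


# Synthetic stand-ins for regular and disguised productions (not real EMA data).
rng = random.Random(1)
regular = [synthetic_curve(rng, 0.0) for _ in range(12)]
disguised = [synthetic_curve(rng, 3.0) for _ in range(12)]

observed, critical, p_value = functional_t_test(regular, disguised)
flagged = [t for t, stat in enumerate(observed) if stat > critical]
print(f"critical={critical:.2f}, p={p_value:.3f}, flagged points={flagged}")
```

Time points whose statistic exceeds the permutation critical value are the regions where the two conditions differ; using the maximum over the curve keeps the comparison from being inflated by testing many points at once.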

Acknowledgments

This project was partly funded by the Academy of Finland (project 309629). Einar Meister’s work was supported by the European Regional Development Foundation (the project “Centre of Excellence in Estonian Studies”). We thank Fabian Tomaschek for providing a set of R scripts for post-processing of raw EMA data.

Author information

Authors and Affiliations

School of Humanities, University of Eastern Finland, Joensuu, Finland
Lauri Tavi
School of Computing, University of Eastern Finland, Joensuu, Finland
Lauri Tavi, Tomi Kinnunen & Rosa González-Hautamäki
School of Information Technologies, Tallinn University of Technology, Tallinn, Estonia
Einar Meister
Institute of Estonian and General Linguistics, University of Tartu, Tartu, Estonia
Anton Malmi
Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore
Rosa González-Hautamäki

Corresponding author

Correspondence to Lauri Tavi.

Editor information

Editors and Affiliations

St. Petersburg Federal Research Center of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

Cite this paper

Tavi, L., Kinnunen, T., Meister, E., González-Hautamäki, R., Malmi, A. (2021). Articulation During Voice Disguise: A Pilot Study. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_61

DOI: https://doi.org/10.1007/978-3-030-87802-3_61
Published: 22 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87801-6
Online ISBN: 978-3-030-87802-3
eBook Packages: Computer Science, Computer Science (R0)