Abstract
Speaker anonymization is the technique of hiding a speaker's real identity. Aiming to deceive automatic speaker verification (ASV) systems, it is usually performed by modifying the temporal or spectral properties of the original voice, e.g., by pitch scaling, vocal tract length normalization (VTLN), or voice conversion (VC). However, the real identity behind anonymized speech can be recovered by carefully re-training the ASV, e.g., through data augmentation that anonymizes voices of the same speaker. To evaluate the effectiveness of speaker anonymization, a pre-restoration method for both enrollment and testing data is proposed, investigated, and compared for the de-anonymization of anonymized voices. Experimental results show that the pre-restoration method is effective for speaker de-anonymization. Moreover, pre-restoration of the testing data performs better than that of the enrollment data, a finding that may also be useful for other decision-making tasks involving enrollment and testing stages.
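To illustrate the idea behind pre-restoration, the following minimal sketch (not the paper's implementation; the naive resampling-based `pitch_scale` helper and the scaling factor 1.25 are assumptions for illustration) shows anonymization as a pitch-scaling transform and pre-restoration as applying the inverse transform before ASV scoring:

```python
import numpy as np

def pitch_scale(signal, factor):
    """Naive pitch scaling by linear-interpolation resampling.
    Playing the result at the original rate shifts all frequencies
    by `factor` (it also changes duration, unlike a true pitch shifter)."""
    n = len(signal)
    idx = np.arange(0, n, factor)      # stretched/compressed time grid
    idx = idx[idx < n - 1]
    lo = idx.astype(int)
    frac = idx - lo
    return (1 - frac) * signal[lo] + frac * signal[lo + 1]

# A 100 Hz tone sampled at 8 kHz stands in for an original voice.
sr = 8000
t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 100 * t)

# Anonymization: raise the pitch by a factor of 1.25.
anonymized = pitch_scale(voice, 1.25)

# Pre-restoration: apply the inverse factor before enrollment/testing.
restored = pitch_scale(anonymized, 1 / 1.25)

# The restored signal closely matches the original tone, so an ASV
# re-trained or scored on restored data can recover the identity.
m = min(len(voice), len(restored)) - 10
print(np.max(np.abs(voice[:m] - restored[:m])) < 0.05)
```

In practice the attacker does not know the exact anonymization factor, which is why the paper compares restoring the enrollment data against restoring the testing data rather than assuming a perfect inverse.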
Acknowledgement
The authors would like to thank Linlin Zheng and Yujun Wang for helpful discussions.
Funding
This work was supported by the Natural Science Foundation of Jiangsu Province (BK20180080) and the National Natural Science Foundation of China (62071484).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, W., Li, J., Wei, C., Sun, M., Zhang, X., Li, Y. (2022). A Novel Method to Evaluate the Privacy Protection in Speaker Anonymization. In: Sun, X., Zhang, X., Xia, Z., Bertino, E. (eds) Artificial Intelligence and Security. ICAIS 2022. Lecture Notes in Computer Science, vol 13339. Springer, Cham. https://doi.org/10.1007/978-3-031-06788-4_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06787-7
Online ISBN: 978-3-031-06788-4