Abstract
Speaker anonymization is the technique of hiding a speaker's real identity. Aiming to deceive automatic speaker verification (ASV) systems, it is usually performed by modifying the temporal or spectral properties of the original voice, e.g., by pitch scaling, vocal tract length normalization (VTLN), or voice conversion (VC). However, the real identity behind anonymized speech can be recovered by carefully re-training the ASV, e.g., through data augmentation that anonymizes voices of the same speaker. To evaluate the effectiveness of speaker anonymization, a pre-restoration method for both enrollment and testing data is proposed, investigated, and compared for the de-anonymization of anonymized voices. Experimental results show that the pre-restoration method is effective for speaker de-anonymization. Moreover, pre-restoration of the testing data performs better than that of the enrollment data, a finding that may also be useful for other decision-making tasks involving enrollment and testing stages.
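To illustrate the idea behind pre-restoration, the following minimal sketch (not the paper's implementation; the naive resampling-based `pitch_scale` helper and the scaling factor 1.25 are assumptions for illustration) shows anonymization as a pitch-scaling transform and pre-restoration as applying the inverse transform before ASV scoring:

```python
import numpy as np

def pitch_scale(signal, factor):
    """Naive pitch scaling by linear-interpolation resampling.
    Playing the result at the original rate shifts all frequencies
    by `factor` (it also changes duration, unlike a true pitch shifter)."""
    n = len(signal)
    idx = np.arange(0, n, factor)      # stretched/compressed time grid
    idx = idx[idx < n - 1]
    lo = idx.astype(int)
    frac = idx - lo
    return (1 - frac) * signal[lo] + frac * signal[lo + 1]

# A 100 Hz tone sampled at 8 kHz stands in for an original voice.
sr = 8000
t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 100 * t)

# Anonymization: raise the pitch by a factor of 1.25.
anonymized = pitch_scale(voice, 1.25)

# Pre-restoration: apply the inverse factor before enrollment/testing.
restored = pitch_scale(anonymized, 1 / 1.25)

# The restored signal closely matches the original tone, so an ASV
# re-trained or scored on restored data can recover the identity.
m = min(len(voice), len(restored)) - 10
print(np.max(np.abs(voice[:m] - restored[:m])) < 0.05)
```

In practice the attacker does not know the exact anonymization factor, which is why the paper compares restoring the enrollment data against restoring the testing data rather than assuming a perfect inverse.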
Acknowledgement
The authors would like to thank Linlin Zheng and Yujun Wang for helpful discussions.
Funding
This work was supported by the Natural Science Foundation of Jiangsu Province (BK20180080) and the National Natural Science Foundation of China (62071484).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, W., Li, J., Wei, C., Sun, M., Zhang, X., Li, Y. (2022). A Novel Method to Evaluate the Privacy Protection in Speaker Anonymization. In: Sun, X., Zhang, X., Xia, Z., Bertino, E. (eds) Artificial Intelligence and Security. ICAIS 2022. Lecture Notes in Computer Science, vol 13339. Springer, Cham. https://doi.org/10.1007/978-3-031-06788-4_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06787-7
Online ISBN: 978-3-031-06788-4