A Novel Method to Evaluate the Privacy Protection in Speaker Anonymization

  • Conference paper
  • First Online:
Artificial Intelligence and Security (ICAIS 2022)

Abstract

Speaker anonymization is the technique of hiding the real identity of a speaker. Aiming to deceive automatic speaker verification (ASV) systems, speaker anonymization is usually conducted by modifying the temporal or spectral properties of the original voice, e.g., by pitch scaling, vocal tract length normalization (VTLN), or voice conversion (VC). However, the real identity behind anonymized speech can be recovered by carefully re-training the ASV system, e.g., by augmenting its training data with anonymized voices of the same speaker. To evaluate the effectiveness of speaker anonymization, this paper proposes, investigates, and compares pre-restoration methods for both enrollment and testing data to de-anonymize anonymized voices. Experimental results show that pre-restoration is effective for speaker de-anonymization. Moreover, pre-restoration of the testing data performs better than pre-restoration of the enrollment data, an observation that may also be useful in other decision-making tasks involving enrollment and testing stages.
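To make the idea concrete, the sketch below illustrates pre-restoration for the simplest anonymization mentioned in the abstract, pitch scaling: if an attacker can estimate the scaling factor, applying the inverse transform before ASV largely undoes the anonymization. This is a minimal toy illustration, not the paper's actual pipeline; the naive resampling-based `pitch_scale`, the assumed factor of 1.25, and the synthetic sine-wave signal are all illustrative assumptions.

```python
import numpy as np

def pitch_scale(signal, factor):
    # Naive pitch scaling by resampling: factor > 1 raises pitch and
    # shortens the signal, factor < 1 does the opposite. A hypothetical
    # stand-in for the pitch-scaling anonymization discussed above.
    n = int(len(signal) / factor)
    old_idx = np.linspace(0, len(signal) - 1, n)
    return np.interp(old_idx, np.arange(len(signal)), signal)

def pre_restore(anonymized, assumed_factor):
    # Pre-restoration: apply the inverse transform with an estimated
    # anonymization factor before passing the audio to an ASV system.
    return pitch_scale(anonymized, 1.0 / assumed_factor)

# Toy example: a 440 Hz tone, anonymized by factor 1.25, then restored.
sr = 16000
t = np.arange(sr) / sr
original = np.sin(2 * np.pi * 440 * t)
anonymized = pitch_scale(original, 1.25)
restored = pre_restore(anonymized, 1.25)

# The restored signal should closely match the original (up to
# interpolation error), which is why a re-trained or pre-restoration-aware
# ASV can recover the speaker's identity.
m = min(len(original), len(restored))
corr = np.corrcoef(original[:m], restored[:m])[0, 1]
print(f"correlation after restoration: {corr:.3f}")
```

In practice the attacker does not know the factor exactly; the paper's point is that even approximate inversion (applied to enrollment or, better, testing data) substantially weakens the privacy protection.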

Acknowledgement

The authors would like to thank Linlin Zheng and Yujun Wang for helpful discussions.

Funding

This work was supported by the Natural Science Foundation of Jiangsu Province (BK20180080) and the National Natural Science Foundation of China (62071484).

Author information

Correspondence to Meng Sun.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, W., Li, J., Wei, C., Sun, M., Zhang, X., Li, Y. (2022). A Novel Method to Evaluate the Privacy Protection in Speaker Anonymization. In: Sun, X., Zhang, X., Xia, Z., Bertino, E. (eds) Artificial Intelligence and Security. ICAIS 2022. Lecture Notes in Computer Science, vol 13339. Springer, Cham. https://doi.org/10.1007/978-3-031-06788-4_51

  • DOI: https://doi.org/10.1007/978-3-031-06788-4_51

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06787-7

  • Online ISBN: 978-3-031-06788-4

  • eBook Packages: Computer Science (R0)
