Detection of Various Speech Forgery Operations Based on Recurrent Neural Network

Yan, Diqun; Wu, Tingting

doi:10.1007/978-981-15-9129-7_29

Conference paper
First Online: 22 October 2020

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1268)

Abstract

Most existing speech forensics algorithms are designed to detect one specific forgery operation. In realistic scenarios, however, the type of forgery is difficult to predict. Because a suspicious speech sample may have been processed by an unknown forgery operation, a classifier trained for one specific operation can give misleading results. To address this, a forensic algorithm based on a recurrent neural network (RNN) and linear frequency cepstral coefficients (LFCC) is proposed to detect four common forgery operations. The LFCC together with its derivative coefficients is chosen as the forensic feature. An RNN framework with two LSTM layers is designed on the basis of preliminary experiments. Extensive experiments on the TIMIT and UME databases show that the detection accuracy reaches about 99% in the intra-database evaluation and exceeds 88% in the cross-database evaluation. Finally, the proposed algorithm outperforms the previous algorithm.
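
As a rough illustration of the feature described in the abstract, the sketch below computes LFCC with first- and second-order derivative coefficients using only the Python standard library. It is not the authors' implementation: the function names, frame length, hop size, filter count, number of cepstral coefficients, and delta window width are all assumptions chosen for the example, and the abstract does not state the values used in the paper. The LSTM classifier is not shown.

```python
import cmath
import math


def power_spectrum(frame):
    """Hamming-window one frame and return its one-sided power spectrum (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def linear_filterbank(num_filters, num_bins):
    """Triangular filters with centres spaced linearly in frequency (not on the mel scale)."""
    edges = [i * (num_bins - 1) / (num_filters + 1) for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        left, centre, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(num_bins):
            if left <= b <= centre:
                row.append((b - left) / (centre - left))
            elif centre < b <= right:
                row.append((right - b) / (right - centre))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def lfcc_frame(frame, num_filters=20, num_ceps=12):
    """Log filterbank energies followed by a DCT-II give the static LFCC of one frame."""
    spec = power_spectrum(frame)
    bank = linear_filterbank(num_filters, len(spec))
    log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                    for row in bank]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m, e in enumerate(log_energies))
            for c in range(num_ceps)]


def deltas(features, width=2):
    """Regression-based derivative over time; edge frames are repeated as padding."""
    denom = 2 * sum(d * d for d in range(1, width + 1))
    last = len(features) - 1
    out = []
    for t in range(len(features)):
        row = []
        for c in range(len(features[0])):
            num = sum(d * (features[min(t + d, last)][c] - features[max(t - d, 0)][c])
                      for d in range(1, width + 1))
            row.append(num / denom)
        out.append(row)
    return out


def lfcc_with_deltas(signal, frame_len=64, hop=32):
    """Concatenate static, delta, and delta-delta coefficients for every frame."""
    frames = [signal[s:s + frame_len]
              for s in range(0, len(signal) - frame_len + 1, hop)]
    static = [lfcc_frame(f) for f in frames]
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Toy input: a 440 Hz tone sampled at 8 kHz (hypothetical, for illustration only).
signal = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
features = lfcc_with_deltas(signal)
print(len(features), len(features[0]))
```

Each row of `features` is one time step, so the resulting sequence has the shape a recurrent network expects as input. The only difference from MFCC in this sketch is the linear spacing of the filter centres.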

Author information

Authors and Affiliations

College of Information Science and Engineering, Ningbo University, Ningbo, 315211, China
Diqun Yan & Tingting Wu

Corresponding author

Correspondence to Diqun Yan.

Editor information

Editors and Affiliations

University of Technology Sydney, Sydney, NSW, Australia
Shui Yu
IBM Zurich Research Laboratory, Zurich, Switzerland
Peter Mueller
Ningbo University, Ningbo, China
Jiangbo Qian

Cite this paper

Yan, D., Wu, T. (2020). Detection of Various Speech Forgery Operations Based on Recurrent Neural Network. In: Yu, S., Mueller, P., Qian, J. (eds) Security and Privacy in Digital Economy. SPDE 2020. Communications in Computer and Information Science, vol 1268. Springer, Singapore. https://doi.org/10.1007/978-981-15-9129-7_29

DOI: https://doi.org/10.1007/978-981-15-9129-7_29
Published: 22 October 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9128-0
Online ISBN: 978-981-15-9129-7
eBook Packages: Computer Science (R0)
