Abstract
Most existing speech forensics algorithms are designed to detect one specific forgery operation. In realistic scenarios, however, the type of forgery is hard to predict. Because suspicious speech may have been processed by an unknown forgery operation, a classifier trained for a single operation can give misleading results. To address this, a forensic algorithm based on a recurrent neural network (RNN) and linear frequency cepstral coefficients (LFCC) is proposed to detect four common forgery operations. The LFCC together with its derivative coefficients is chosen as the forensic feature. An RNN framework with a two-layer LSTM is designed on the basis of preliminary experiments. Extensive experiments on the TIMIT and UME databases show that detection accuracy reaches about 99% in intra-database evaluation and exceeds 88% in cross-database evaluation. Finally, the proposed algorithm outperforms the previous algorithm.
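As a rough illustration of what "LFCC with its derivative coefficients" can mean in practice, the sketch below appends first- and second-order derivatives to a sequence of per-frame static coefficients. It is not the authors' implementation: the abstract does not state the derivative formula, the window width, or the number of coefficients, so the regression-style delta, the `width` parameter, the edge handling, and the function names `delta` and `stack_with_derivatives` are all assumptions. The LFCC extraction itself is not shown; the input is taken to be already-computed static coefficients.

```python
from typing import List

Frame = List[float]


def delta(frames: List[Frame], width: int = 2) -> List[Frame]:
    """Regression-style derivative of a per-frame feature sequence.

    Assumption: frames beyond either end are replaced by the first/last frame.
    """
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out: List[Frame] = []
    for t in range(n):
        row: Frame = []
        for d in range(len(frames[t])):
            acc = 0.0
            for k in range(1, width + 1):
                nxt = frames[min(t + k, n - 1)][d]  # clamp at the last frame
                prv = frames[max(t - k, 0)][d]      # clamp at the first frame
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_with_derivatives(static: List[Frame], width: int = 2) -> List[Frame]:
    """Concatenate static, first-derivative and second-derivative coefficients per frame."""
    d1 = delta(static, width)
    d2 = delta(d1, width)
    return [s + a + b for s, a, b in zip(static, d1, d2)]
```

Each output frame is three times as long as the input frame, and the resulting sequence is the kind of input a recurrent classifier would consume frame by frame.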
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yan, D., Wu, T. (2020). Detection of Various Speech Forgery Operations Based on Recurrent Neural Network. In: Yu, S., Mueller, P., Qian, J. (eds) Security and Privacy in Digital Economy. SPDE 2020. Communications in Computer and Information Science, vol 1268. Springer, Singapore. https://doi.org/10.1007/978-981-15-9129-7_29
DOI: https://doi.org/10.1007/978-981-15-9129-7_29
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9128-0
Online ISBN: 978-981-15-9129-7
eBook Packages: Computer Science, Computer Science (R0)