Abstract
With the rapid increase in user-generated multimedia content, its extensive reach over social media, and its potential in critical applications such as law enforcement, source identification from re-compressed and noisy multimedia is of great importance. This paper proposes a system for speaker-independent cell-phone identification from recorded audio. The system can handle test audio whose speech content and speaker both differ from those of the training audio. Each recording has the device fingerprint implicitly embedded in it, which motivates a CNN-based system that learns device-specific signatures directly from the magnitude of the discrete Fourier transform of the audio. This paper also addresses the scenario where the recorded audio is re-compressed for efficient storage and network transmission, which is common in this age of social media. Cell-phone classification from audio recordings in the presence of additive white Gaussian noise is addressed as well. We show that the proposed system performs as well as state-of-the-art systems in the speaker-dependent case with clean audio recordings and exhibits much higher robustness in the speaker-independent case with clean, re-compressed, and noisy audio recordings.
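As a rough illustration of two ingredients named in the abstract, the sketch below adds white Gaussian noise to a signal at a chosen SNR and then computes frame-wise DFT magnitudes of the kind that could be fed to a CNN. It is not the authors' pipeline: the function names, the 16 kHz sampling rate, the 1024-sample frames with a hop of 512, the Hann window, and the synthetic 440 Hz tone standing in for a recording are all assumptions made for this example.

```python
import numpy as np


def add_awgn(signal, snr_db, rng):
    """Return signal plus white Gaussian noise at the requested SNR (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def dft_magnitude_frames(signal, frame_len=1024, hop=512):
    """Window overlapping frames and return one-sided DFT magnitudes.

    Output shape is (n_frames, frame_len // 2 + 1).
    frame_len, hop, and the Hann window are illustrative assumptions.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


# Synthetic stand-in for a one-second recording (assumed 16 kHz sampling rate).
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440.0 * t)

noisy = add_awgn(clean, snr_db=20.0, rng=rng)
features = dft_magnitude_frames(noisy)

# Empirical SNR of the noisy signal, for a sanity check.
measured_snr_db = 10.0 * np.log10(
    np.mean(clean ** 2) / np.mean((noisy - clean) ** 2)
)
```

Here `features` is a two-dimensional array with one row per frame and one column per frequency bin. In a real setting the tone would be replaced by decoded audio samples, and the noise level or frame parameters would follow the paper's experimental setup rather than the values assumed here.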
Data Availability
The dataset created by the authors might be made publicly available depending on the permissions received from the funding agency.
Code Availability
The code created by the authors for this paper might be made publicly available depending on the permissions received from the funding agency.
References
Aggarwal R, Singh S, Roul AK, Khanna N (2014) Cellphone identification using noise estimates from recorded audio. In: International conference on communications and signal processing (ICCSP). IEEE, pp 1218–1222
Baldini G, Amerini I (2019) Smartphones identification through the built-in microphones with convolutional neural network. IEEE Access 7:158685–158696
Baldini G, Amerini I, Gentile C (2019) Microphone identification using convolutional neural networks. IEEE Sensors Letters
Bellard F, Niedermayer M, et al. (2019) FFmpeg. Available from: http://ffmpeg.org
Buchholz R, Kraetzer C, Dittmann J (2009) Microphone classification using Fourier coefficients. In: International workshop on information hiding. Springer, pp 235–246
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2(3):1–27
Cuccovillo L, Aichroth P (2016) Open-set microphone classification via blind channel analysis. In: IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 2074–2078
Cuccovillo L, Mann S, Tagliasacchi M, Aichroth P (2013) Audio tampering detection via microphone classification. In: IEEE 15th international workshop on multimedia signal processing (MMSP). IEEE, pp 177–182
Eskidere Ö (2014) Source microphone identification from speech recordings based on a Gaussian mixture model. Turkish Journal of Electrical Engineering & Computer Sciences 22(3):754–767
Eskidere Ö (2016) Identifying acquisition devices from recorded speech signals using wavelet-based features. Turkish Journal of Electrical Engineering & Computer Sciences 24(3):1942–1954
Garcia-Romero D, Espy-Wilson CY (2010) Automatic acquisition device identification from speech recordings. In: International conference on acoustics speech and signal processing (ICASSP). IEEE, pp 1806–1809
Hanilçi C, Ertas F (2013) Optimizing acoustic features for source cell-phone recognition using speech signals. In: Proceedings of the first ACM workshop on information hiding and multimedia security. ACM, pp 141–148
Hanilçi C, Ertas F, Ertas T, Eskidere Ö (2012) Recognition of brand and models of cell-phones from recorded speech signals. IEEE Trans Inform Forensics Secur 7(2):625–634
Hanilçi C, Kinnunen T (2014) Source cell-phone recognition from recorded speech using non-speech segments. Digital Signal Processing 35:75–85
He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE international conference on computer vision, pp 1026–1034
Ikram S, Malik H (2012) Microphone identification using higher-order statistics. In: Audio engineering society conference: 46th international conference: audio forensics. Audio Engineering Society
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167
Jiang Y, Leung FH (2019) Source microphone recognition aided by a kernel-based projection method. IEEE Trans Inform Forensics Secur 14(11):2875–2886
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980
Kotropoulos C, Samaras S (2014) Mobile phone identification using recorded speech signals. In: 19th international conference on digital signal processing (DSP). IEEE, pp 586–591
Kraetzer C, Oermann A, Dittmann J, Lang A (2007) Digital audio forensics: a first practical evaluation on microphone and environment classification. In: Proceedings of the 9th workshop on multimedia & security. ACM, pp 63–74
Kraetzer C, Schott M, Dittmann J (2009) Unweighted fusion in microphone forensics using a decision tree and linear logistic regression models. In: Proceedings of the 11th ACM workshop on Multimedia and security, pp 49–56
Kurniawan F, Rahim M, Mohd S, Khalil MS, Khan MK (2016) Statistical based audio forensic on identical microphones. International Journal of Electrical & Computer Engineering (2088-8708) 6(5)
Li Y, Zhang X, Li X, Zhang Y, Yang J, He Q (2018) Mobile phone clustering from speech recordings using deep representation and spectral clustering. IEEE Trans Inform Forensics Secur 13(4):965–977
Luo D, Korus P, Huang J (2018) Band energy difference for source attribution in audio forensics. IEEE Trans Inform Forensics Secur 13(9):2179–2189
Luo D, Yang R, Li B, Huang J (2016) Detection of double compressed AMR audio using stacked autoencoder. IEEE Trans Inform Forensics Secur 12(2):432–444
van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(Nov):2579–2605
O’Dea S (2020) Number of smartphone users worldwide from 2016 to 2021. https://www.statista.com/statistics/330695/number-of-smartphone-users-worldwide/
Panagakis Y, Kotropoulos C (2012) Automatic telephone handset identification by sparse representation of random spectral features. In: Proceedings of the on multimedia and security, pp 91–96
Panagakis Y, Kotropoulos C (2012) Telephone handset identification by feature selection and sparse representations. In: IEEE international workshop on information forensics and security (WIFS). IEEE, pp 73–78
Pandey V, Verma VK, Khanna N (2014) Cell-phone identification from audio recordings using PSD of speech-free regions. In: IEEE students’ conference on electrical, electronics and computer science (SCEECS). IEEE, pp 1–6
Poisel R, Tjoa S (2011) Forensics investigations of multimedia data: a review of the state-of-the-art. In: Sixth international conference on IT security incident management and IT forensics. IEEE, pp 48–61
Qin T, Wang R, Yan D, Lin L (2018) Source cell-phone identification in the presence of additive noise from CQT domain. Information 9(8):205
Rabiner L, Schafer R (1978) Digital processing of speech signals. Prentice-Hall signal processing series. Prentice-Hall
Shen Y, Jia J, Cai L (2012) Detecting double compressed AMR-format audio recordings. In: Proc. of the 10th phonetics conference of China (PCC), pp 1–5
Stamm MC, Wu M, Liu KR (2013) Information forensics: an overview of the first decade. IEEE Access 1:167–200
Verma V, Agarwal N, Khanna N (2018) DCT-domain deep convolutional neural networks for multiple JPEG compression classification. Signal Processing: Image Communication 67:22–33
Verma V, Khanna N (2019) CNN-based system for speaker independent cell-phone identification from recorded audio. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 53–61
Verma V, Khaturia P, Khanna N (2018) Cell-phone identification from recompressed audio recordings. In: Twenty fourth national conference on communications (NCC). IEEE, pp 1–6
Vu HQ, Liu S, Yang X, Li Z, Ren Y (2012) Identifying microphone from noisy recordings by using representative instance one class-classification approach. Journal of Networks
Wang Q, Zhang R (2016) Double JPEG compression forensics based on a convolutional neural network. EURASIP Journal on Information Security 2016:23
Wojcicki K (2020) HTK MFCC MATLAB. https://in.mathworks.com/matlabcentral/fileexchange/32849-htk-mfcc-matlab
Zakariah M, Khan MK, Malik H (2018) Digital multimedia audio forensics: past, present and future. Multimed Tools Applic 77(1):1009–1040
Zou L, He Q, Wu J (2017) Source cell phone verification from speech recordings using sparse representation. Digital Signal Processing 62:125–136
Acknowledgements
We would like to thank Mr. Da Luo of Shenzhen University for providing us with the feature extraction code of [25]. This material is based upon work partially supported by a grant from the Department of Science and Technology (DST), New Delhi, India, under Award Number ECR/2015/000583. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.
Funding
This material is based upon work partially supported by a grant from the Department of Science and Technology (DST), New Delhi, India, under Award Number ECR/2015/000583.
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A preliminary version of this paper appeared in IEEE CVPR’19 Workshop on Media Forensics [38].
About this article
Cite this article
Verma, V., Khanna, N. Speaker-independent source cell-phone identification for re-compressed and noisy audio recordings. Multimed Tools Appl 80, 23581–23603 (2021). https://doi.org/10.1007/s11042-020-10205-z
DOI: https://doi.org/10.1007/s11042-020-10205-z