
Speaker-independent source cell-phone identification for re-compressed and noisy audio recordings

Multimedia Tools and Applications

Abstract

With the rapid increase in user-generated multimedia content, its extensive outreach over social media, and its potential in critical applications such as law enforcement, source identification from re-compressed and noisy multimedia is of great importance. This paper proposes a system for speaker-independent cell-phone identification from recorded audio. The system can handle test audio whose speech content and speaker both differ from those of the training audio. Each recorded audio has the device fingerprint implicitly embedded in it, which motivates a CNN-based system that learns device-specific signatures directly from the magnitude of the discrete Fourier transform of the audio. This paper also addresses the scenario where the recorded audio is re-compressed to meet efficient storage and network transmission requirements, which is common in this age of social media. Cell-phone classification from audio recordings in the presence of additive white Gaussian noise is addressed as well. We show that the proposed system performs as well as state-of-the-art systems in the speaker-dependent case with clean audio recordings and exhibits much higher robustness in the speaker-independent case with clean, re-compressed, and noisy audio recordings.
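As a rough illustration of two ingredients named in the abstract, the sketch below computes the magnitude of the discrete Fourier transform of one audio frame and adds white Gaussian noise at a chosen signal-to-noise ratio. It is not the authors' pipeline: the function names `dft_magnitude` and `add_awgn`, the 64-sample frame, the synthetic sinusoid, and the 20 dB SNR are all assumptions made for this example, and the CNN classifier itself is not shown.

```python
import cmath
import math
import random


def dft_magnitude(frame):
    """Return |X[k]| for k = 0..N//2 of a real-valued frame (naive O(N^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise so the signal-to-noise ratio is about snr_db."""
    if rng is None:
        rng = random.Random(0)  # fixed seed, only to make the example repeatable
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


# Hypothetical stand-in for one audio frame: a sinusoid centred on DFT bin 8.
n = 64
clean = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
noisy = add_awgn(clean, snr_db=20)

clean_mag = dft_magnitude(clean)
noisy_mag = dft_magnitude(noisy)

# Index of the strongest spectral component in each version of the frame.
clean_peak = max(range(len(clean_mag)), key=clean_mag.__getitem__)
noisy_peak = max(range(len(noisy_mag)), key=noisy_mag.__getitem__)
```

In practice a fast Fourier transform routine would replace the naive loop, and magnitude spectra from many frames of a recording would be stacked to form the network input. The frame length and any windowing are design choices that this sketch leaves open.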

Data Availability

The dataset created by the authors might be made publicly available depending on the permissions received from the funding agency.

Code Availability

The code created by the authors for this paper might be made publicly available depending on the permissions received from the funding agency.

Notes

  1. https://in.mathworks.com/videos/managing-and-sharing-matlab-code-98671.html

  2. https://in.mathworks.com/videos/top-10-productivity-tools-in-matlab-95250.html

  3. https://in.mathworks.com/videos/spectral-analysis-with-matlab-95557.html

References

  1. Aggarwal R, Singh S, Roul AK, Khanna N (2014) Cellphone identification using noise estimates from recorded audio. In: International conference on communications and signal processing (ICCSP). IEEE, pp 1218–1222

  2. Baldini G, Amerini I (2019) Smartphones identification through the built-in microphones with convolutional neural network. IEEE Access 7:158685–158696


  3. Baldini G, Amerini I, Gentile C (2019) Microphone identification using convolutional neural networks. IEEE Sensors Letters

  4. Bellard F, Niedermayer M, et al. (2019) FFmpeg. Available from: http://ffmpeg.org

  5. Buchholz R, Kraetzer C, Dittmann J (2009) Microphone classification using Fourier coefficients. In: International workshop on information hiding. Springer, pp 235–246

  6. Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2(3):1–27

  7. Cuccovillo L, Aichroth P (2016) Open-set microphone classification via blind channel analysis. In: IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 2074–2078

  8. Cuccovillo L, Mann S, Tagliasacchi M, Aichroth P (2013) Audio tampering detection via microphone classification. In: IEEE 15th international workshop on multimedia signal processing (MMSP). IEEE, pp 177–182

  9. Eskidere Ö (2014) Source microphone identification from speech recordings based on a Gaussian mixture model. Turkish Journal of Electrical Engineering & Computer Sciences 22(3):754–767


  10. Eskidere Ö (2016) Identifying acquisition devices from recorded speech signals using wavelet-based features. Turkish Journal of Electrical Engineering & Computer Sciences 24(3):1942–1954


  11. Garcia-Romero D, Espy-Wilson CY (2010) Automatic acquisition device identification from speech recordings. In: International conference on acoustics speech and signal processing (ICASSP). IEEE, pp 1806–1809

  12. Hanilçi C, Ertas F (2013) Optimizing acoustic features for source cell-phone recognition using speech signals. In: Proceedings of the first ACM workshop on information hiding and multimedia security. ACM, pp 141–148

  13. Hanilçi C, Ertas F, Ertas T, Eskidere Ö (2012) Recognition of brand and models of cell-phones from recorded speech signals. IEEE Trans Inform Forensics Secur 7(2):625–634


  14. Hanilçi C, Kinnunen T (2014) Source cell-phone recognition from recorded speech using non-speech segments. Digital Signal Processing 35:75–85


  15. He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE international conference on computer vision, pp 1026–1034

  16. Ikram S, Malik H (2012) Microphone identification using higher-order statistics. In: Audio engineering society conference: 46th international conference: audio forensics. Audio Engineering Society

  17. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167

  18. Jiang Y, Leung FH (2019) Source microphone recognition aided by a kernel-based projection method. IEEE Trans Inform Forensics Secur 14(11):2875–2886


  19. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980

  20. Kotropoulos C, Samaras S (2014) Mobile phone identification using recorded speech signals. In: 19th international conference on digital signal processing (DSP). IEEE, pp 586–591

  21. Kraetzer C, Oermann A, Dittmann J, Lang A (2007) Digital audio forensics: a first practical evaluation on microphone and environment classification. In: Proceedings of the 9th workshop on multimedia & security. ACM, pp 63–74

  22. Kraetzer C, Schott M, Dittmann J (2009) Unweighted fusion in microphone forensics using a decision tree and linear logistic regression models. In: Proceedings of the 11th ACM workshop on Multimedia and security, pp 49–56

  23. Kurniawan F, Rahim M, Mohd S, Khalil MS, Khan MK (2016) Statistical based audio forensic on identical microphones. International Journal of Electrical & Computer Engineering (2088-8708) 6(5)

  24. Li Y, Zhang X, Li X, Zhang Y, Yang J, He Q (2018) Mobile phone clustering from speech recordings using deep representation and spectral clustering. IEEE Trans Inform Forensics Secur 13(4):965–977


  25. Luo D, Korus P, Huang J (2018) Band energy difference for source attribution in audio forensics. IEEE Trans Inform Forensics Secur 13(9):2179–2189


  26. Luo D, Yang R, Li B, Huang J (2016) Detection of double compressed AMR audio using stacked autoencoder. IEEE Trans Inform Forensics Secur 12(2):432–444

  27. van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(Nov):2579–2605


  28. O’Dea S (2020) Number of smartphone users worldwide from 2016 to 2021. https://www.statista.com/statistics/330695/number-of-smartphone-users-worldwide/

  29. Panagakis Y, Kotropoulos C (2012) Automatic telephone handset identification by sparse representation of random spectral features. In: Proceedings of the on multimedia and security, pp 91–96

  30. Panagakis Y, Kotropoulos C (2012) Telephone handset identification by feature selection and sparse representations. In: IEEE international workshop on information forensics and security (WIFS). IEEE, pp 73–78

  31. Pandey V, Verma VK, Khanna N (2014) Cell-phone identification from audio recordings using PSD of speech-free regions. In: IEEE students’ conference on electrical, electronics and computer science (SCEECS). IEEE, pp 1–6

  32. Poisel R, Tjoa S (2011) Forensics investigations of multimedia data: a review of the state-of-the-art. In: Sixth international conference on IT security incident management and IT forensics. IEEE, pp 48–61

  33. Qin T, Wang R, Yan D, Lin L (2018) Source cell-phone identification in the presence of additive noise from CQT domain. Information 9(8):205


  34. Rabiner L, Schafer R (1978) Digital processing of speech signals. Prentice-Hall signal processing series. Prentice-Hall

  35. Shen Y, Jia J, Cai L (2012) Detecting double compressed AMR-format audio recordings. In: Proc. of the 10th phonetics conference of China (PCC), pp 1–5

  36. Stamm MC, Wu M, Liu KR (2013) Information forensics: an overview of the first decade. IEEE Access 1:167–200


  37. Verma V, Agarwal N, Khanna N (2018) DCT-domain deep convolutional neural networks for multiple JPEG compression classification. Signal Processing: Image Communication 67:22–33

  38. Verma V, Khanna N (2019) CNN-based system for speaker independent cell-phone identification from recorded audio. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 53–61

  39. Verma V, Khaturia P, Khanna N (2018) Cell-phone identification from recompressed audio recordings. In: Twenty fourth national conference on communications (NCC). IEEE, pp 1–6

  40. Vu HQ, Liu S, Yang X, Li Z, Ren Y (2012) Identifying microphone from noisy recordings by using representative instance one class-classification approach. Journal of Networks

  41. Wang Q, Zhang R (2016) Double JPEG compression forensics based on a convolutional neural network. EURASIP Journal on Information Security 2016:23

  42. Wojcicki K (2020) HTK MFCC MATLAB. https://in.mathworks.com/matlabcentral/fileexchange/32849-htk-mfcc-matlab

  43. Zakariah M, Khan MK, Malik H (2018) Digital multimedia audio forensics: past, present and future. Multimed Tools Applic 77(1):1009–1040


  44. Zou L, He Q, Wu J (2017) Source cell phone verification from speech recordings using sparse representation. Digital Signal Processing 62:125–136


Acknowledgements

We would like to thank Mr. Da Luo of Shenzhen University for providing us with the feature extraction code of [25]. This material is based upon work partially supported by a grant from the Department of Science and Technology (DST), New Delhi, India, under Award Number ECR/2015/000583. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.

Funding

This material is based upon work partially supported by a grant from the Department of Science and Technology (DST), New Delhi, India, under Award Number ECR/2015/000583.

Author information

Corresponding author

Correspondence to Nitin Khanna.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A preliminary version of this paper appeared in IEEE CVPR’19 Workshop on Media Forensics [38].

About this article

Cite this article

Verma, V., Khanna, N. Speaker-independent source cell-phone identification for re-compressed and noisy audio recordings. Multimed Tools Appl 80, 23581–23603 (2021). https://doi.org/10.1007/s11042-020-10205-z

  • DOI: https://doi.org/10.1007/s11042-020-10205-z
