DOI: 10.1145/2482513.2482520
research-article

Optimizing acoustic features for source cell-phone recognition using speech signals

Published: 17 June 2013

Abstract

This paper presents a comparison and optimization of acoustic features for source cell-phone recognition using recorded speech signals. Several feature extraction methods are considered: Mel-frequency, linear-frequency and Bark-frequency cepstral coefficients (MFCC, LFCC and BFCC), and linear prediction cepstral coefficients (LPCC). Beyond the choice of feature set, the paper also examines how recognition performance is affected by dynamic features, namely delta and double-delta coefficients (Δ and Δ²), and by feature normalization techniques, namely cepstral mean normalization (CMN), cepstral variance normalization (CVN), and cepstral mean and variance normalization (CMVN). To compare the features and normalization techniques fairly, all experiments use the same cell-phone dataset and the same support vector machine (SVM) classifier with fixed parameters.
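The abstract lists MFCC, LFCC and BFCC side by side. In a common textbook formulation (not necessarily the exact configuration used in this paper), all three share one pipeline, namely a triangular filterbank applied to a frame's power spectrum, a logarithm, and a discrete cosine transform, and differ only in the frequency scale on which the filter edges are equally spaced. The pure-Python sketch below illustrates this under stated assumptions: the mel scale uses the 2595·log10(1 + f/700) formula, the Bark scale uses Traunmüller's approximation without its edge corrections, and framing, windowing and the FFT are omitted, so `power_spectrum` is taken to be one frame's squared FFT magnitudes over `n_fft // 2 + 1` bins. The names `SCALES`, `filterbank` and `cepstral_coefficients` are hypothetical.

```python
import math

def hz_to_mel(f):
    # Mel formula 2595*log10(1 + f/700), a common choice for MFCC (assumption)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def hz_to_bark(f):
    # Traunmueller's Bark approximation, without its low/high-band corrections
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    return 1960.0 * (z + 0.53) / (26.28 - z)

# Hypothetical table: (forward warp, inverse warp) for each feature type
SCALES = {
    "mfcc": (hz_to_mel, mel_to_hz),
    "lfcc": (lambda f: f, lambda f: f),  # linear scale: no warping
    "bfcc": (hz_to_bark, bark_to_hz),
}

def filterbank(n_filters, n_fft, sample_rate, kind):
    """Triangular filters whose edges are equally spaced on the chosen scale."""
    warp, unwarp = SCALES[kind]
    lo, hi = warp(0.0), warp(sample_rate / 2.0)
    step = (hi - lo) / (n_filters + 1)
    # n_filters + 2 edge frequencies, mapped back to Hz
    edges = [unwarp(lo + i * step) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if left < f <= center:
                row.append((f - left) / (center - left))  # rising slope
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def cepstral_coefficients(power_spectrum, bank, n_ceps):
    """Log filterbank energies followed by an unnormalised DCT-II."""
    energies = [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
    log_e = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    m = len(log_e)
    return [
        sum(log_e[j] * math.cos(math.pi * c * (j + 0.5) / m) for j in range(m))
        for c in range(n_ceps)
    ]
```

For example, `cepstral_coefficients(spec, filterbank(20, 512, 8000, "bfcc"), 13)` would give 13 BFCC-style coefficients for one 8 kHz frame; switching `"bfcc"` to `"mfcc"` or `"lfcc"` changes only where the filters sit, which keeps a feature comparison like-for-like.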
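LPCC is obtained differently: from an all-pole (linear prediction) model of each frame rather than from a filterbank. One standard route, sketched below under assumptions, is the autocorrelation method, the Levinson-Durbin recursion for the predictor coefficients a_1..a_p, and the recursion c_n = a_n + Σ_{k=1}^{n-1} (k/n) c_k a_{n-k} (with a_n = 0 for n > p) for the cepstrum of 1/(1 - Σ a_k z^{-k}). The sign convention for a_k is an assumption (some texts negate it); pre-emphasis, windowing and the gain term c_0 are left out, and all function names are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a_1..a_p such that x[n] ~ sum_k a_k * x[n-k]."""
    a = [0.0] * order  # a[j - 1] holds a_j
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame (e.g. digital silence): leave the rest at 0
        acc = r[i] - sum(a[j - 1] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        prev = a[:]
        a[i - 1] = k
        for j in range(1, i):
            a[j - 1] = prev[j - 1] - k * prev[i - j - 1]
        err *= 1.0 - k * k  # updated prediction error
    return a, err

def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n of 1 / (1 - sum_k a_k z^-k)."""
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c

def lpcc(frame, order=12, n_ceps=12):
    a, _ = levinson_durbin(autocorrelation(frame, order), order)
    return lpc_to_cepstrum(a, n_ceps)
```

For a first-order model with a_1 = 0.5, `lpc_to_cepstrum([0.5], 3)` reproduces the closed form c_n = 0.5ⁿ/n, a quick sanity check on the indexing.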
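Dynamic features are usually computed by a regression over neighbouring frames rather than a plain frame difference. A minimal sketch, assuming the common formula d_t = Σ_{n=1}^{N} n (c_{t+n} - c_{t-n}) / (2 Σ_{n=1}^{N} n²) with N = 2 and with the first and last frames repeated at the utterance boundaries; the abstract does not state the window length or padding the paper used, and the helper names are hypothetical. Δ² is obtained by applying the same operator to the Δ stream.

```python
def deltas(frames, N=2):
    """Regression-based delta of a list of per-frame feature vectors."""
    T = len(frames)
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, N + 1):
                nxt = frames[min(t + n, T - 1)][d]  # repeat last frame at the end
                prv = frames[max(t - n, 0)][d]  # repeat first frame at the start
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out

def append_dynamic(frames, N=2):
    """Concatenate static, delta and double-delta coefficients per frame."""
    d1 = deltas(frames, N)
    d2 = deltas(d1, N)  # double delta = delta of the delta stream
    return [s + a + b for s, a, b in zip(frames, d1, d2)]
```

`append_dynamic` turns a D-dimensional static stream into 3D-dimensional static + Δ + Δ² vectors; on a linear ramp the interior Δ values equal the slope and the interior Δ² values are zero.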
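The three normalizations can be illustrated per utterance as follows. This sketch assumes the common convention in which CMN subtracts the per-dimension mean over all frames of an utterance, CVN divides by the per-dimension standard deviation, and CMVN does both; other conventions (for example, statistics over a sliding window) exist, and the abstract does not say which one the paper uses. The function names are hypothetical.

```python
import math

def _stats(frames):
    """Per-dimension mean and population standard deviation over all frames."""
    T = len(frames)
    dims = len(frames[0])
    mean = [sum(f[d] for f in frames) / T for d in range(dims)]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / T)
           for d in range(dims)]
    return mean, std

def cmn(frames):
    """Cepstral mean normalization: zero mean per dimension."""
    mean, _ = _stats(frames)
    return [[x - m for x, m in zip(f, mean)] for f in frames]

def cvn(frames, eps=1e-10):
    """Cepstral variance normalization: unit standard deviation per dimension."""
    _, std = _stats(frames)
    return [[x / max(s, eps) for x, s in zip(f, std)] for f in frames]

def cmvn(frames, eps=1e-10):
    """Mean and variance normalization: zero mean, unit variance per dimension."""
    mean, std = _stats(frames)
    return [[(x - m) / max(s, eps) for x, m, s in zip(f, mean, std)]
            for f in frames]
```

A stationary convolutive channel appears as an additive constant in the cepstral domain, so CMN removes it. Since the recording device is itself part of that channel, this is one reason the effect of normalization on device recognition is worth measuring rather than assuming.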

Published In

IH&MMSec '13: Proceedings of the first ACM workshop on Information hiding and multimedia security
June 2013
242 pages
ISBN: 9781450320818
DOI: 10.1145/2482513

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. acoustic features
2. audio forensics
3. feature normalization
4. source cell-phone recognition

Conference

IH&MMSec '13

Acceptance Rates

IH&MMSec '13 paper acceptance rate: 27 of 74 submissions (36%)
Overall acceptance rate: 128 of 318 submissions (40%)

Bibliometrics

• Downloads (last 12 months): 9
• Downloads (last 6 weeks): 0

Download counts reflect downloads up to 08 Mar 2025.

Cited By

• (2024) I-vector and variability compensation techniques for mobile phone recognition. Studies in Engineering and Exact Sciences, 5:2 (e9486). DOI: 10.54021/seesv5n2-368. Online publication date: 21-Oct-2024.
• (2024) Squeeze-and-Excitation Self-Attention Mechanism Enhanced Digital Audio Source Recognition Based on Transfer Learning. Circuits, Systems, and Signal Processing, 44:1 (480-512). DOI: 10.1007/s00034-024-02850-8. Online publication date: 13-Sep-2024.
• (2023) An End-to-End Transfer Learning Framework of Source Recording Device Identification for Audio Sustainable Security. Sustainability, 15:14 (11272). DOI: 10.3390/su151411272. Online publication date: 19-Jul-2023.
• (2023) Audio Splicing Detection and Localization Based on Acquisition Device Traces. IEEE Transactions on Information Forensics and Security, 18 (4157-4172). DOI: 10.1109/TIFS.2023.3293415. Online publication date: 2023.
• (2023) Audio Source Verification Method Based on Structural Re-parameterization Network. 2023 4th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT), 756-759. DOI: 10.1109/AINIT59027.2023.10212478. Online publication date: 16-Jun-2023.
• (2021) Acoustic Imaging Using the Built-In Sensors of a Smartphone. Symmetry, 13:6 (1065). DOI: 10.3390/sym13061065. Online publication date: 14-Jun-2021.
• (2021) Spatial and temporal learning representation for end-to-end recording device identification. EURASIP Journal on Advances in Signal Processing, 2021:1. DOI: 10.1186/s13634-021-00763-1. Online publication date: 17-Jul-2021.
• (2021) Speaker-independent source cell-phone identification for re-compressed and noisy audio recordings. Multimedia Tools and Applications. DOI: 10.1007/s11042-020-10205-z. Online publication date: 7-Jan-2021.
• (2019) Anti-Forensics of Audio Source Identification Using Generative Adversarial Network. IEEE Access, 7 (184332-184339). DOI: 10.1109/ACCESS.2019.2960097. Online publication date: 2019.
• (2019) Detecting and locating digital audio forgeries based on singularity analysis with wavelet packet. Multimedia Tools and Applications, 75:4 (2303-2325). DOI: 10.1007/s11042-014-2406-3. Online publication date: 17-Jan-2019.
