A spectrogram-based audio fingerprinting system for content-based copy detection

Ouali, Chahid; Dumouchel, Pierre; Gupta, Vishwa

doi:10.1007/s11042-015-3081-8

Published: 21 November 2015

Volume 75, pages 9145–9165, (2016)

Multimedia Tools and Applications

Chahid Ouali¹,², Pierre Dumouchel¹ & Vishwa Gupta²

Abstract

This paper presents a novel audio fingerprinting method that is highly robust to a variety of audio distortions. It is based on an unconventional fingerprint generation scheme. Robustness is achieved by generating several versions of the spectrogram matrix of the audio signal, each pruned with a threshold based on the average of the spectral values. We transform each pruned version of the spectrogram matrix into a 2-D binary image. These multiple 2-D images suppress noise to varying degrees, which improves the likelihood that one of them matches a reference image. To speed up matching, we convert each image into an n-dimensional vector and perform a nearest neighbor search on this vector. We give results for two different feature parameters and for their combination. We test the method on the TRECVID 2010 content-based copy detection evaluation dataset and validate its performance on the TRECVID 2009 dataset. Experimental results show that these features remain effective even when the audio is distorted. We compare the proposed method to two state-of-the-art audio copy detection systems, namely the NN-based and Shazam systems. Our method outperforms the Shazam system by a wide margin for all audio transformations (or distortions) in terms of detection performance, number of missed queries, and localization accuracy. Compared to the NN-based system, our approach reduces the minimal Normalized Detection Cost Rate (min NDCR) by 23% and improves localization accuracy by 24%.
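The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the frame sizes, the threshold multipliers, the way an image is turned into a vector, or the search structure. The names `spectrogram`, `binary_images`, `image_to_vector`, and `nearest_reference`, the multipliers `(0.5, 1.0, 2.0)`, the 4×4 tile-density reduction, and the brute-force Euclidean search are all assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram (frequency x time) from Hann-windowed frames.
    Frame length and hop are illustrative, not values from the paper."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def binary_images(spec, scales=(0.5, 1.0, 2.0)):
    """Prune the spectrogram with several thresholds derived from its mean.
    Each threshold yields one 2-D binary image; the multipliers are assumed."""
    mean = spec.mean()
    return [(spec > s * mean).astype(np.uint8) for s in scales]

def image_to_vector(image, grid=(4, 4)):
    """Assumed reduction to an n-dimensional vector: the fraction of
    1-pixels in each tile of a grid laid over the image."""
    rows = np.array_split(image, grid[0], axis=0)
    return np.array([tile.mean()
                     for row in rows
                     for tile in np.array_split(row, grid[1], axis=1)])

def nearest_reference(query_vectors, reference_vectors):
    """Brute-force nearest neighbor: try every version of the query and
    keep the closest reference. Returns (distance, reference index)."""
    best = (np.inf, -1)
    for q in query_vectors:
        dists = np.linalg.norm(reference_vectors - q, axis=1)
        idx = int(np.argmin(dists))
        if dists[idx] < best[0]:
            best = (float(dists[idx]), idx)
    return best
```

Because every pruned version of the query is searched, a match is reported as soon as any one version lands close to a reference vector, which mirrors the abstract's argument that varying the degree of noise suppression raises the chance of a match.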

Notes

These are only approximations of the speed difference between the query and the corresponding reference. For example, −180 % means that the query is approximately 2.8 times slower than the reference (1 s of the query corresponds to approximately 0.36 s of the reference).
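The arithmetic in this note can be checked with a short Python sketch. The conversion rule is an assumption inferred from the single example given (a change of −p % is read as a query that is 1 + p/100 times slower than the reference); the function names are hypothetical, and the note does not say how positive values should be interpreted, so the sketch rejects them.

```python
def slowdown_factor(speed_change_percent: float) -> float:
    """How many times slower the query is than the reference.
    Assumed convention: -180 % -> 1 + 180/100 = 2.8 times slower."""
    if speed_change_percent > 0:
        raise ValueError("only slowdowns (non-positive values) are covered")
    return 1.0 + abs(speed_change_percent) / 100.0

def reference_seconds_per_query_second(speed_change_percent: float) -> float:
    """Seconds of reference audio covered by 1 s of the query."""
    return 1.0 / slowdown_factor(speed_change_percent)

factor = slowdown_factor(-180)                        # about 2.8
covered = reference_seconds_per_query_second(-180)    # about 0.36
```

Under this reading, 1 / 2.8 ≈ 0.357, which rounds to the 0.36 s quoted in the note.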

Author information

Authors and Affiliations

ÉTS (École de Technologie Supérieure), Montreal, Canada
Chahid Ouali & Pierre Dumouchel
CRIM (Computer Research Institute of Montreal), Montreal, Canada
Chahid Ouali & Vishwa Gupta

Corresponding author

Correspondence to Chahid Ouali.

About this article

Cite this article

Ouali, C., Dumouchel, P. & Gupta, V. A spectrogram-based audio fingerprinting system for content-based copy detection. Multimed Tools Appl 75, 9145–9165 (2016). https://doi.org/10.1007/s11042-015-3081-8

Received: 02 October 2014
Revised: 02 November 2015
Accepted: 17 November 2015
Published: 21 November 2015
Issue Date: August 2016
DOI: https://doi.org/10.1007/s11042-015-3081-8
