Abstract
This paper presents a novel audio fingerprinting method that is highly robust to a variety of audio distortions. It is based on an unconventional audio fingerprint generation scheme. Robustness is achieved by generating different versions of the spectrogram matrix of the audio signal, using a threshold based on the average of the spectral values to prune this matrix. We transform each version of the pruned spectrogram matrix into a 2-D binary image. The resulting images suppress noise to varying degrees, which improves the likelihood that one of them matches a reference image. To speed up matching, we convert each image into an n-dimensional vector and perform a nearest neighbor search on these vectors. We give results with two different feature parameters and their combination. We test the method on the TRECVID 2010 content-based copy detection evaluation dataset and validate its performance on the TRECVID 2009 dataset. Experimental results show the effectiveness of these features even when the audio is distorted. We compare the proposed method to two state-of-the-art audio copy detection systems, namely the NN-based and Shazam systems. Our method outperforms the Shazam system by a wide margin for all audio transformations (or distortions) in terms of detection performance, number of missed queries and localization accuracy. Compared to the NN-based system, our approach reduces the minimal Normalized Detection Cost Rate (min NDCR) by 23 % and improves localization accuracy by 24 %.
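The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the parameters, so the frame length, the threshold multipliers in `scales`, the tile-density reduction in `image_to_vector`, and the brute-force search are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram (frequency bins x frames) from a Hann-windowed FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def binary_images(spec, scales=(0.5, 1.0, 2.0)):
    """One binary image per threshold; each threshold is a multiple of the
    mean spectral value (the multipliers are assumed, not from the paper)."""
    mean = spec.mean()
    return [(spec > s * mean).astype(np.uint8) for s in scales]

def image_to_vector(image, grid=(4, 4)):
    """Reduce a binary image to an n-dimensional vector: the fraction of set
    pixels in each tile of a grid (an assumed reduction, n = 16 here)."""
    rows = np.array_split(image, grid[0], axis=0)
    return np.array([tile.mean()
                     for row in rows
                     for tile in np.array_split(row, grid[1], axis=1)])

def nearest_reference(query_vecs, ref_vecs):
    """Brute-force nearest neighbor over all query versions.
    Returns (smallest distance, index of the matching reference)."""
    q = np.asarray(query_vecs)[:, None, :]
    dists = np.linalg.norm(ref_vecs[None, :, :] - q, axis=2)
    qi, ri = np.unravel_index(np.argmin(dists), dists.shape)
    return float(dists[qi, ri]), int(ri)

# Toy data: three reference tones and a noisy copy of the second one.
rate = 8000
t = np.arange(rate) / rate
references = [np.sin(2 * np.pi * f * t) for f in (300.0, 1500.0, 3000.0)]
rng = np.random.default_rng(0)
query = references[1] + 0.05 * rng.standard_normal(rate)

# Each reference is indexed with one image (the mean-valued threshold).
ref_vecs = np.stack([image_to_vector(binary_images(spectrogram(r), (1.0,))[0])
                     for r in references])
# The query contributes one vector per pruned version of its spectrogram.
query_vecs = [image_to_vector(img) for img in binary_images(spectrogram(query))]

best_dist, best_ref = nearest_reference(query_vecs, ref_vecs)
print("best matching reference:", best_ref)
```

Keeping several pruned versions of the query means that a threshold too low or too high for a given noise level can be compensated by another version, which is the intuition the abstract gives for the improved matching likelihood.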

Notes
These are only approximations of the speed difference between the query and the corresponding reference. For example, −180 % means that the query is approximately 2.8 times slower than the reference (1 s of the query corresponds to approximately 0.36 s of the reference).
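The arithmetic in this note can be checked with a short sketch. The mapping from percentage to slowdown factor (a value of −p % meaning the query is 1 + p/100 times slower) is an assumption inferred from the single example above, not a formula stated in the text, and the function name is hypothetical.

```python
def reference_seconds_per_query_second(speed_change_percent):
    """Seconds of reference covered by 1 s of query for a slowed-down query.

    Assumption: a value of -p % means the query is (1 + p/100) times slower
    than the reference. Positive values are not covered by the note.
    """
    if speed_change_percent > 0:
        raise ValueError("only slow-downs (non-positive values) are handled")
    slowdown = 1.0 + abs(speed_change_percent) / 100.0
    return 1.0 / slowdown

ratio = reference_seconds_per_query_second(-180)
print(round(ratio, 2))  # 1 / 2.8, i.e. about 0.36 s of reference per query second
```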
Cite this article
Ouali, C., Dumouchel, P. & Gupta, V. A spectrogram-based audio fingerprinting system for content-based copy detection. Multimed Tools Appl 75, 9145–9165 (2016). https://doi.org/10.1007/s11042-015-3081-8
DOI: https://doi.org/10.1007/s11042-015-3081-8