Abstract
The detection of near duplicate images in large databases, such as those of popular social networks, digital investigation archives, and surveillance systems, is an important task for a number of image forensics applications. In digital investigation, hashing techniques are commonly used to index large quantities of images in order to detect copies belonging to different archives. In the last few years, several image hashing techniques based on the Bags of Visual Features paradigm have appeared in the literature. Recently, this paradigm has been augmented with multiple descriptors (e.g., Bags of Visual Phrases) in order to exploit the coherence between different feature spaces. In this paper we propose to further improve the Bags of Visual Phrases approach by considering the coherence between feature spaces not only at the level of image representation, but also during the codebook generation phase. We also introduce a novel image database specifically designed for the development and benchmarking of near duplicate image retrieval techniques. The dataset consists of more than 3,300 images depicting more than 500 different scenes, each having at least three real near duplicates. The dataset exhibits high variability in terms of the geometric and photometric transformations between scenes and their corresponding near duplicates. Finally, we suggest a method to compress the proposed image representation for storage purposes. Experiments show the effectiveness of the proposed near duplicate retrieval technique, which outperforms the original Bags of Visual Phrases approach.
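To make the Bags of Visual Phrases idea more concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes that every keypoint has already been described in two feature spaces, that one codebook per feature space is given, and that a visual phrase is simply the pair of visual words assigned to the same keypoint. Images are then compared by the cosine similarity of their phrase histograms. The function names and the toy input format are hypothetical, and the codebook alignment step proposed in the paper is not modeled here.

```python
import math
from collections import Counter


def nearest_word(descriptor, codebook):
    """Hard assignment: index of the codeword closest (squared Euclidean) to descriptor."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def bag_of_phrases(keypoints, codebook_a, codebook_b):
    """Histogram of visual phrases for one image.

    keypoints: list of (desc_a, desc_b) pairs, i.e. the same keypoint described
    in feature space A and in feature space B (hypothetical input format).
    A phrase is the pair (word index in codebook_a, word index in codebook_b).
    """
    return Counter(
        (nearest_word(da, codebook_a), nearest_word(db, codebook_b))
        for da, db in keypoints
    )


def cosine_similarity(h1, h2):
    """Cosine similarity between two sparse phrase histograms."""
    dot = sum(v * h2.get(k, 0) for k, v in h1.items())
    norm = math.sqrt(sum(v * v for v in h1.values()) * sum(v * v for v in h2.values()))
    return dot / norm if norm else 0.0


# Toy example: 2-D descriptors in space A, 1-D descriptors in space B.
codebook_a = [[0.0, 0.0], [10.0, 10.0]]
codebook_b = [[0.0], [5.0]]
query = bag_of_phrases([([1, 1], [0.5]), ([9, 9], [4.8])], codebook_a, codebook_b)
candidate = bag_of_phrases([([1, 2], [0.1]), ([8, 9], [5.2])], codebook_a, codebook_b)
print(cosine_similarity(query, candidate))  # prints 1.0
```

A phrase only matches when both of its words agree, which is what lets the representation exploit the coherence between the two feature spaces. A practical system would typically add tf-idf weighting and an inverted index on top of these histograms.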

Notes
Note that at this stage other encoding methods can be used starting from the aligned vocabulary [7].
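As one example of an encoding that differs from hard assignment, the sketch below implements soft assignment in the codeword-uncertainty form described by van Gemert et al.: each descriptor distributes one unit of vote over all codewords using normalized Gaussian weights. This is a minimal Python sketch that assumes the (aligned) vocabulary is already available as a list of codeword vectors. The function name and the bandwidth `sigma` are illustrative choices, not values taken from the paper.

```python
import math


def soft_assign(descriptors, codebook, sigma=1.0):
    """Soft-assignment (codeword uncertainty) encoding of a set of descriptors.

    Each descriptor x spreads one unit of vote over all codewords c_j with weights
    proportional to exp(-||x - c_j||^2 / (2 sigma^2)); the votes are averaged over
    the descriptors, so the returned vector sums to 1 for a non-empty input.
    """
    k = len(codebook)
    encoding = [0.0] * k
    for x in descriptors:
        dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook]
        d_min = min(dists)
        # Subtracting the smallest distance keeps exp() from underflowing to 0
        # for every codeword; the shift cancels out in the normalization below.
        weights = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in dists]
        total = sum(weights)
        for j in range(k):
            encoding[j] += weights[j] / total
    n = len(descriptors)
    return [v / n for v in encoding] if n else encoding
```

A descriptor lying exactly halfway between two codewords, e.g. `soft_assign([[1.0]], [[0.0], [2.0]])`, contributes equally to both and yields `[0.5, 0.5]`, whereas hard assignment would have to pick one of them arbitrarily.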
We consider a dataset synthetic when the near duplicates are generated from a set of images (or video frames) by using transformations typically available in image manipulation software (e.g., ImageMagick http://www.imagemagick.org), such as colorizing, contrast changing, cropping, despeckling, downsampling, format changing, framing, rotating, scaling, saturation changing, intensity changing, and shearing. To generate near duplicates, these basic transformations are usually applied with varying parameters and/or in combination.
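The following toy Python sketch illustrates how a synthetic near duplicate is typically produced: a random subset of basic transformations (intensity change, contrast change, cropping, downsampling, rotation, framing), each with randomly drawn parameters, is chained on the original image. It is an illustration under explicit assumptions rather than a real pipeline: images are plain grayscale lists of rows with values in [0, 255], all function names and parameter ranges are hypothetical, and in practice a tool such as ImageMagick would be used instead.

```python
import random


def clip(v):
    """Round and clamp a pixel value to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(v))))


def change_intensity(img, delta):
    return [[clip(p + delta) for p in row] for row in img]


def change_contrast(img, factor):
    # Stretch (factor > 1) or compress (factor < 1) values around the mean.
    flat = [p for row in img for p in row]
    mean = sum(flat) / len(flat)
    return [[clip(mean + factor * (p - mean)) for p in row] for row in img]


def crop(img, top, left, height, width):
    return [row[left:left + width] for row in img[top:top + height]]


def rotate90(img):
    # Clockwise rotation by 90 degrees.
    return [list(row) for row in zip(*img[::-1])]


def downsample(img, step):
    return [row[::step] for row in img[::step]]


def frame(img, border, value=0):
    # Surround the image with a constant-valued border of the given width.
    w = len(img[0]) + 2 * border
    top = [[value] * w for _ in range(border)]
    body = [[value] * border + row + [value] * border for row in img]
    bottom = [[value] * w for _ in range(border)]
    return top + body + bottom


def synthetic_near_duplicate(img, rng):
    """Apply a random chain of basic transformations with random parameters."""
    ops = [
        lambda im: change_intensity(im, rng.randint(-40, 40)),
        lambda im: change_contrast(im, rng.uniform(0.6, 1.4)),
        lambda im: crop(im, 0, 0, max(1, 3 * len(im) // 4), max(1, 3 * len(im[0]) // 4)),
        lambda im: downsample(im, 2),
        rotate90,
        lambda im: frame(im, rng.randint(1, 3)),
    ]
    for op in rng.sample(ops, k=rng.randint(1, len(ops))):
        img = op(img)
    return img
```

Passing an explicit `random.Random` instance makes each generated near duplicate reproducible from its seed, which is convenient when the same benchmark has to be regenerated.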
References
Battiato S, Farinella GM, Gallo G, Ravì D (2010) Exploiting textons distributions on spatial hierarchy for scene classification. EURASIP J Image Video Process Article ID 919367:1–13. doi:10.1155/2010/919367
Battiato S, Farinella GM, Messina E, Puglisi G (2012) Robust image alignment for tampering detection. IEEE Trans Inf Forensics Secur 7(4):1105–1117
Battiato S, Farinella GM, Guarnera GC, Meccio T, Puglisi G, Ravì D, Rizzo R (2010) Bags of phrases with codebooks alignment for near duplicate image detection. In: Proceedings of the international ACM workshop on multimedia in forensics, security and intelligence (MiFor 2010), in conjunction with international ACM multimedia conference, pp 65–70
Bay H, Ess A, Tuytelaars T, Van Gool L (2008) Speeded-up robust features (SURF). Comput Vis Image Understand 110(3):346–359
Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. IEEE Trans Pattern Anal Mach Intell 24(4):509–522
Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. IEEE Trans Pattern Anal Mach Intell 24(4):509–522
Chatfield K, Lempitsky V, Vedaldi A, Zisserman A (2011) The devil is in the details: an evaluation of recent feature encoding methods. In: Proceedings of the British machine vision conference
Cheng X, Hu Y, Chia L-T (2011) Exploiting local dependencies with spatial-scale space (s-cube) for near-duplicate retrieval. Comput Vis Image Understand 115(6):750–758
Chum O, Philbin J, Zisserman A (2008) Near duplicate image detection: min-hash and tf-idf weighting. In: Proceedings of the British machine vision conference (BMVC)
Chum O, Perdoch M, Matas J (2009) Geometric min-hashing: finding a (thick) needle in a haystack. In: IEEE computer society conference on computer vision and pattern recognition, pp 17–24
De Oliveira R, Cherubini M, Oliver N (2010) Looking at near-duplicate videos from a human-centric perspective. ACM Trans Multimedia Comput Commun Appl 6(3):15:1–15:22
Eastlake D, Jones P (2001) US Secure Hash Algorithm 1 (SHA1). RFC 3174. http://tools.ietf.org/html/rfc3174
Freeman W, Adelson E (1991) The design and use of steerable filters. IEEE Trans Pattern Anal Mach Intell 13(9):891–906
Grauman K, Darrell T (2005) The pyramid match kernel: discriminative classification with sets of image features. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp 1458–1465
Hu Y, Cheng X, Chia L-T, Xie X, Rajan D, Tan A-H (2009) Coherent phrase model for efficient image near-duplicate retrieval. IEEE Trans Multimedia 11(8):1434–1445
Huiskes MJ, Lew MS (2008) The MIR Flickr retrieval evaluation. In: MIR ’08: proceedings of the 2008 ACM International conference on multimedia information retrieval. ACM, New York, NY
Johnson AE, Hebert M (1999) Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Trans Pattern Anal Mach Intell 21(5):433–449
Jonker R, Volgenant A (1987) A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing 38(4):325–340
Ke Y, Sukthankar R, Huston L (2004) Efficient near-duplicate detection and sub-image retrieval. In: Proceedings of ACM multimedia, pp 869–876
Koenderink J, van Doorn A (1987) Representation of local geometry in the visual system. Biol Cybern 55:367–375
Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of the 2006 IEEE computer society conference on Computer Vision and Pattern Recognition, CVPR ’06, pp 2169–2178
Lazebnik S, Raginsky M (2009) Supervised learning of quantizer codebooks by information loss minimization. IEEE Trans Pattern Anal Mach Intell 31(7):1294–1309
Lejsek H, Þormóðsdóttir H, Ásmundsson F, Daðason K, Jóhannsson ÁÞ, Jónsson BÞ, Amsaleg L (2010) Videntifier forensic: large-scale video identification in practice. In: Proceedings of ACM workshop on multimedia in forensics, security and intelligence, pp 1–6
Leung T, Malik J (1999) Recognizing surfaces using three-dimensional textons. In: Proceedings of the IEEE international conference on computer vision, pp 1010–1017
Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
Matas J, Chum O, Urban M, Pajdla T (2002) Robust wide-baseline stereo from maximally stable extremal regions. In: Proceedings of the British machine vision conference, pp 384–393
Mikolajczyk K, Schmid C (2004) Scale & affine invariant interest point detectors. Int J Comput Vis (IJCV) 60(1):63–86
Mikolajczyk K, Schmid C (2005) A performance evaluation of local descriptors. IEEE Trans Pattern Anal Mach Intell (PAMI) 27(10):1615–1630
Nistér D, Stewénius H (2006) Scalable recognition with a vocabulary tree. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (CVPR), pp 2161–2168
Papadimitriou CH, Steiglitz K (1982) Combinatorial optimization: algorithms and complexity. Prentice-Hall, Inc
Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2007) Object retrieval with large vocabularies and fast spatial matching. In: Proceedings of the International conference on computer vision and pattern recognition
Rivest RL (1992) The MD5 message-digest algorithm. RFC 1321. http://tools.ietf.org/html/rfc1321
Rosten E, Drummond T (2006) Machine learning for high-speed corner detection. In: Proceedings of the European conference on computer vision, pp 430–443
Ji R, Yao H, Liu W, Sun X, Tian Q (2012) Task-dependent visual-codebook compression. IEEE Trans Image Process 21(4):2282–2293
Ji R, Duan L-Y, Chen J, Xie L, Yao H, Gao W (2013) Learning to distribute vocabulary indexing for scalable visual search. IEEE Trans Multimedia 15(1):153–166
Saffari A, Bischof H (2007) Clustering in a boosting framework. In: Computer vision winter workshop, pp 75–82
Salton G, McGill M (1983) Introduction to modern information retrieval. McGraw-Hill
Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Inf Process Manage 24(5):513–523
Sivic J, Russell BC, Efros AA, Zisserman A, Freeman WT (2005) Discovering object categories in image collections. In: Proceedings of the international conference on computer vision
Swain MJ, Ballard DH (1991) Color indexing. Int J Comput Vis 7(1):11–32
Szeliski R (2010) Computer vision: algorithms and applications. Springer. Available at http://szeliski.org/Book
van Gemert LC, Veenman CJ, Smeulders AWM, Geusebroek JM (2010) Visual word ambiguity. IEEE Trans Pattern Anal Mach Intell 32(7):1271–1283
Wang Y, Hou Z, Leman K (2011) Keypoint-based near-duplicate images detection using affine invariant feature and color matching. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011), pp 1209–1212
Wu Z, Ke Q, Isard M, Sun J (2009) Bundling features for large scale partial-duplicate web image search. In: Proceedings of the international conference on computer vision and pattern recognition, pp 25–32
Xu D, Chang S-F (2007) Visual event recognition in news video using kernel methods with multi-level temporal alignment. In: Proceedings of the IEEE international conference on computer vision and pattern recognition
Xu D, Cham TJ, Yan S, Duan L, Chang S-F (2010) Near duplicate identification with spatially aligned pyramid matching. IEEE Trans Circuits Syst Video Technol (TCSVT) 20(8):1068–1079
Zhang D-Q, Chang S-F (2004) Detecting image near-duplicate by stochastic attributed relational graph matching with learning. In: Proceedings of the ACM multimedia conference, pp 877–884
Zhao W-L, Ngo C-W, Tan H-K, Wu X (2007) Near-duplicate keyframe identification with interest point matching and pattern learning. IEEE Trans Multimedia 9(5):1037–1048
Zhao W-L, Ngo C-W (2009) Scale-rotation invariant pattern entropy for keypoint-based near-duplicate detection. IEEE Trans Image Process 18(2):412–423
Zhao W-L, Wu X, Ngo C-W (2010) On the annotation of web videos by efficient near-duplicate search. IEEE Trans Multimedia 12(5):448–461
Zhao W-L, Wu X, Ngo C-W (2011) SOTU: a toolkit for efficient near-duplicate image/video & retrieval/detection. Manual for SOTU Version 1.06. http://www.cs.cityu.edu.hk/~wzhao2/sotu.htm
Zhu J, Hoi SC, Lyu MR, Yan S (2008) Near-duplicate keyframe retrieval by nonrigid image matching. In: Proceedings of the ACM multimedia conference, pp 41–50
Acknowledgements
Part of this work has been performed in the project PANORAMA, co-funded by grants from Belgium, Italy, France, the Netherlands, the United Kingdom, and the ENIAC Joint Undertaking. The authors would like to thank Giuseppe Claudio Guarnera, Tony Meccio and Rosetta Rizzo for their help at the beginning of this work.
Cite this article
Battiato, S., Farinella, G.M., Puglisi, G. et al. Aligning codebooks for near duplicate image detection. Multimed Tools Appl 72, 1483–1506 (2014). https://doi.org/10.1007/s11042-013-1470-4
DOI: https://doi.org/10.1007/s11042-013-1470-4