ABSTRACT
This paper proposes and compares two novel schemes for near-duplicate image and video-shot detection. The first approach is based on global hierarchical colour histograms, using Locality Sensitive Hashing for fast retrieval. The second approach uses local feature descriptors (SIFT); for retrieval, it exploits techniques from the information retrieval community that compute approximate set intersections between documents using a min-Hash algorithm.
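As an illustration of the min-Hash idea mentioned above, the following is a minimal sketch and not the paper's implementation. It assumes each image has already been reduced to a set of integer IDs (for example, quantized SIFT descriptors); the function names, the linear hash family, and the number of hash functions are all hypothetical choices made for this example.

```python
import random

# Assumption: every word ID is a non-negative integer smaller than PRIME.
PRIME = 2_147_483_647  # the Mersenne prime 2^31 - 1


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs defining hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(word_ids, params):
    """Keep, for each hash function, only the smallest hash value over the set."""
    return [min((a * w + b) % PRIME for w in word_ids) for a, b in params]


def estimated_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree.

    This fraction estimates the set overlap |A & B| / |A | B|.
    """
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def exact_similarity(set_a, set_b):
    """Exact set overlap, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    params = make_hash_params(num_hashes=256, seed=42)
    image_a = set(range(0, 1000))      # hypothetical word IDs of image A
    image_b = set(range(500, 1500))    # hypothetical word IDs of image B
    sig_a = minhash_signature(image_a, params)
    sig_b = minhash_signature(image_b, params)
    print("exact:    ", exact_similarity(image_a, image_b))
    print("estimated:", estimated_similarity(sig_a, sig_b))
```

Only the fixed-length signature needs to be stored per image, which is what makes this kind of scheme attractive for large collections. The accuracy of the estimate depends on the number of hash functions and on the hash family chosen.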
The requirements for near-duplicate images vary according to the application, and we address two definitions of near-duplicate: (i) being perceptually identical (e.g. up to noise, discretization effects, small photometric distortions, etc.); and (ii) being images of the same 3D scene (so allowing for viewpoint changes and partial occlusion). We define two shots to be near-duplicates if they share a large percentage of near-duplicate frames.
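The shot-level definition above can be sketched as follows. This is an illustrative reading only: the abstract does not state what counts as a "large percentage", so the `min_fraction` threshold, the symmetric treatment of the two shots, and the frame-level predicate `is_near_dup` are all assumptions introduced for this example.

```python
from typing import Callable, Sequence, TypeVar

Frame = TypeVar("Frame")


def shared_fraction(shot_a: Sequence[Frame],
                    shot_b: Sequence[Frame],
                    is_near_dup: Callable[[Frame, Frame], bool]) -> float:
    """Fraction of frames in shot_a with at least one near-duplicate in shot_b."""
    if not shot_a:
        return 0.0
    matched = sum(1 for fa in shot_a
                  if any(is_near_dup(fa, fb) for fb in shot_b))
    return matched / len(shot_a)


def shots_are_near_duplicates(shot_a: Sequence[Frame],
                              shot_b: Sequence[Frame],
                              is_near_dup: Callable[[Frame, Frame], bool],
                              min_fraction: float = 0.5) -> bool:
    """Assumed rule: both shots must have at least min_fraction matched frames."""
    forward = shared_fraction(shot_a, shot_b, is_near_dup)
    backward = shared_fraction(shot_b, shot_a, is_near_dup)
    return min(forward, backward) >= min_fraction


if __name__ == "__main__":
    # Toy frames: integers stand in for frame descriptors, and two frames
    # count as near-duplicates when their values differ by at most 1.
    def close(x: int, y: int) -> bool:
        return abs(x - y) <= 1

    shot_1 = [10, 11, 12, 50]
    shot_2 = [10, 12, 13, 90]
    print(shared_fraction(shot_1, shot_2, close))
    print(shots_are_near_duplicates(shot_1, shot_2, close))
```

In practice the frame-level predicate would come from one of the two image-level schemes. The brute-force comparison of all frame pairs used here is for clarity and would not scale to a large database.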
We focus primarily on scalability to very large image and video databases, where fast query processing is necessary. Both methods are designed so that only a small amount of data need be stored for each image. In the case of near-duplicate shot detection it is shown that a weak approximation to histogram matching, consuming substantially less storage, is sufficient for good results. We demonstrate our methods on the TRECVID 2006 data set which contains approximately 165 hours of video (about 17.8M frames with 146K key frames), and also on feature films and pop videos.
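To make the storage argument concrete, the sketch below shows one common form of Locality Sensitive Hashing for histogram vectors, in which each image is stored only as a short tuple of integers. It is a generic illustration using Gaussian random projections (a 2-stable distribution, suited to Euclidean distance), not the configuration used in the paper. The function names, the bucket width, the number of projections, and the three-bin histograms are hypothetical.

```python
import math
import random
from collections import defaultdict


def make_lsh_functions(dim, num_functions, bucket_width, seed=0):
    """Draw random projections a (Gaussian) and offsets b (uniform in [0, w))."""
    rng = random.Random(seed)
    return [([rng.gauss(0.0, 1.0) for _ in range(dim)],
             rng.uniform(0.0, bucket_width))
            for _ in range(num_functions)]


def lsh_key(vector, functions, bucket_width):
    """Hash a vector to a tuple of bucket indices floor((a.v + b) / w)."""
    return tuple(
        math.floor((sum(a_i * v_i for a_i, v_i in zip(a, vector)) + b)
                   / bucket_width)
        for a, b in functions)


def build_index(histograms, functions, bucket_width):
    """Map each LSH key to the list of image IDs that fall into that bucket."""
    index = defaultdict(list)
    for image_id, hist in histograms.items():
        index[lsh_key(hist, functions, bucket_width)].append(image_id)
    return index


def query(index, hist, functions, bucket_width):
    """Return candidate near-duplicates: images sharing the query's bucket."""
    return list(index.get(lsh_key(hist, functions, bucket_width), []))


if __name__ == "__main__":
    width = 0.25  # assumed bucket width
    funcs = make_lsh_functions(dim=3, num_functions=4,
                               bucket_width=width, seed=1)
    # Hypothetical normalized colour histograms with three bins each.
    hists = {"img_a": [0.5, 0.3, 0.2],
             "img_b": [0.5, 0.3, 0.2],
             "img_c": [0.0, 0.0, 1.0]}
    index = build_index(hists, funcs, width)
    print(query(index, [0.5, 0.3, 0.2], funcs, width))
```

Identical vectors always share a bucket. Similar vectors share one with a probability that grows as their distance shrinks, so a practical system would typically use several independent tables and verify the candidates afterwards.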
Index Terms
- Scalable near identical image and shot detection