Abstract
Image collections may contain multiple copies, versions, and fragments of the same image. Storage or retrieval of such duplicates and near-duplicates may be unnecessary and, in the context of collections derived from the web, their presence may represent infringements of copyright. However, identifying image versions is a challenging problem, as they can be subject to a wide range of digital alterations, and is potentially costly as the number of image pairs to be considered is quadratic in collection size. In this paper, we propose a method for finding the pairs of near-duplicates based on manipulation of an image index. Our approach is an adaptation of a robust object recognition technique and a near-duplicate document detection algorithm to this application domain. We show that this method requires only moderate computing resources, and is highly effective at identifying pairs of near-duplicates.
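The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of discovering candidate pairs through an index rather than by comparing all pairs. It assumes each image has already been reduced to a set of discrete feature keys (for example, quantized local descriptors). The names `candidate_pairs`, `min_shared`, and `max_postings` are hypothetical, and this should not be read as the authors' algorithm.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(image_features, min_shared=2, max_postings=1000):
    """Return {(id_a, id_b): shared_count} for pairs sharing >= min_shared keys.

    image_features maps an image id to a set of hashable feature keys
    (an assumed representation, not one specified by the paper).
    """
    # Build the inverted index: feature key -> set of image ids containing it.
    index = defaultdict(set)
    for image_id, features in image_features.items():
        for feature in features:
            index[feature].add(image_id)

    # Walk each postings list once and count co-occurring image pairs.
    shared = defaultdict(int)
    for postings in index.values():
        # Skip keys seen in one image only, and very common keys whose
        # postings lists would reintroduce a near-quadratic pair count.
        if len(postings) < 2 or len(postings) > max_postings:
            continue
        for a, b in combinations(sorted(postings), 2):
            shared[(a, b)] += 1

    # Keep only the pairs that share enough feature keys.
    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Toy data: the second image shares three keys with the first.
images = {
    "a.jpg": {"f1", "f2", "f3", "f4"},
    "a_cropped.jpg": {"f2", "f3", "f4", "f9"},
    "b.jpg": {"f7", "f8", "f9"},
}
pairs = candidate_pairs(images)
```

Only image pairs that co-occur in at least one postings list are ever touched, which is what keeps the work well below the quadratic number of pairs when most images share no features.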
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Foo, J.J., Sinha, R., Zobel, J. (2006). Discovery of Image Versions in Large Collections. In: Cham, TJ., Cai, J., Dorai, C., Rajan, D., Chua, TS., Chia, LT. (eds) Advances in Multimedia Modeling. MMM 2007. Lecture Notes in Computer Science, vol 4352. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69429-8_44
DOI: https://doi.org/10.1007/978-3-540-69429-8_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69428-1
Online ISBN: 978-3-540-69429-8
eBook Packages: Computer Science, Computer Science (R0)