Using Redundant Bit Vectors for Near-Duplicate Image Detection

  • Conference paper
Advances in Databases: Concepts, Systems and Applications (DASFAA 2007)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4443))

Abstract

Images are among the most widely proliferated forms of digital information, owing to affordable imaging technologies and the Web. In such an environment, using digital watermarking to detect image copyright infringement is a challenge. For such tasks, near-duplicate image detection is increasingly attractive because it supports automated content analysis; moreover, its application domain extends to data management. The application of PCA-SIFT features and Locality-Sensitive Hashing (LSH) for indexing and retrieval has been shown to be highly effective for this task. In this work, we prune the number of PCA-SIFT features and introduce a modified Redundant Bit Vector (RBV) index. This is the first application of the RBV index that shows near-perfect effectiveness. Using the best parameters of our RBV approach, we observe an average recall and precision of 91% and 98%, respectively, with a query response time of under 10 seconds on a collection of 20,000 images. Compared to the baseline (the LSH index), the RBV index answers queries 12 times faster and is 126 times smaller. Compared to a brute-force sequential scan, the RBV index rapidly reduces the search space to 1/80 of its original size.
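The abstract does not describe how an RBV index works, so the following is only a minimal sketch of the general redundant-bit-vector idea. It is not the modified index introduced in the paper. It assumes descriptors are scaled to [0, 1] and that each stored descriptor is treated as the centre of a fixed-radius hypercube. The names `RBVIndex`, `radius`, and `bins_per_dim` are hypothetical and chosen purely for illustration.

```python
from bisect import bisect_right


class RBVIndex:
    """Toy redundant-bit-vector index over fixed-radius hypercubes (illustrative only)."""

    def __init__(self, points, radius, bins_per_dim=8, lo=0.0, hi=1.0):
        self.points = points
        self.radius = radius
        dims = len(points[0])
        width = (hi - lo) / bins_per_dim
        # Interior bin boundaries, shared by every dimension (an assumption).
        self.edges = [lo + width * k for k in range(1, bins_per_dim)]
        # bits[d][b] is an int used as a bit vector: bit i is set when the
        # hypercube around points[i] overlaps bin b in dimension d.
        self.bits = [[0] * bins_per_dim for _ in range(dims)]
        for i, p in enumerate(points):
            for d, x in enumerate(p):
                first = bisect_right(self.edges, x - radius)
                last = bisect_right(self.edges, x + radius)
                for b in range(first, last + 1):
                    self.bits[d][b] |= 1 << i  # redundant: one item, many bins

    def candidates(self, query):
        """AND one bit vector per dimension; survivors may be false positives."""
        mask = (1 << len(self.points)) - 1
        for d, x in enumerate(query):
            mask &= self.bits[d][bisect_right(self.edges, x)]
        return [i for i in range(len(self.points)) if mask >> i & 1]

    def search(self, query):
        """Verify each candidate with an exact per-dimension distance check."""
        return [
            i for i in self.candidates(query)
            if all(abs(a - b) <= self.radius
                   for a, b in zip(self.points[i], query))
        ]


points = [(0.10, 0.10), (0.12, 0.11), (0.90, 0.90), (0.10, 0.90)]
index = RBVIndex(points, radius=0.05)
print(index.search((0.11, 0.12)))  # -> [0, 1]
```

In this sketch the bitwise AND discards most stored items before any distance is computed, which is the sense in which such an index shrinks the search space relative to a sequential scan. The verification step is still needed, because an item can overlap the query's bin in every dimension without actually containing the query point.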


Editor information

Ramamohanarao Kotagiri, P. Radha Krishna, Mukesh Mohania, Ekawit Nantajeewarawat

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Foo, J.J., Sinha, R. (2007). Using Redundant Bit Vectors for Near-Duplicate Image Detection. In: Kotagiri, R., Krishna, P.R., Mohania, M., Nantajeewarawat, E. (eds) Advances in Databases: Concepts, Systems and Applications. DASFAA 2007. Lecture Notes in Computer Science, vol 4443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71703-4_41

  • DOI: https://doi.org/10.1007/978-3-540-71703-4_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71702-7

  • Online ISBN: 978-3-540-71703-4

  • eBook Packages: Computer Science, Computer Science (R0)
