
BASIL: Effective Near-Duplicate Image Detection Using Gene Sequence Alignment

  • Conference paper
Advances in Information Retrieval (ECIR 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5993)


Abstract

Finding near-duplicate images is a common task in Multimedia Information Retrieval (MIR). To this end, we propose a novel idea that bridges two seemingly unrelated fields: MIR and Biology. Specifically, we propose using BLAST, a popular gene sequence alignment algorithm from Biology, to detect near-duplicate images. Within this framework, we study how various image features and gene sequence generation methods (which use gene alphabets such as A, C, G, and T, as in DNA sequences) affect the accuracy and performance of near-duplicate image detection. Our proposal, termed BLASTed Image Linkage (BASIL), is empirically validated on various real data sets. This work can be viewed as the “first” step toward bridging the MIR and Biology fields in the well-studied near-duplicate image detection problem.
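To make the two stages named in the abstract more concrete (turning image features into a gene-like sequence over A, C, G, and T, then scoring pairs of sequences by alignment), here is a minimal Python sketch. It is not BASIL's implementation. The equal-width quantization into four letters, the match/mismatch/gap scores, the normalized threshold in `is_near_duplicate`, and all function names are hypothetical. Exact Smith-Waterman local alignment stands in for BLAST, which approximates local alignment scores heuristically for speed.

```python
from typing import Sequence

# Hypothetical encoding: bin index 0..3 maps to a nucleotide letter.
ALPHABET = "ACGT"

# Hypothetical scoring parameters for the local alignment.
MATCH, MISMATCH, GAP = 2, -1, -2


def features_to_sequence(features: Sequence[float], lo: float, hi: float) -> str:
    """Quantize each feature value into one of four equal-width bins over
    [lo, hi] and emit the matching letter (an assumed encoding)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    width = (hi - lo) / len(ALPHABET)
    letters = []
    for v in features:
        # Clamp into range; a value equal to hi falls into the last bin.
        v = min(max(v, lo), hi)
        idx = min(int((v - lo) / width), len(ALPHABET) - 1)
        letters.append(ALPHABET[idx])
    return "".join(letters)


def local_alignment_score(a: str, b: str) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty,
    computed with two rolling rows of the dynamic-programming table."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def is_near_duplicate(seq_a: str, seq_b: str, threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: divide the local score by the best score
    achievable for the shorter sequence and compare it to a threshold."""
    if not seq_a or not seq_b:
        return False
    best_possible = MATCH * min(len(seq_a), len(seq_b))
    return local_alignment_score(seq_a, seq_b) / best_possible >= threshold


if __name__ == "__main__":
    # Two made-up feature vectors, assumed normalized to [0, 1].
    img1 = [0.1, 0.4, 0.7, 0.9, 0.2]
    img2 = [0.12, 0.41, 0.69, 0.88, 0.6]
    s1 = features_to_sequence(img1, 0.0, 1.0)
    s2 = features_to_sequence(img2, 0.0, 1.0)
    print(s1, s2, local_alignment_score(s1, s2), is_near_duplicate(s1, s2))
```

Exact dynamic programming costs O(mn) time per pair of sequences of lengths m and n, which is why a heuristic search tool such as BLAST becomes attractive when matching against a large image collection.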




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, Hs., Chang, HW., Lee, J., Lee, D. (2010). BASIL: Effective Near-Duplicate Image Detection Using Gene Sequence Alignment. In: Gurrin, C., et al. Advances in Information Retrieval. ECIR 2010. Lecture Notes in Computer Science, vol 5993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12275-0_22


  • DOI: https://doi.org/10.1007/978-3-642-12275-0_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12274-3

  • Online ISBN: 978-3-642-12275-0

  • eBook Packages: Computer Science, Computer Science (R0)
