BASIL: Effective Near-Duplicate Image Detection Using Gene Sequence Alignment

Kim, Hung-sik; Chang, Hau-Wen; Lee, Jeongkyu; Lee, Dongwon

doi:10.1007/978-3-642-12275-0_22


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5993)

Included in the following conference series:

European Conference on Information Retrieval


Abstract

Finding near-duplicate images is a common task in Multimedia Information Retrieval (MIR). Toward this end, we propose a novel idea that bridges two seemingly unrelated fields, MIR and Biology: we propose to use BLAST, a popular gene sequence alignment algorithm in Biology, to detect near-duplicate images. Under this new idea, we study how various image features and gene sequence generation methods (using gene alphabets such as A, C, G, and T in DNA sequences) affect the accuracy and performance of near-duplicate image detection. Our proposal, termed BLASTed Image Linkage (BASIL), is empirically validated using various real data sets. This work can be viewed as the “first” step toward bridging the MIR and Biology fields in the well-studied near-duplicate image detection problem.
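To make the idea concrete, the following is a minimal Python sketch, not the authors' BASIL implementation. It assumes a hypothetical gene sequence generation method that quantizes each value of a normalized image feature vector into one of four equal-width bins and maps the bins to A, C, G, and T. For the comparison step it substitutes Smith-Waterman local alignment, the exact dynamic-programming algorithm that BLAST approximates heuristically. The function names, the binning scheme, and the scoring parameters (match = 2, mismatch = -1, gap = -2) are all illustrative assumptions.

```python
from typing import Sequence

# Hypothetical 4-letter alphabet mapping; the paper's actual
# sequence generation methods are not reproduced here.
ALPHABET = "ACGT"


def features_to_dna(features: Sequence[float], lo: float, hi: float) -> str:
    """Quantize each feature value in [lo, hi] into one of four bins -> one base."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    bases = []
    for v in features:
        # Clamp to [lo, hi], rescale to [0, 1], then pick a bin index 0..3.
        t = (min(max(v, lo), hi) - lo) / (hi - lo)
        idx = min(int(t * len(ALPHABET)), len(ALPHABET) - 1)
        bases.append(ALPHABET[idx])
    return "".join(bases)


def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score between a and b (linear gap penalty).

    Stand-in for BLAST: exact Smith-Waterman, keeping only two DP rows.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, `features_to_dna([0.1, 0.3, 0.6, 0.9, 0.2], 0.0, 1.0)` yields `"ACGTA"`, and two sequences that differ in a single base, such as `"ACGTACGT"` and `"ACGTTCGT"`, still score 13 out of a maximum of 16. A small perturbation in the image features therefore lowers the alignment score only slightly, which is the property a near-duplicate detector would threshold on.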



Author information

Authors and Affiliations

Computer Science and Engineering, Penn State University, USA
Hung-sik Kim & Hau-Wen Chang
Computer Science and Engineering, University of Bridgeport, USA
Jeongkyu Lee
College of Information Sciences and Technology, Penn State University, USA
Dongwon Lee


Editor information

Editors and Affiliations

Adaptive Information Cluster, Dublin City University, Dublin 9, Ireland
Cathal Gurrin
The Open University, Walton Hall, MK7 6HF, Milton Keynes, UK
Yulan He
Microsoft Research Ltd, 7 JJ Thomson Avenue, CB3 0FB, Cambridge, UK
Gabriella Kazai
Department of Computer Science, University of Essex, Wivenhoe Park, CO4 3SQ, Colchester, UK
Udo Kruschwitz
The Open University, Walton Hall, Milton Keynes, UK
Suzanne Little
University of London, London, UK
Thomas Roelleke
Knowledge Media Institute, The Open University, MK7 6AA, Milton Keynes, UK
Stefan Rüger
Department of Computing Science, University of Glasgow, 17 Lilybank Gardens, G12 8QQ, Glasgow, UK
Keith van Rijsbergen


About this paper

Cite this paper

Kim, Hs., Chang, HW., Lee, J., Lee, D. (2010). BASIL: Effective Near-Duplicate Image Detection Using Gene Sequence Alignment. In: Gurrin, C., et al. Advances in Information Retrieval. ECIR 2010. Lecture Notes in Computer Science, vol 5993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12275-0_22


DOI: https://doi.org/10.1007/978-3-642-12275-0_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12274-3
Online ISBN: 978-3-642-12275-0
eBook Packages: Computer Science (R0)
