Revisiting Gist-PCA Hashing for Near Duplicate Image Detection

Kim, Hyunwoo; Sohn, SungRyull; Kim, Junmo

doi:10.1007/s11265-018-1360-0

Published: 09 May 2018

Journal of Signal Processing Systems, volume 91, pages 575–586 (2019)

Abstract

This paper presents a scalable method for near duplicate image detection based on Gist-PCA (principal component analysis) hashing. While most transform coding methods have focused on nearest neighbor search, with applications to similar image search, we address the range search problem that arises in near duplicate detection. First, we argue that PCA hashing of the Gist descriptor is adequate for near duplicate image detection. Then, we decompose the Gist-PCA binary code into a hash key and a residual binary code so that the method scales to large datasets. In addition, a multi-block approach is incorporated into the method to deal with strong variations, such as image cropping and border framing. Experimental results show that the proposed method is more accurate and faster than the real-valued Gist descriptor and other nearest neighbor search methods.
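
The idea of splitting a PCA binary code into a hash key and a residual code can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes sign-thresholded PCA projections as the binary code, the first `key_bits` bits as the hash key, exact key matching for bucket lookup, and a Hamming threshold on the residual as the range criterion. Random vectors stand in for Gist descriptors, and all function names and parameter values are hypothetical.

```python
import numpy as np
from collections import defaultdict

def fit_pca(X, n_bits):
    """Return the data mean and the top n_bits principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_bits]

def encode(X, mean, components):
    """Binarize PCA projections by sign: one bit per principal direction."""
    return ((X - mean) @ components.T > 0).astype(np.uint8)

def build_index(codes, key_bits):
    """Bucket items by their first key_bits bits; keep the rest as residual."""
    index = defaultdict(list)
    for i, code in enumerate(codes):
        index[code[:key_bits].tobytes()].append((i, code[key_bits:]))
    return index

def range_search(query_code, index, key_bits, max_hamming):
    """Return ids in the query's bucket whose residual differs in at most
    max_hamming bits. The result size varies, unlike k-nearest-neighbor search."""
    bucket = index.get(query_code[:key_bits].tobytes(), [])
    residual = query_code[key_bits:]
    return [i for i, r in bucket if np.count_nonzero(r != residual) <= max_hamming]

rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 64))  # stand-in for real Gist descriptors
mean, components = fit_pca(database, n_bits=32)
codes = encode(database, mean, components)
index = build_index(codes, key_bits=8)

# Querying with the code of item 42 must return at least item 42 itself.
hits = range_search(codes[42], index, key_bits=8, max_hamming=3)
```

In this sketch only the items sharing the query's key are compared, so the cost of a query depends on the bucket size rather than on the database size. A query whose key differs from its duplicate's key in even one bit is missed, which is one reason the key length and threshold would need tuning in practice.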

Notes

The variations include intensity change (brightness_50, dark_50), blurring (blur_11x11), resizing (resize_h3w3), JPEG compression (jpegcomp_10, jpegcomp_15, jpegcomp_20), image cropping (centercrop_10, centercrop_20, leftcrop_10), border framing (border_w10, border_w20, border_b10, border_b20), and watermarking (watermark_s1a4, watermark_s2a5). In this notation, the first string denotes the variation type and the number that follows denotes its degree. For the border type, b and w denote black and white border frames, respectively. For cropping, “centercrop” and “leftcrop” differ in the alignment applied before cropping, and the degree is the cropping ratio in terms of image width and height, which differs from the surface cropping ratio provided in Copydays.
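
The difference between a width-and-height cropping ratio and a surface cropping ratio can be made concrete with a short sketch. It assumes that the degree is a percentage and that the same fraction is removed from both the width and the height, so that the removed surface is 1 - (1 - r)^2. Neither assumption is confirmed by the note, and the helper names are hypothetical.

```python
def parse_variation(label):
    """Split a label such as 'centercrop_10' into (type, degree string)."""
    kind, degree = label.split("_", 1)
    return kind, degree

def surface_crop_ratio(side_ratio):
    """Fraction of image area removed when width and height each lose side_ratio."""
    return 1.0 - (1.0 - side_ratio) ** 2

for label in ("centercrop_10", "centercrop_20", "leftcrop_10"):
    kind, degree = parse_variation(label)
    side = int(degree) / 100.0  # assumption: the degree is a percentage
    print(f"{kind}: side ratio {side:.2f}, surface ratio {surface_crop_ratio(side):.2f}")
```

Under these assumptions, a 10% reduction of each side removes 19% of the surface and a 20% reduction removes 36%, so the two kinds of ratio are not directly comparable.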

Acknowledgements

This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) 2010-0028680 and 2014-003140.

Author information

Authors and Affiliations

Advanced Innovation Center for Intelligent Robots and Systems, Beijing Institute of Technology, Beijing, China
Hyunwoo Kim
EE Department, KAIST, Daejeon, Korea
SungRyull Sohn & Junmo Kim

Corresponding author

Correspondence to Junmo Kim.

About this article

Cite this article

Kim, H., Sohn, S. & Kim, J. Revisiting Gist-PCA Hashing for Near Duplicate Image Detection. J Sign Process Syst 91, 575–586 (2019). https://doi.org/10.1007/s11265-018-1360-0

Received: 17 December 2015
Revised: 21 December 2017
Accepted: 26 March 2018
Published: 09 May 2018
Issue Date: 15 June 2019
DOI: https://doi.org/10.1007/s11265-018-1360-0
