Revisiting Gist-PCA Hashing for Near Duplicate Image Detection

Journal of Signal Processing Systems

Abstract

This paper presents a scalable method for near duplicate image detection based on Gist-PCA (principal component analysis) hashing. While most transform coding methods have focused on nearest neighbor search, with applications to similar image search, we address the range search problem that arises in near duplicate detection. First, we argue that PCA hashing of the Gist descriptor is adequate for near duplicate image detection. Then, we decompose the Gist-PCA binary code into a hash key and a residual binary code so that the method scales to large datasets. In addition, a multi-block approach is incorporated to deal with strong variations, such as image cropping and border framing. Experimental results show that the proposed method is more accurate and faster than the real-valued Gist descriptor and other nearest neighbor search methods.
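To make the hash key and residual code decomposition concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each bit is the sign of one PCA projection, that the leading bits form the hash key, and that a range query inspects only the bucket with an identical key before filtering by Hamming distance on the residual bits. The function names, the bit counts, the radius, and the random vectors standing in for Gist descriptors are all hypothetical.

```python
from collections import defaultdict

import numpy as np


def fit_pca(X, n_bits):
    """Return the data mean and the top n_bits principal directions."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_bits]


def encode(x, mean, components):
    """One bit per principal component: 1 if the projection is positive."""
    return ((x - mean) @ components.T > 0).astype(np.uint8)


def split_code(bits, key_bits):
    """Assumed split: leading bits are the hash key, the rest the residual."""
    return bits[:key_bits].tobytes(), bits[key_bits:]


def build_index(X, mean, components, key_bits):
    """Bucket every item by hash key and store its residual code."""
    index = defaultdict(list)
    for i, bits in enumerate(encode(X, mean, components)):
        key, residual = split_code(bits, key_bits)
        index[key].append((i, residual))
    return index


def range_search(index, x, mean, components, key_bits, radius):
    """Return ids in the query's bucket within `radius` residual bit flips."""
    key, residual = split_code(encode(x, mean, components), key_bits)
    return [i for i, r in index.get(key, [])
            if np.count_nonzero(r != residual) <= radius]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))  # hypothetical stand-in for Gist descriptors
mean, comps = fit_pca(X, n_bits=32)
index = build_index(X, mean, comps, key_bits=8)
hits = range_search(index, X[5], mean, comps, key_bits=8, radius=3)
```

Querying with a stored vector always returns that vector's own id, because its key and residual match exactly. A distorted copy is found only if none of its key bits flip, which is one reason a real system might probe additional buckets; this sketch does not do so.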

Notes

  1. They include intensity change (brightness_50, dark_50), blurring (blur_11x11), resizing (resize_h3w3), JPEG compression (jpegcomp_10, jpegcomp_15, jpegcomp_20), image cropping (centercrop_10, centercrop_20, leftcrop_10), border framing (border_w10, border_w20, border_b10, border_b20), and watermarking (watermark_s1a4, watermark_s2a5). In this notation, the leading string denotes the variation type and the number that follows denotes its degree. For border framing, b and w denote black and white border frames, respectively. For cropping, “centercrop” and “leftcrop” differ in how the image is aligned before cropping, and the degree is the cropping ratio in terms of image width and height, which differs from the surface cropping ratio provided in Copydays.
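The naming scheme in the note can be illustrated with a short sketch. The helper names below are hypothetical. The conversion also rests on an assumption the note does not confirm: that a degree such as 10 means 10% is removed from both the width and the height, so the surface ratio follows from the remaining area.

```python
def parse_variation(name):
    """Split a label such as 'centercrop_10' into (type, degree) strings."""
    kind, _, degree = name.partition("_")
    return kind, degree


def surface_crop_ratio(side_ratio):
    """Fraction of image area removed.

    Assumption: side_ratio is the fraction removed from both the width
    and the height, so the remaining area is (1 - side_ratio) ** 2.
    """
    return 1.0 - (1.0 - side_ratio) ** 2


kind, degree = parse_variation("centercrop_10")
area_removed = surface_crop_ratio(int(degree) / 100)
```

Under that assumption, a 10% crop of width and height removes roughly 19% of the image surface, which shows why the two cropping ratios are not interchangeable.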

Acknowledgements

This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) 2010-0028680 and 2014-003140.

Author information

Corresponding author

Correspondence to Junmo Kim.

About this article

Cite this article

Kim, H., Sohn, S. & Kim, J. Revisiting Gist-PCA Hashing for Near Duplicate Image Detection. J Sign Process Syst 91, 575–586 (2019). https://doi.org/10.1007/s11265-018-1360-0

  • DOI: https://doi.org/10.1007/s11265-018-1360-0
