
Multi-order visual phrase for scalable partial-duplicate visual search

  • Special Issue Paper
  • Published in: Multimedia Systems, 2015

Abstract

A visual phrase combines multiple visual words and captures extra spatial clues among them. Thus, visual phrases show better discriminative power than single visual words in image retrieval and matching. Notwithstanding their success, existing visual phrases still show obvious shortcomings: (1) limited flexibility, i.e., two visual phrases are considered for matching only if they contain the same number of visual words; (2) large quantization error and low repeatability, i.e., quantization errors in individual visual words are aggregated in visual word combinations, making visual phrases harder to match than single visual words. To address these issues, we propose the multi-order visual phrase (MVP), which contains two complementary clues: the center visual word quantized from the local descriptor of each image keypoint, and the visual and spatial clues of multiple nearby keypoints. Two MVPs are flexibly matched by first matching their center visual words and then estimating a match confidence by checking the spatial and visual consistency of their neighbor keypoints. Therefore, center visual word matching is equivalent to traditional visual word matching, while checking the spatial and visual clues of neighbor keypoints significantly boosts the discriminative power. MVP does not sacrifice the repeatability of single visual words and is more robust to quantization error than existing visual phrases. We test our approach on three image retrieval tasks using UKbench, Oxford5K, and 1 million distractor images collected from Flickr. Comparisons with recent retrieval approaches and existing visual phrase features clearly demonstrate the competitive accuracy and significantly better efficiency of MVP.
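
The two-step matching described above (center word match, then neighborhood consistency check) can be sketched as follows. This is a minimal illustration under assumed data structures, not the authors' exact formulation: the MVP representation, the quantized relative angle/scale attributes, the tolerance parameters, and the normalization are all illustrative choices.

```python
# Minimal sketch of MVP matching as described in the abstract.
# Assumptions (not from the paper): an MVP is a center visual word plus a set
# of neighbor keypoints, each quantized to a visual word and carrying coarse
# spatial attributes relative to the center keypoint. The consistency test,
# tolerances, and normalization below are hypothetical.

from dataclasses import dataclass
from typing import List


@dataclass
class Neighbor:
    word: int        # visual word of the neighbor keypoint
    angle_bin: int   # quantized orientation relative to the center keypoint
    scale_bin: int   # quantized log-scale relative to the center keypoint


@dataclass
class MVP:
    center_word: int           # visual word of the center keypoint
    neighbors: List[Neighbor]  # nearby keypoints with visual and spatial clues


def match_confidence(p: MVP, q: MVP, angle_tol: int = 1, scale_tol: int = 1) -> float:
    """Return a confidence in [0, 1] that two MVPs match.

    Step 1: the center visual words must match (identical to traditional
    visual word matching); otherwise the MVPs are not compared at all.
    Step 2: count neighbors that agree in visual word and are spatially
    consistent, normalized by the smaller neighborhood size so MVPs with
    different numbers of neighbors can still be matched flexibly.
    """
    if p.center_word != q.center_word:
        return 0.0
    if not p.neighbors or not q.neighbors:
        return 0.0  # no extra evidence; caller may fall back to the word match

    consistent = 0
    for a in p.neighbors:
        for b in q.neighbors:
            if (a.word == b.word
                    and abs(a.angle_bin - b.angle_bin) <= angle_tol
                    and abs(a.scale_bin - b.scale_bin) <= scale_tol):
                consistent += 1
                break  # each neighbor of p contributes at most once
    return consistent / min(len(p.neighbors), len(q.neighbors))
```

In a retrieval system, such a confidence could weight the vote cast by a matched center word, so that spatially and visually consistent neighborhoods contribute more than isolated single-word matches while never discarding a valid center-word match outright.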




Acknowledgments

This work was supported in part by ARO grant W911NF-12-1-0057, a Faculty Research Award from NEC Laboratories of America, and a 2012 UTSA START-R Research Award to Dr. Qi Tian; in part by the National Science Foundation of China (NSFC) under Grant 61128007; in part by the National Basic Research Program of China (973 Program) under Grant 2012CB316400; and in part by the National Natural Science Foundation of China under Grants 61025011 and 61332016.

Author information


Corresponding author

Correspondence to Qi Tian.


About this article

Cite this article

Zhang, S., Tian, Q., Huang, Q. et al. Multi-order visual phrase for scalable partial-duplicate visual search. Multimedia Systems 21, 229–241 (2015). https://doi.org/10.1007/s00530-014-0369-x

