Abstract
Spatial verification methods enable geometrically stable image matching, but they still face a difficult trade-off between robustness, i.e., not rejecting true correspondences, and discriminative power against mismatches. To address this issue, we ask whether an ensemble of weak geometric constraints, each of which correlates with visual similarity only slightly better than a bag-of-visual-words model, can outperform a single strong constraint. We consider a family of spatial verification methods and decompose them into fundamental constraints imposed on pairs of feature correspondences. Unifying these constraints leads us to propose a new method, the Ensemble of pAirwise GEometric Relations (EAGER), which combines the strengths of existing techniques in terms of both spatial contexts and between-image transformations. We also introduce a novel and robust reranking method, in which the object instances localized by EAGER in highly ranked database images are reissued as new queries. EAGER is further extended with a smoothness constraint requiring that the similarity between the optimized ranking scores of two instances be maximally consistent with their geometrically constrained similarity. Reranking is thereby newly formulated as two label propagation problems: one assesses the confidence of the new queries, and the other aggregates the retrievals independently executed for them. Extensive experiments on four datasets show that EAGER and our reranking method outperform most of their state-of-the-art counterparts, especially when large-scale visual vocabularies are used.
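The reranking described above builds on graph-based label propagation with a smoothness term. As a rough illustration only, and not the paper's actual formulation, the sketch below shows the classic smoothness-regularized label propagation in Python: confidence scores assigned to seed images are diffused over an affinity graph so that the optimized scores of strongly connected items remain consistent, which is the role the geometrically constrained similarities play in the abstract. The function name, parameter values, and toy affinity matrix are illustrative assumptions.

    import numpy as np

    def label_propagation(W, y, alpha=0.85):
        """Generic smoothness-based label propagation (illustrative sketch).

        W : (n, n) symmetric affinity matrix between database items
            (e.g., geometrically constrained similarities).
        y : (n,) initial scores (e.g., 1 for verified seed images, 0 otherwise).
        Returns optimized ranking scores that balance fitting y against
        smoothness over the affinity graph.
        """
        # Symmetrically normalize the affinities: S = D^{-1/2} W D^{-1/2}
        d = W.sum(axis=1)
        d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
        S = (W * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
        # Closed-form minimizer of the smoothness-regularized objective:
        # f* = (1 - alpha) (I - alpha S)^{-1} y
        n = W.shape[0]
        return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, y)

    # Toy usage: four database images, image 0 is the verified seed.
    W = np.array([[0.0, 0.9, 0.1, 0.0],
                  [0.9, 0.0, 0.2, 0.0],
                  [0.1, 0.2, 0.0, 0.8],
                  [0.0, 0.0, 0.8, 0.0]])
    scores = label_propagation(W, np.array([1.0, 0.0, 0.0, 0.0]))
    print(np.argsort(-scores))  # reranked order of the database images

In this toy setup, image 1 is pulled up the ranking because it is strongly connected to the seed, while images 2 and 3 receive lower propagated scores.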
Additional information
Communicated by Josef Sivic.
About this article
Cite this article
Wu, X., Hiramatsu, K. & Kashino, K. Label Propagation with Ensemble of Pairwise Geometric Relations: Towards Robust Large-Scale Retrieval of Object Instances. Int J Comput Vis 126, 689–713 (2018). https://doi.org/10.1007/s11263-018-1063-9