Discovering Mid-level Visual Connections in Space and Time

Lee, Yong Jae; Efros, Alexei A.; Hebert, Martial

doi:10.1007/978-3-319-25781-5_2

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Finding recurring visual patterns in data underlies much of modern computer vision. The emerging subfield of visual category discovery/visual data mining proposes to cluster visual patterns that capture more complex appearance than low-level blobs, corners, or oriented bars, without requiring any semantic labels. In particular, mid-level visual elements have recently been proposed as a new type of visual primitive, and have been shown to be useful for various recognition tasks. Because the visual elements are discovered automatically from the data, their representation is flexible: an element may correspond to a part, an object, a group of objects, etc. In this chapter, we explore what the mid-level visual representation brings to geo-spatial and longitudinal analyses. Specifically, we present a weakly supervised visual data mining approach that discovers connections between recurring mid-level visual elements in historic (temporal) and geographic (spatial) image collections, and attempts to capture the underlying visual style. In contrast to existing discovery methods that mine for patterns that remain visually consistent throughout the dataset, the goal is to discover visual elements whose appearance changes with time or location, i.e., elements that exhibit consistent stylistic variations across the label space (date or geo-location). To discover these elements, we first identify groups of patches that are style-sensitive. We then incrementally build correspondences to find the same element across the entire dataset. Finally, we train style-aware regressors that model each element’s range of stylistic differences. We apply our approach to date and geo-location prediction and show substantial improvement over several baselines that do not model visual style. We also demonstrate the method’s effectiveness on the related task of fine-grained classification.
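The abstract does not say how a group of patches is judged to be style-sensitive. One plausible criterion, assumed here purely for illustration, is that the labels (for example, decades) of the images in a group should be tightly peaked rather than spread across the whole label space. The sketch below ranks patch groups by the Shannon entropy of their label distribution; the function names, the entropy criterion, and the toy data are all hypothetical and are not taken from the chapter.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_by_style_sensitivity(groups):
    """Return group names ordered from most peaked (lowest entropy) to most spread."""
    return sorted(groups, key=lambda name: label_entropy(groups[name]))


# Hypothetical toy data: the decade label of the source image of each patch in a group.
groups = {
    "headlight": [1920, 1920, 1920, 1930, 1920, 1920, 1920, 1920],
    "wheel": [1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990],
    "grille": [1950, 1950, 1960, 1960, 1950, 1960, 1950, 1960],
}

# Groups whose labels concentrate on few decades come first (assumed criterion).
ranking = rank_by_style_sensitivity(groups)
print(ranking)
```

Under this assumed criterion, a group such as the toy "wheel" cluster, which looks the same in every decade, ranks last: it is visually consistent but carries no information about the date, which is the kind of pattern the abstract contrasts with style-sensitive elements.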

Acknowledgments

We thank Olivier Duchenne for helpful discussions. This work was supported in part by Google, ONR MURI N000141010934, and the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory. The U.S. government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, AFRL or the U.S. government.

Author information

Authors and Affiliations

Department of Computer Science, UC Davis, Davis, CA, USA
Yong Jae Lee
Department of Electrical Engineering and Computer Science, UC Berkeley, Berkeley, CA, USA
Alexei A. Efros
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
Martial Hebert

Corresponding author

Correspondence to Yong Jae Lee.

Editor information

Editors and Affiliations

Computer Science Department, Stanford University, Stanford, California, USA
Amir R. Zamir
Decisive Analytics Corporation, Arlington, Virginia, USA
Asaad Hakeem
ETH Zürich, Zürich, Switzerland
Luc Van Gool
University of Central Florida, Orlando, Florida, USA
Mubarak Shah
Facebook, Seattle, Washington, USA
Richard Szeliski

About this chapter

Cite this chapter

Lee, Y.J., Efros, A.A., Hebert, M. (2016). Discovering Mid-level Visual Connections in Space and Time. In: Zamir, A., Hakeem, A., Van Gool, L., Shah, M., Szeliski, R. (eds) Large-Scale Visual Geo-Localization. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-25781-5_2

DOI: https://doi.org/10.1007/978-3-319-25781-5_2
Published: 06 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25779-2
Online ISBN: 978-3-319-25781-5
eBook Packages: Computer Science, Computer Science (R0)
