Photo Recall: Using the Internet to Label Your Photos

Kumar, Neeraj; Seitz, Steven

doi:10.1007/978-3-319-25781-5_17

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

We describe a system for searching your personal photos using an extremely wide range of text queries, including dates and holidays (Halloween), named and categorical places (Empire State Building or park), events and occasions (Radiohead concert or wedding), activities (skiing), object categories (whales), attributes (outdoors), and object instances (Mona Lisa), and any combination of these, all with no manual labeling required. We accomplish this by correlating information in your photos (the timestamps, GPS locations, and image pixels) to information mined from the Internet. This includes matching dates to holidays listed on Wikipedia, GPS coordinates to places listed on Wikimapia, places and dates to find named events using Google, visual categories using classifiers either pretrained on ImageNet or trained on-the-fly using results from Google Image Search, and object instances using interest point-based matching, again using results from Google Images. We tie all of these disparate sources of information together in a unified way, allowing for fast and accurate searches using whatever information you remember about a photo. We represent all information in our system in a layered graph which prevents duplication of effort and data storage, while simultaneously allowing for fast searches, generating meaningful descriptions of search results, and even suggesting query completions to the user as she types, via auto-complete. We quantitatively evaluate several aspects of our system and show excellent performance in all respects. Please watch a video demonstrating our system in action on a large range of queries at http://youtu.be/Se3bemzhAiY.
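
As a rough illustration of the idea in the abstract, the sketch below links photos to text labels derived from their timestamps and GPS coordinates, then answers combined queries and prefix completions. It is a toy under stated assumptions, not the authors' implementation: the `HOLIDAYS` and `PLACES` tables are hypothetical stand-ins for data the chapter says is mined from Wikipedia and Wikimapia, the `PhotoIndex` class and its 0.5 km matching radius are invented for this example, and a flat label-to-photo index replaces the layered graph described above.

```python
from collections import defaultdict
from datetime import date
from math import asin, cos, radians, sin, sqrt

# Hypothetical stand-ins for labels mined from the Internet.
HOLIDAYS = {(10, 31): "halloween", (12, 25): "christmas"}
PLACES = {
    "empire state building": (40.7484, -73.9857),
    "space needle": (47.6205, -122.3493),
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


class PhotoIndex:
    """Toy index: each label points at the set of photos it describes."""

    def __init__(self, radius_km=0.5):
        self.radius_km = radius_km  # assumed matching radius for places
        self.label_to_photos = defaultdict(set)

    def add_photo(self, photo_id, taken_on, gps):
        # Date layer: attach a holiday label if the day matches one.
        holiday = HOLIDAYS.get((taken_on.month, taken_on.day))
        if holiday:
            self.label_to_photos[holiday].add(photo_id)
        # Place layer: attach every known place within the radius.
        for name, coords in PLACES.items():
            if haversine_km(gps, coords) <= self.radius_km:
                self.label_to_photos[name].add(photo_id)

    def search(self, *labels):
        """Return photos that carry all of the given labels."""
        sets = [self.label_to_photos.get(label.lower(), set()) for label in labels]
        return set.intersection(*sets) if sets else set()

    def complete(self, prefix):
        """Suggest known labels that start with the typed prefix."""
        prefix = prefix.lower()
        return sorted(label for label in self.label_to_photos if label.startswith(prefix))


index = PhotoIndex()
index.add_photo("img_001", date(2013, 10, 31), (40.7485, -73.9856))
index.add_photo("img_002", date(2013, 7, 4), (47.6204, -122.3491))
halloween_at_esb = index.search("halloween", "empire state building")
suggestions = index.complete("s")
```

Because every label maps to a set of photo identifiers, a combined query is a set intersection, and completions are offered only for labels that match at least one photo in the collection.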

Acknowledgments

This work was supported by funding from National Science Foundation grant IIS-1250793, Google, Adobe, Microsoft, Pixar, and the UW Animation Research Labs.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA
Neeraj Kumar & Steven Seitz

Authors

Neeraj Kumar
Steven Seitz

Corresponding author

Correspondence to Neeraj Kumar.

Editor information

Editors and Affiliations

Computer Science Department, Stanford University, Stanford, California, USA
Amir R. Zamir
Decisive Analytics Corporation, Arlington, Virginia, USA
Asaad Hakeem
ETH Zürich, Zürich, Switzerland
Luc Van Gool
University of Central Florida, Orlando, Florida, USA
Mubarak Shah
Facebook, Seattle, Washington, USA
Richard Szeliski

About this chapter

Cite this chapter

Kumar, N., Seitz, S. (2016). Photo Recall: Using the Internet to Label Your Photos. In: Zamir, A., Hakeem, A., Van Gool, L., Shah, M., Szeliski, R. (eds) Large-Scale Visual Geo-Localization. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-25781-5_17

DOI: https://doi.org/10.1007/978-3-319-25781-5_17
Published: 06 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25779-2
Online ISBN: 978-3-319-25781-5
eBook Packages: Computer Science, Computer Science (R0)