Photo Recall: Using the Internet to Label Your Photos

Large-Scale Visual Geo-Localization

Part of the book series: Advances in Computer Vision and Pattern Recognition ((ACVPR))

Abstract

We describe a system for searching your personal photos using an extremely wide range of text queries, including dates and holidays (Halloween), named and categorical places (Empire State Building or park), events and occasions (Radiohead concert or wedding), activities (skiing), object categories (whales), attributes (outdoors), object instances (Mona Lisa), and any combination of these, all with no manual labeling required. We accomplish this by correlating information in your photos (the timestamps, GPS locations, and image pixels) with information mined from the Internet. Specifically, we match dates to holidays listed on Wikipedia and GPS coordinates to places listed on Wikimapia, combine places and dates to find named events using Google, recognize visual categories using classifiers either pretrained on ImageNet or trained on-the-fly on results from Google Image Search, and recognize object instances using interest point-based matching, again on results from Google Images. We tie all of these disparate sources of information together in a unified way, allowing for fast and accurate searches using whatever information you remember about a photo. We represent all information in our system in a layered graph, which prevents duplication of effort and data storage while allowing for fast searches, generating meaningful descriptions of search results, and even suggesting query completions to the user as she types, via auto-complete. We quantitatively evaluate several aspects of our system and show excellent performance in all respects. Please watch a video demonstrating our system in action on a large range of queries at http://youtu.be/Se3bemzhAiY.
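The metadata-matching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The `HOLIDAYS` and `PLACES` tables are tiny hard-coded stand-ins for data that the chapter mines from Wikipedia and Wikimapia, and the names `Photo`, `labels_for`, `search`, `complete`, and the 0.5 km radius are all assumptions made for illustration. Visual classifiers, event lookup, and the layered graph are left out entirely.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

# Hypothetical stand-ins for holiday and place data mined from the Internet.
HOLIDAYS = {(10, 31): "halloween", (7, 4): "independence day"}
PLACES = [
    ("empire state building", 40.7484, -73.9857),
    ("central park", 40.7829, -73.9654),
]


@dataclass
class Photo:
    name: str
    taken: datetime  # timestamp from the photo metadata
    lat: float       # GPS latitude in degrees
    lon: float       # GPS longitude in degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def labels_for(photo, radius_km=0.5):
    """Derive text labels from a photo's timestamp and GPS position."""
    labels = set()
    holiday = HOLIDAYS.get((photo.taken.month, photo.taken.day))
    if holiday:
        labels.add(holiday)
    for place, lat, lon in PLACES:
        # radius_km is an assumed threshold, not a value from the chapter.
        if haversine_km(photo.lat, photo.lon, lat, lon) <= radius_km:
            labels.add(place)
    return labels


def search(photos, *terms):
    """Return names of photos whose labels contain every query term."""
    wanted = {t.lower() for t in terms}
    return [p.name for p in photos if wanted <= labels_for(p)]


def complete(photos, prefix):
    """Suggest known labels that start with the typed prefix."""
    all_labels = set().union(*(labels_for(p) for p in photos))
    return sorted(label for label in all_labels if label.startswith(prefix.lower()))
```

With a photo taken on October 31 a few metres from the Empire State Building, `search(photos, "Halloween", "Empire State Building")` would return that photo's name. This sketch recomputes labels on every query for brevity, whereas the abstract describes storing all information once in a layered graph to avoid duplicated effort.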

Notes

  1. Quoted from http://support.apple.com/kb/PH2381.

  2. http://googleresearch.blogspot.com/2013/06/improving-photo-search-step-across.html.

  3. http://code.flickr.net/2014/10/20/introducing-flickr-park-or-bird/.

  4. http://en.wikipedia.org/wiki/List_of_US_holidays.

  5. http://wikimapia.org.

Acknowledgments

This work was supported by funding from National Science Foundation grant IIS-1250793, Google, Adobe, Microsoft, Pixar, and the UW Animation Research Labs.

Author information

Correspondence to Neeraj Kumar.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Kumar, N., Seitz, S. (2016). Photo Recall: Using the Internet to Label Your Photos. In: Zamir, A., Hakeem, A., Van Gool, L., Shah, M., Szeliski, R. (eds) Large-Scale Visual Geo-Localization. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-25781-5_17

  • DOI: https://doi.org/10.1007/978-3-319-25781-5_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25779-2

  • Online ISBN: 978-3-319-25781-5

  • eBook Packages: Computer Science, Computer Science (R0)
