Abstract
We describe a system for searching your personal photos using an extremely wide range of text queries, including dates and holidays (Halloween), named and categorical places (Empire State Building or park), events and occasions (Radiohead concert or wedding), activities (skiing), object categories (whales), attributes (outdoors), object instances (Mona Lisa), and any combination of these, all with no manual labeling required. We accomplish this by correlating information in your photos (the timestamps, GPS locations, and image pixels) with information mined from the Internet. This includes matching dates to holidays listed on Wikipedia, matching GPS coordinates to places listed on Wikimapia, combining places and dates to find named events using Google, recognizing visual categories using classifiers either pretrained on ImageNet or trained on-the-fly using results from Google Image Search, and recognizing object instances using interest point-based matching, again using results from Google Images. We tie all of these disparate sources of information together in a unified way, allowing for fast and accurate searches using whatever information you remember about a photo. We represent all information in our system in a layered graph, which prevents duplication of effort and data storage while allowing for fast searches, generating meaningful descriptions of search results, and even suggesting query completions to the user as she types, via auto-complete. We quantitatively evaluate several aspects of our system and show excellent performance in all respects. Please watch a video demonstrating our system in action on a large range of queries at http://youtu.be/Se3bemzhAiY.
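As a rough illustration of the idea in the abstract, the following Python sketch links photos to text labels derived from their timestamps and GPS coordinates, then answers combined queries and prefix completions. It is a minimal sketch under stated assumptions, not the chapter's implementation: the `HOLIDAYS` and `PLACES` tables are hypothetical stand-ins for data mined from Wikipedia and Wikimapia, the 0.5 km matching radius is an arbitrary choice, and the two-layer label-to-photo index is a simplification of the layered graph the authors describe. All class and function names are invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Photo:
    photo_id: str
    taken: date
    lat: float
    lon: float


# Hypothetical lookup tables standing in for information mined from the web.
HOLIDAYS = {(10, 31): "halloween", (12, 25): "christmas"}
PLACES = [
    ("empire state building", 40.7484, -73.9857),
    ("central park", 40.7829, -73.9654),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


class LabelGraph:
    """Two-layer index: label nodes on one side, photo ids on the other."""

    def __init__(self):
        self.label_to_photos = defaultdict(set)

    def add_photo(self, photo, radius_km=0.5):
        # Date layer: attach a holiday label if the day matches one.
        holiday = HOLIDAYS.get((photo.taken.month, photo.taken.day))
        if holiday is not None:
            self.label_to_photos[holiday].add(photo.photo_id)
        # Place layer: attach every named place within the assumed radius.
        for name, lat, lon in PLACES:
            if haversine_km(photo.lat, photo.lon, lat, lon) <= radius_km:
                self.label_to_photos[name].add(photo.photo_id)

    def search(self, *labels):
        """Return ids of photos carrying all of the given labels."""
        if not labels:
            return set()
        sets = [self.label_to_photos.get(label, set()) for label in labels]
        return set.intersection(*sets)

    def complete(self, prefix):
        """Suggest known labels that start with the typed prefix."""
        return sorted(l for l in self.label_to_photos if l.startswith(prefix))


if __name__ == "__main__":
    graph = LabelGraph()
    graph.add_photo(Photo("p1", date(2013, 10, 31), 40.7484, -73.9857))
    graph.add_photo(Photo("p2", date(2013, 10, 31), 47.6062, -122.3321))
    print(graph.search("halloween", "empire state building"))
    print(graph.complete("hal"))
```

Because each label is stored once and shared by every photo that carries it, a combined query reduces to a set intersection, and completion only has to scan labels that are actually attached to some photo.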
Acknowledgments
This work was supported by funding from National Science Foundation grant IIS-1250793, Google, Adobe, Microsoft, Pixar, and the UW Animation Research Labs.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Kumar, N., Seitz, S. (2016). Photo Recall: Using the Internet to Label Your Photos. In: Zamir, A., Hakeem, A., Van Gool, L., Shah, M., Szeliski, R. (eds) Large-Scale Visual Geo-Localization. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-25781-5_17
DOI: https://doi.org/10.1007/978-3-319-25781-5_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25779-2
Online ISBN: 978-3-319-25781-5
eBook Packages: Computer Science, Computer Science (R0)