Abstract
Online photo collections have become truly gigantic. Photo sharing sites such as Flickr (http://www.flickr.com/) host billions of photographs, a large portion of which are contributed by tourists. In this paper, we leverage online photo collections to automatically rank canonical views for tourist attractions. Ideal canonical views for a tourist attraction should both be representative of the site and exhibit a diverse set of views (Kennedy and Naaman, International Conference on World Wide Web 297–306, 2008). To meet both goals, we rank canonical views in two stages. In the first stage, we use visual features to encode the content of photographs and infer the popularity of each photograph. In the second stage, we rank photographs using a suppression scheme that keeps popular views top-ranked while demoting duplicate views. Once a ranking is generated, canonical views at various granularities can be retrieved in real time, which is an advance over previous work and a promising feature for real applications. To scale canonical view ranking to gigantic online photo collections, we propose to leverage geo-tags (the latitude and longitude of the scene in each photograph) to speed up the basic algorithm. We preprocess the photo collection to extract subsets of photographs that are geographically clustered (geo-clusters), and confine the expensive visual processing to each geo-cluster. We test the algorithm on two large Flickr data sets, of Rome and of Yosemite National Park, and show promising results on canonical view ranking. For quantitative analysis, we adopt two medium-sized data sets and conduct a subjective comparison with previous work. The comparison shows that while both algorithms are able to produce canonical views of high quality, our algorithm has the advantage of responding in real time to canonical view retrieval at various granularities.
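The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the popularity measure or the suppression rule, so the sketch assumes a precomputed pairwise visual similarity matrix with values in [0, 1], scores popularity as the total similarity to all other photographs, and suppresses duplicates by scaling each remaining score by one minus its similarity to the photograph just selected. The names `popularity`, `rank_with_suppression`, and `SIM` are hypothetical.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def popularity(sim: Matrix) -> List[float]:
    """Stage 1 (assumed form): total similarity of each photo to all others."""
    n = len(sim)
    return [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]


def rank_with_suppression(sim: Matrix, scores: Sequence[float]) -> List[int]:
    """Stage 2 (assumed form): greedy ranking that demotes near-duplicates.

    Repeatedly pick the highest-scoring remaining photo (ties go to the
    lower index), then scale every remaining score by (1 - similarity to
    the pick), so photos resembling an already-ranked view sink.
    """
    remaining: Dict[int, float] = dict(enumerate(scores))
    ranking: List[int] = []
    while remaining:
        best = max(remaining, key=lambda i: (remaining[i], -i))
        ranking.append(best)
        del remaining[best]
        for i in remaining:
            remaining[i] *= 1.0 - sim[best][i]
    return ranking


# Toy data (invented for illustration): photos 0-2 show one view,
# photos 3-4 show a second, less photographed view.
SIM = [
    [1.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 1.0, 0.8, 0.1, 0.1],
    [0.8, 0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.1, 0.7, 1.0],
]

ranking = rank_with_suppression(SIM, popularity(SIM))
top_2 = ranking[:2]  # any granularity k is just a prefix of the ranking
```

On this toy input the first two ranked photographs come from the two different views, even though three photographs of the first view have the highest raw popularity. Because the ranking is computed once, retrieving canonical views at any granularity k reduces to taking the first k entries, which is one plausible reading of how retrieval can respond in real time.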
References
ANN. from http://www.cs.umd.edu/~mount/ANN/.
Beis J, Lowe DG (1997) Shape indexing using approximate nearest-neighbor search in high dimensional spaces. IEEE Comp Vision Patt Recog 1000–1006
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Systs 30(1–7):107–117
Brown M, Szeliski R, Winder S (2005) Multi-image matching using multi-scale oriented patches. IEEE Comp Vision Patt Recog 510–517
Bryan K, Leise T (2006) The $25,000,000,000 eigenvector: the linear algebra behind Google. SIAM Rev 48(3):569–581
Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 24:381–395
Flickr. from http://www.flickr.com/
Georgescu B, Shimshoni I, Meer P (2003) Mean shift based clustering in high dimensions: a texture classification example. IEEE Int Conf Comp Vis 456–463
Google. from http://www.google.com/
GoogleEarthAPI. from http://code.google.com/apis/earth/
Hartley R, Zisserman A (2003) Multiple view geometry in computer vision. Cambridge University Press
Hays J, Efros AA (2008) IM2GPS: estimating geographic information from a single image. IEEE Comp Vision Patt Recog 1–8
Hofmann T (1999) Probabilistic latent semantic analysis. Uncertainty Artif Intell
Jaffe A, Naaman M, Tassa T, Davis M (2006) Generating summaries and visualization for large collections of geo-referenced photographs. MIR 89–98
Jing Y, Baluja S (2008) Pagerank for product image search. International Conference on World Wide Web 307–316
Jing Y, Baluja S, Rowley H (2007) Canonical image selection from the web. CIVR 280–287
Ke Y, Tang X, Jing F (2006) The design of high-level features for photo quality assessment. CVPR 419–426
Kennedy LS, Naaman M (2008) Generating diverse and representative image search results for landmarks. WWW 297–306
Kennedy L, Chang S, Kozintsev I (2006) To search or to label? Predicting the performance of search-based automatic image classifiers. MIR 249–258
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
Mikolajczyk K, Schmid C (2004) A performance evaluation of local descriptors. IEEE Trans Pattern Anal Mach Intell 27(10):1615–1630
Oliva A, Torralba A (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope. Int J Comput Vis 42(3):145–175
Raguram R, Lazebnik S (2008) Computing iconic summaries for general visual concepts. IV
Simon I, Snavely N, Seitz S (2007) Scene summarization for online image collections. ICCV 1–8
Sivic J, Zisserman A (2003) Video google: a text retrieval approach to object matching in videos. IEEE Int Conf Comput Vis 2:1470–1477
Yang Y, Wu P, Lee C, Lin K, Hsu W, Chen H (2008) ContextSeer: context search and recommendation at query time for shared consumer photos. ACM Multimedia
Acknowledgments
The authors would like to thank Wei-bang Chen, Srinivasa Datla, Sagar Thapaliya, Richa Tiwari and Liping Zhou for generating ground-truth view clusters on two of the test data sets.
Cite this article
Yang, L., Johnstone, J. & Zhang, C. Ranking canonical views for tourist attractions. Multimed Tools Appl 46, 573–589 (2010). https://doi.org/10.1007/s11042-009-0345-1
DOI: https://doi.org/10.1007/s11042-009-0345-1