Location Discriminative Vocabulary Coding for Mobile Landmark Search

Ji, Rongrong; Duan, Ling-Yu; Chen, Jie; Yao, Hongxun; Yuan, Junsong; Rui, Yong; Gao, Wen

doi:10.1007/s11263-011-0472-9

Published: 27 July 2011

Volume 96, pages 290–314 (2012)

International Journal of Computer Vision

Rongrong Ji¹,², Ling-Yu Duan¹, Jie Chen¹, Hongxun Yao², Junsong Yuan³, Yong Rui⁴ & Wen Gao¹

Abstract

With the popularization of mobile devices, recent years have witnessed an emerging potential for mobile landmark search. In this scenario, the user experience depends heavily on the efficiency of query transmission over a wireless link. Because sending a query photo is time consuming, recent works have proposed extracting compact visual descriptors directly on the mobile end to enable low bit rate transmission. Typically, these descriptors are extracted based solely on the visual content of a query, and the location cues available on the mobile end are rarely exploited. In this paper, we present a Location Discriminative Vocabulary Coding (LDVC) scheme, which achieves extremely low bit rate query transmission, discriminative landmark description, and scalable descriptor delivery in a unified framework. Our first contribution is a compact and location discriminative visual landmark descriptor, which is learnt offline in two steps. First, we adopt spectral clustering to segment a city map into distinct geographical regions, where both visual and geographical similarities are fused to optimize the partition of city-scale geo-tagged photos. Second, we propose to learn LDVC in each region with two schemes: (1) a Ranking Sensitive PCA and (2) a Ranking Sensitive Vocabulary Boosting. Both schemes embed location cues to learn a compact descriptor that minimizes the retrieval ranking loss incurred by replacing the original high-dimensional signatures. Our second contribution is a location aware online vocabulary adaptation: we store a single vocabulary on the mobile end, which is efficiently adapted for region specific LDVC coding once a mobile device enters a given region. The learnt LDVC landmark descriptor is extremely compact (typically 10–50 bits with arithmetic coding) and outperforms state-of-the-art descriptors. We implemented the framework in a real-world mobile landmark search prototype, which is validated on a million-scale landmark database covering typical areas, e.g. Beijing, New York City, Lhasa, Singapore, and Florence.
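
The region segmentation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact fusion rule, so everything specific here is an assumption made for illustration: the affinity is taken to be a product of two Gaussian kernels (one on geographic distance, one on visual feature distance), the bandwidths sigma_geo and sigma_vis, the function names, and the six toy photos are all hypothetical, and the clustering follows a standard normalized spectral clustering recipe rather than the authors' implementation.

```python
import numpy as np


def fused_affinity(geo, vis, sigma_geo=1.0, sigma_vis=1.0):
    """Affinity that is high only when two photos are close both
    geographically and visually (assumed Gaussian-product form)."""
    d_geo = np.linalg.norm(geo[:, None, :] - geo[None, :, :], axis=-1)
    d_vis = np.linalg.norm(vis[:, None, :] - vis[None, :, :], axis=-1)
    w = np.exp(-(d_geo / sigma_geo) ** 2) * np.exp(-(d_vis / sigma_vis) ** 2)
    np.fill_diagonal(w, 0.0)  # no self-affinity
    return w


def spectral_embedding(w, k):
    """Unit-normalized rows of the top-k eigenvectors of D^{-1/2} W D^{-1/2}."""
    deg = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg + 1e-12)  # guard against isolated photos
    m = d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(m)  # eigenvalues come back in ascending order
    top = vecs[:, -k:]
    norms = np.linalg.norm(top, axis=1, keepdims=True)
    return top / np.maximum(norms, 1e-12)


def kmeans(x, k, iters=50):
    """Plain Lloyd iterations with deterministic farthest-point seeding."""
    centers = [x[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        sq = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def segment_regions(geo, vis, k, sigma_geo=1.0, sigma_vis=1.0):
    """Partition geo-tagged photos into k regions from the fused affinity."""
    w = fused_affinity(geo, vis, sigma_geo, sigma_vis)
    return kmeans(spectral_embedding(w, k), k)


# Hypothetical toy data: six photos, two obvious regions.
geo = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [10.0, 10.0], [10.1, 10.2], [10.2, 10.1]])
vis = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
                [0.0, 1.0], [0.1, 0.9], [0.0, 0.9]])
labels = segment_regions(geo, vis, k=2)
print(labels)  # first three photos share one label, last three the other
```

Multiplying the two kernels means a pair of photos is strongly linked only if it is similar in both senses, which is one simple way to read "fused"; a weighted sum of the two similarities would be an equally plausible alternative under the same abstract.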

Author information

Authors and Affiliations

Institute of Digital Media, Peking University, Beijing, China
Rongrong Ji, Ling-Yu Duan, Jie Chen & Wen Gao
Visual Intelligence Laboratory, Harbin Institute of Technology, Harbin, China
Rongrong Ji & Hongxun Yao
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
Junsong Yuan
Microsoft China Research and Development Group, Beijing, China
Yong Rui

Corresponding author

Correspondence to Ling-Yu Duan.

About this article

Cite this article

Ji, R., Duan, LY., Chen, J. et al. Location Discriminative Vocabulary Coding for Mobile Landmark Search. Int J Comput Vis 96, 290–314 (2012). https://doi.org/10.1007/s11263-011-0472-9

Received: 13 October 2010
Accepted: 01 June 2011
Published: 27 July 2011
Issue Date: February 2012
DOI: https://doi.org/10.1007/s11263-011-0472-9
