Content vs. Context: Visual and Geographic Information Use in Video Landmark Retrieval

Published: 05 February 2015

Abstract

Due to the ubiquity of sensor-equipped smartphones, it has become increasingly feasible for users to capture videos together with associated geographic metadata, for example, the location and the orientation of the camera. Such contextual information creates new opportunities for the organization and retrieval of geo-referenced videos. In this study, we explore the task of landmark retrieval through the analysis of two types of state-of-the-art techniques, namely media-content-based and geocontext-based retrieval. For the content-based method, we choose the Spatial Pyramid Matching (SPM) approach combined with two advanced coding methods: Sparse Coding (SC) and Locality-Constrained Linear Coding (LLC). For the geo-based method, we present the Geo Landmark Visibility Determination (GeoLVD) approach, which computes the visibility of a landmark based on intersections of a camera's field-of-view (FOV) and the landmark's geometric information available from Geographic Information Systems (GIS) and services. We first compare the retrieval results of the two methods and discuss the strengths and weaknesses of each approach in terms of precision, recall, and execution time. Next, we analyze the factors that affect the effectiveness of the content-based and the geo-based methods, respectively. Finally, we propose a hybrid retrieval method based on the integration of the visual (content) and geographic (context) information, which is shown to achieve significant improvements in our experiments. We believe that the results and observations in this work will inform the design of future geo-referenced video retrieval systems, improve our understanding of how to select the most appropriate visual features for indexing and searching, and help in selecting the most suitable retrieval method under different conditions.
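
The abstract describes the geo-based method only at a high level: a landmark counts as visible when the camera's FOV intersects the landmark's geometry. The Python sketch below illustrates that general idea and is not the GeoLVD algorithm itself. It assumes a simplified 2D pie-slice FOV (camera position, compass heading, viewable angle, maximum visible distance) and a landmark footprint given as a list of (latitude, longitude) vertices. All function and parameter names are hypothetical. Testing only the footprint vertices is an approximation: it ignores occlusion by other buildings and can miss a footprint whose edge crosses the FOV while none of its vertices fall inside it.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0


def point_in_fov(cam_lat, cam_lon, heading_deg, view_angle_deg, max_dist_m, lat, lon):
    """True if (lat, lon) lies inside the assumed pie-slice FOV of the camera."""
    dist = haversine_m(cam_lat, cam_lon, lat, lon)
    if dist > max_dist_m:
        return False
    if dist == 0.0:
        return True  # the point coincides with the camera position
    # Smallest absolute angle between the camera heading and the bearing to the point.
    diff = abs((bearing_deg(cam_lat, cam_lon, lat, lon) - heading_deg + 180.0) % 360.0 - 180.0)
    return diff <= view_angle_deg / 2.0


def landmark_visible(cam_lat, cam_lon, heading_deg, view_angle_deg, max_dist_m, footprint):
    """Approximate visibility: any footprint vertex inside the FOV (hypothetical helper)."""
    return any(
        point_in_fov(cam_lat, cam_lon, heading_deg, view_angle_deg, max_dist_m, lat, lon)
        for lat, lon in footprint
    )
```

In a retrieval setting, a check of this kind would be run once per video frame (or per sampled FOV record), and a video would be returned for a landmark query if enough of its frames pass the test. A per-frame geometric test needs no pixel data, which is consistent with the abstract treating execution time as one axis of comparison between the two families of methods.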

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 11, Issue 3
    January 2015
    173 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/2733235
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 February 2015
    Accepted: 01 September 2014
    Revised: 01 November 2013
    Received: 01 June 2013
    Published in TOMM Volume 11, Issue 3

    Author Tags

    1. Content-based analysis
    2. geo-referenced videos
    3. landmark retrieval

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • IDM Programme Office through the Centre of Social Media Innovations for Communities (COSMIC)
    • Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative
    • Hongik University new faculty research support fund
