research-article

Content vs. Context: Visual and Geographic Information Use in Video Landmark Retrieval

Authors:

Roger Zimmermann

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 11, Issue 3

Article No. 39, pages 1-21

https://doi.org/10.1145/2700287

Published: 05 February 2015

Abstract

Because sensor-equipped smartphones are now ubiquitous, users can readily capture videos together with associated geographic metadata, such as the location and orientation of the camera. Such contextual information creates new opportunities for organizing and retrieving geo-referenced videos. In this study we explore the task of landmark retrieval by analyzing two types of state-of-the-art techniques: media-content-based and geo-context-based retrieval. For the content-based method, we adopt the Spatial Pyramid Matching (SPM) approach combined with two advanced coding methods, Sparse Coding (SC) and Locality-Constrained Linear Coding (LLC). For the geo-based method, we present the Geo Landmark Visibility Determination (GeoLVD) approach, which computes the visibility of a landmark from the intersection of a camera's field of view (FOV) with the landmark's geometric information obtained from Geographic Information Systems (GIS) and services. We first compare the retrieval results of the two methods and discuss the strengths and weaknesses of each in terms of precision, recall, and execution time. Next we analyze the factors that affect the effectiveness of the content-based and the geo-based methods, respectively. Finally, we propose a hybrid retrieval method that integrates visual (content) and geographic (context) information and achieves significant improvements in our experiments. We believe that the results and observations in this work will inform the design of future geo-referenced video retrieval systems, improve our understanding of how to select the most appropriate visual features for indexing and searching, and help in choosing the most suitable retrieval method under different conditions.
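The abstract describes GeoLVD only at a high level. To illustrate the kind of geometric test it relies on, the sketch below models a camera FOV as a pie-shaped circular sector (camera position, compass heading, viewable angle, visible distance) and checks whether a landmark's footprint polygon intersects it. This is a simplified, hypothetical sketch, not the authors' GeoLVD implementation: the names `FOV`, `in_fov`, `sample_edges`, `visible_fraction`, and `landmark_visible`, the local planar frame in metres, and the edge-sampling approximation are all assumptions, and occlusion by other buildings is ignored.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# (x, y) in metres in a local planar frame: x points east, y points north (assumption).
Point = Tuple[float, float]


@dataclass
class FOV:
    """Hypothetical pie-shaped field of view of one video frame."""
    x: float            # camera position (east), metres
    y: float            # camera position (north), metres
    heading_deg: float  # compass heading of the optical axis: 0 = north, clockwise
    angle_deg: float    # full horizontal viewable angle
    radius: float       # maximum visible distance, metres


def in_fov(fov: FOV, p: Point) -> bool:
    """Return True if point p lies inside the circular sector described by fov."""
    dx, dy = p[0] - fov.x, p[1] - fov.y
    dist = math.hypot(dx, dy)
    if dist > fov.radius:
        return False
    if dist == 0.0:
        return True  # the camera position itself is treated as inside
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0  # compass bearing from camera to p
    # Smallest absolute angle between the bearing and the heading, in [0, 180].
    diff = abs((bearing - fov.heading_deg + 180.0) % 360.0 - 180.0)
    return diff <= fov.angle_deg / 2.0


def sample_edges(polygon: List[Point], step: float = 1.0) -> List[Point]:
    """Sample points roughly every `step` metres along the closed polygon boundary."""
    pts: List[Point] = []
    n = len(polygon)
    for i in range(n):
        (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
        k = max(1, math.ceil(math.hypot(x1 - x0, y1 - y0) / step))
        for j in range(k):
            t = j / k
            pts.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return pts


def visible_fraction(fov: FOV, polygon: List[Point], step: float = 1.0) -> float:
    """Fraction of sampled boundary points inside the FOV (0.0 for an empty polygon)."""
    pts = sample_edges(polygon, step)
    if not pts:
        return 0.0
    return sum(in_fov(fov, p) for p in pts) / len(pts)


def landmark_visible(fov: FOV, polygon: List[Point], step: float = 1.0) -> bool:
    """Approximate FOV/footprint intersection test; occlusion is not modelled."""
    return visible_fraction(fov, polygon, step) > 0.0


if __name__ == "__main__":
    # Camera at the origin facing north, 60-degree lens, 100 m visible distance.
    cam = FOV(x=0.0, y=0.0, heading_deg=0.0, angle_deg=60.0, radius=100.0)
    # Hypothetical 20 m x 20 m footprint 50 m north of the camera, and its mirror behind it.
    tower = [(-10.0, 50.0), (10.0, 50.0), (10.0, 70.0), (-10.0, 70.0)]
    behind = [(x, -y) for x, y in tower]
    print(landmark_visible(cam, tower))   # True
    print(landmark_visible(cam, behind))  # False
```

Sampling only the footprint boundary is sufficient when the camera stands outside the landmark: a sector that overlaps the footprint must then cross its boundary, so the `step` parameter simply trades accuracy for speed. A fraction-based score such as `visible_fraction` could act as a crude ranking signal, but how GeoLVD actually scores visibility is not specified in the abstract.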

Index Terms

Information systems > Information retrieval > Retrieval models and ranking

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 11, Issue 3

January 2015

173 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/2733235

Editor:
Ralf Steinmetz
Technische Universität Darmstadt, Germany

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 February 2015

Accepted: 01 September 2014

Revised: 01 November 2013

Received: 01 June 2013

Published in TOMM Volume 11, Issue 3

Qualifiers

Research-article
Research
Refereed

Funding Sources

IDM Programme Office through the Centre of Social Media Innovations for Communities (COSMIC)
Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative
Hongik University new faculty research support fund

Article Metrics

Total citations: 16
Total downloads: 325

Downloads (last 12 months): 7
Downloads (last 6 weeks): 0

Reflects downloads up to 03 Mar 2025

Cited By

Yin Y, Zhang Y, Liu Z, Wang S, Shah R, Zimmermann R. 2022. GPS2Vec: Pre-Trained Semantic Embeddings for Worldwide GPS Coordinates. IEEE Transactions on Multimedia 24, 890-903. Online publication date: 2022.
https://doi.org/10.1109/TMM.2021.3060951

Yin Y, Zhang Y, Liu Z, Liang Y, Wang S, Shah R, Zimmermann R. 2021. Learning Multi-context Aware Location Representations from Large-scale Geotagged Images. In Proceedings of the 29th ACM International Conference on Multimedia (Shen H, Zhuang Y, Smith J, Yang Y, Cesar P, Metze F, Prabhakaran B, Eds.), 899-907. Online publication date: 17-Oct-2021.
https://dl.acm.org/doi/10.1145/3474085.3475268

Spolaôr N, Lee H, Takaki W, Ensina L, Parmezan A, Oliva J, Coy C, Wu F. 2021. A video indexing and retrieval computational prototype based on transcribed speech. Multimedia Tools and Applications. Online publication date: 30-Aug-2021.
https://doi.org/10.1007/s11042-021-11401-1

Spolaôr N, Lee H, Takaki W, Ensina L, Coy C, Wu F. 2020. A systematic review on content-based video retrieval. Engineering Applications of Artificial Intelligence 90:C. Online publication date: 1-Apr-2020.
https://dl.acm.org/doi/10.1016/j.engappai.2020.103557

Huang X, Yin Y, Lim S, Wang G, Hu B, Varadarajan J, Zheng S, Bulusu A, Zimmermann R. 2019. Grab-Posisi. In Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Prediction of Human Mobility, 1-10. Online publication date: 5-Nov-2019.
https://dl.acm.org/doi/10.1145/3356995.3364536

Yin Y, Liu Z, Zhang Y, Wang S, Shah R, Zimmermann R. 2019. GPS2Vec. In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 416-419. Online publication date: 5-Nov-2019.
https://dl.acm.org/doi/10.1145/3347146.3359067

Shao J, Hu G, Song J, Liu X, Shen H. 2019. Towards Accurate Georeferenced Video Search With Camera Field of View Modeling. IEEE Transactions on Circuits and Systems for Video Technology 29:6, 1844-1855. Online publication date: Jun-2019.
https://doi.org/10.1109/TCSVT.2018.2848200

Yin Y, Shah R, Wang G, Zimmermann R. 2018. Feature-based Map Matching for Low-Sampling-Rate GPS Trajectories. ACM Transactions on Spatial Algorithms and Systems 4:2, 1-24. Online publication date: 10-Aug-2018.
https://dl.acm.org/doi/10.1145/3223049

Tan M, Yu J, Yu Z, Gao F, Rui Y, Tao D. 2018. User-Click-Data-Based Fine-Grained Image Recognition via Weakly Supervised Metric Learning. ACM Transactions on Multimedia Computing, Communications, and Applications 14:3, 1-23. Online publication date: 24-Jul-2018.
https://dl.acm.org/doi/10.1145/3209666

Yin Y, Thapliya R, Zimmermann R. 2018. Encoded Semantic Tree for Automatic User Profiling Applied to Personalized Video Summarization. IEEE Transactions on Circuits and Systems for Video Technology 28:1, 181-192. Online publication date: 1-Jan-2018.
https://dl.acm.org/doi/10.1109/TCSVT.2016.2602832