Multimedia multimodal geocoding

Authors:

Lin Tzy Li
University of Campinas (UNICAMP), Campinas, SP -- Brazil and Telecommunications Res. & Dev. Center, CPqD Foundation, Campinas, SP -- Brazil

Daniel Carlos Guimarães Pedronette
University of Campinas (UNICAMP), Campinas, SP -- Brazil

Jurandy Almeida
University of Campinas (UNICAMP), Campinas, SP -- Brazil

Otávio A. B. Penatti
University of Campinas (UNICAMP), Campinas, SP -- Brazil

Rodrigo Tripodi Calumby
University of Campinas (UNICAMP), Campinas, SP -- Brazil and University of Feira de Santana (UEFS), Feira de Santana, BA -- Brazil

Ricardo da S. Torres
University of Campinas (UNICAMP), Campinas, SP -- Brazil

SIGSPATIAL '12: Proceedings of the 20th International Conference on Advances in Geographic Information Systems
November 2012, Pages 474–477
https://doi.org/10.1145/2424321.2424393

Published: 06 November 2012

ABSTRACT

This work was developed in the context of the Placing Task of the MediaEval 2011 initiative. The objective is to geocode (or geotag) a set of videos, i.e., to automatically assign geographical coordinates to them. This paper presents an architecture for multimodal geocoding that exploits both the visual and the textual descriptions associated with videos. It also describes an implementation of this architecture that demonstrates its applicability. Experiments show that the multimodal approach improves on the results obtained from a single modality.
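
The abstract does not say how the textual and visual evidence are combined. As a minimal sketch, assume each modality produces a ranked list of already-geotagged reference videos (most similar first) for the video being geocoded. The two lists could then be fused by rank aggregation, and the coordinates of the top fused item assigned to the query video. Reciprocal rank fusion is used here purely for illustration and is not necessarily the authors' method. The function names, the constant `k=60`, the video ids, and the coordinates are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of item ids into one list, best first.

    Each item scores 1 / (k + rank) in every list that contains it.
    k is a smoothing constant (an assumption, not taken from the paper).
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for position, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + position)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


def geocode(text_ranking, visual_ranking, known_locations):
    """Return the (lat, lon) of the best reference video after fusion."""
    fused = reciprocal_rank_fusion([text_ranking, visual_ranking])
    return known_locations[fused[0]]


# Hypothetical geotagged reference videos: id -> (latitude, longitude).
known_locations = {
    "dev_a": (-22.8184, -47.0647),
    "dev_b": (48.8584, 2.2945),
    "dev_c": (40.6892, -74.0445),
}

# Hypothetical per-modality rankings for a single query video.
text_ranking = ["dev_b", "dev_a", "dev_c"]
visual_ranking = ["dev_a", "dev_c", "dev_b"]

predicted = geocode(text_ranking, visual_ranking, known_locations)
print(predicted)
```

In this toy case the textual ranking alone would pick `dev_b`, but `dev_a` ranks near the top in both modalities and wins after fusion, which is the kind of effect a multimodal approach is meant to exploit.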



Published in
SIGSPATIAL '12: Proceedings of the 20th International Conference on Advances in Geographic Information Systems
November 2012
642 pages
ISBN: 9781450316910
DOI: 10.1145/2424321
General Chairs: Isabel Cruz (University of Illinois at Chicago), Craig Knoblock (University of Southern California)
Program Chairs: Peer Kröger (Ludwig-Maximilians-Universität München, Germany), Egemen Tanin (University of Melbourne, Australia), Peter Widmayer (ETH Zürich, Switzerland)
Copyright © 2012 Authors
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
multimodal geotagging
rank aggregation
video retrieval
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 220 of 1,116 submissions, 20%