ABSTRACT
This work was developed in the context of the Placing Task of the MediaEval 2011 initiative. The objective is to geocode (or geotag) a set of videos, i.e., to automatically assign geographical coordinates to them. This paper presents an architecture for multimodal geocoding that exploits both the visual and the textual descriptions associated with videos. It also describes our implementation of this architecture, which demonstrates its applicability. The experiments conducted show that our multimodal approach improves the results compared to relying on a single modality.