Benchmarks can bring research communities together to focus on specific challenges. They drive research forward by making it easier to compare new solutions systematically and to evaluate their performance against the existing state of the art. In this chapter, we present a retrospective on the Placing Task, a yearly challenge offered by the MediaEval Multimedia Benchmark. Launched in 2010, the Placing Task requires participants to develop algorithms that automatically predict the geolocation of social multimedia (videos and images). This chapter covers the editions of the Placing Task offered in 2010–2013 and presents an outlook onto 2014. We present the formulation of the task and the task dataset for each year, tracing the design decisions made by the organizers and how each edition built on the previous one. Finally, we summarize future directions and challenges for multimodal geolocation and offer concluding remarks on how benchmarking has catalyzed progress in geolocation prediction for social multimedia.
- 13. Available for download: http://www.st.ewi.tudelft.nl/~hauff/placingTask2013Data.html.
- 14. The baseline runs used out-of-the-box location prediction software: https://github.com/chauff/ImageLocationEstimation, with geographic filtering enabled.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Larson, M. et al. (2015). The Benchmark as a Research Catalyst: Charting the Progress of Geo-prediction for Social Multimedia. In: Choi, J., Friedland, G. (eds) Multimodal Location Estimation of Videos and Images. Springer, Cham. https://doi.org/10.1007/978-3-319-09861-6_2
DOI: https://doi.org/10.1007/978-3-319-09861-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09860-9
Online ISBN: 978-3-319-09861-6
eBook Packages: Engineering, Engineering (R0)