Abstract
The consumption of news has changed significantly as the Web has become the most influential medium for information. Identifying the geographic focus of an article is an important step toward analyzing and contextualizing the large amount of news published every day, and it enables content-based news retrieval. Methods and datasets exist for geolocation estimation from text or from photos, but the two are typically treated as separate tasks. However, a photo might lack geographical cues and the text can mention multiple locations, which makes it challenging to recognize the focus location from a single modality. In this paper, we introduce a novel dataset called Multimodal Focus Location of News (MM-Locate-News). We evaluate state-of-the-art methods on the new benchmark dataset and propose novel models that predict the focus location of news using both textual and image content. The experimental results show that the multimodal model outperforms unimodal models.
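The idea of combining both modalities can be illustrated with a minimal late-fusion sketch. This is not the model proposed in the paper: the feature vectors, the weight matrix, the candidate labels, and the function names (`fuse`, `predict_location`) are all hypothetical toy values chosen for illustration. It only shows how an image feature can disambiguate a text that mentions several locations.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(text_vec, image_vec):
    """Late fusion by simple concatenation of the two feature vectors."""
    return list(text_vec) + list(image_vec)


def predict_location(text_vec, image_vec, weights, labels):
    """Score each candidate location with a linear layer over fused features."""
    fused = fuse(text_vec, image_vec)
    scores = [sum(w * x for w, x in zip(row, fused)) for row in weights]
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Hypothetical toy setup: two candidate locations, 2-d text and image features.
LABELS = ["Paris", "Tokyo"]
WEIGHTS = [
    [1.0, 0.0, 0.5, 0.0],  # row scoring "Paris"
    [0.0, 1.0, 0.0, 0.5],  # row scoring "Tokyo"
]

text_vec = [0.6, 0.5]   # the text mentions both places, slightly favoring Paris
image_vec = [0.1, 0.9]  # the image carries cues that point to Tokyo

label, probs = predict_location(text_vec, image_vec, WEIGHTS, LABELS)
text_only_label, _ = predict_location(text_vec, [0.0, 0.0], WEIGHTS, LABELS)
```

In this toy example the text alone favors one location, while the fused features favor the other, which mirrors the motivation given above for using both modalities.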
Notes
- 1. Source code & dataset: https://github.com/TIBHannover/mm-locate-news.
Acknowledgements
This work was partially funded by the EU Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 812997 (CLEOPATRA ITN), and by the Ministry of Lower Saxony for Science and Culture (Responsible AI in digital society, project no. 51171145).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tahmasebzadeh, G., Müller-Budack, E., Hakimov, S., Ewerth, R. (2023). MM-Locate-News: Multimodal Focus Location Estimation in News. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_16
DOI: https://doi.org/10.1007/978-3-031-27077-2_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27076-5
Online ISBN: 978-3-031-27077-2
eBook Packages: Computer Science, Computer Science (R0)