MM-Locate-News: Multimodal Focus Location Estimation in News

  • Conference paper
MultiMedia Modeling (MMM 2023)

Abstract

News consumption has changed significantly as the Web has become the most influential medium for information. To analyze and contextualize the large amount of news published every day, the geographic focus of an article is an important aspect for enabling content-based news retrieval. Methods and datasets exist for geolocation estimation from text or from photos, but the two are typically treated as separate tasks. However, a photo might lack geographical cues and a text can mention multiple locations, which makes it challenging to recognize the focus location from a single modality. This paper introduces a novel dataset called Multimodal Focus Location of News (MM-Locate-News). We evaluate state-of-the-art methods on the new benchmark dataset and propose novel models that predict the focus location of news using both textual and image content. The experimental results show that the multimodal model outperforms unimodal models.

Notes

  1. Source code & dataset: https://github.com/TIBHannover/mm-locate-news

  2. http://eventregistry.org/

Acknowledgements

This work was partially funded by the EU Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 812997 (CLEOPATRA ITN), and by the Ministry of Lower Saxony for Science and Culture (Responsible AI in digital society, project no. 51171145).

Author information

Corresponding author

Correspondence to Golsa Tahmasebzadeh.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tahmasebzadeh, G., Müller-Budack, E., Hakimov, S., Ewerth, R. (2023). MM-Locate-News: Multimodal Focus Location Estimation in News. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_16

  • DOI: https://doi.org/10.1007/978-3-031-27077-2_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27076-5

  • Online ISBN: 978-3-031-27077-2

  • eBook Packages: Computer Science, Computer Science (R0)
