MM-Locate-News: Multimodal Focus Location Estimation in News

Tahmasebzadeh, Golsa; Müller-Budack, Eric; Hakimov, Sherzod; Ewerth, Ralph

doi:10.1007/978-3-031-27077-2_16

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13833))

Included in the following conference series:

International Conference on Multimedia Modeling

2028 Accesses

Abstract

The consumption of news has changed significantly as the Web has become the most influential medium for information. To analyze and contextualize the large amount of news published every day, the geographic focus of an article is an important aspect in order to enable content-based news retrieval. There are methods and datasets for geolocation estimation from text or photos, but they are typically considered as separate tasks. However, the photo might lack geographical cues and text can include multiple locations, making it challenging to recognize the focus location using a single modality. In this paper, a novel dataset called Multimodal Focus Location of News (MM-Locate-News) is introduced. We evaluate state-of-the-art methods on the new benchmark dataset and suggest novel models to predict the focus location of news using both textual and image content. The experimental results show that the multimodal model outperforms unimodal models.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Multimodal Geolocation Estimation of News Photos

MCLS: A Large-Scale Multimodal Cross-Lingual Summarization Dataset

Multimodal news analytics using measures of cross-modal entity and context consistency

Article Open access 28 April 2021

Notes

1.
Source code & dataset: https://github.com/TIBHannover/mm-locate-news.
2.
http://eventregistry.org/.

References

Andogah, G., Bouma, G., Nerbonne, J.: Every document has a geographical scope. Data Knowl. Eng. 81, 1–20 (2012)
Article Google Scholar
Armitage, J., Kacupaj, E., Tahmasebzadeh, G., Swati Maleshkova, M., Ewerth, R., Lehmann, J.: MLM: a benchmark dataset for multitask learning with multiple languages and modalities. In: International Conference on Information and Knowledge Management (CIKM), pp. 2967–2974 (2020). https://doi.org/10.1145/3340531.3412783
Brank, J., Leban, G., Grobelnik, M.: Semantic annotation of documents based on wikipedia concepts. Informatica (Slovenia) 42(1), 23–32 (2018). http://www.informatica.si/index.php/informatica/article/view/2228
Crandall, D.J., Backstrom, L., Huttenlocher, D.P., Kleinberg, J.M.: Mapping the world’s photos. In: International Conference on World Wide Web (WWW), pp. 761–770 (2009). https://doi.org/10.1145/1526709.1526812
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009)
Google Scholar
Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pp. 4171–4186 (2019). https://doi.org/10.18653/v1/n19-1423
D’Ignazio, C., Bhargava, R., Zuckerman, E., Beck, L.: CLIFF-CLAVIN: determining geographic focus for news articles. In: NewsKDD: Data Science for News Publishing Workshop co-located with ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2014)
Google Scholar
Gordo, A., Almazán, J., Revaud, J., Larlus, D.: Deep image retrieval: learning global representations for image search. In: European Conference on Computer Vision (ECCV), pp. 241–257 (2016). https://doi.org/10.1007/978-3-319-46466-4_15
Gritta, M., Pilehvar, M.T., Collier, N.: Which melbourne? augmenting geocoding with maps. In: 56th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 1285–1296 (2018). https://aclanthology.org/P18-1119/
Halterman, A.: Mordecai: Full text geoparsing and event geocoding. J. Open Source Softw. 2(9), 91 (2017). https://doi.org/10.21105/joss.00091
Hays, J., Efros, A.A.: IM2GPS: estimating geographic information from a single image. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2008)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: European Conference on Computer Vision (ECCV), pp. 630–645 (2016). https://doi.org/10.1007/978-3-319-46493-0_38
Honnibal, M., Montani, I.: spaCy 2: natural language understanding with bloom embeddings, convolutional neural networks and incremental parsing (2017). https://spacy.io
Imani, M.B., Chandra, S., Ma, S., Khan, L., Thuraisingham, B.M.: Focus location extraction from political news reports with bias correction. In: International Conference on Big Data (BigData), pp. 1956–1964 (2017). https://doi.org/10.1109/BigData.2017.8258141
Izbicki, M., Papalexakis, E.E., Tsotras, V.J.: Exploiting the earth’s spherical geometry to geolocate images. In: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), pp. 3–19 (2019). https://doi.org/10.1007/978-3-030-46147-8_1
Kim, H.J., Dunn, E., Frahm, J.: Learned contextual feature reweighting for image geo-localization. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3251–3260 (2017). https://doi.org/10.1109/CVPR.2017.346
Kordopatis-Zilos, G., Papadopoulos, S., Kompatsiaris, I.: Geotagging text content with language models and feature mining. Proc. IEEE 105(10), 1971–1986 (2017). https://doi.org/10.1109/JPROC.2017.2688799
Article Google Scholar
Kordopatis-Zilos, G., Popescu, A., Papadopoulos, S., Kompatsiaris, Y.: Placing images with refined language models and similarity search with PCA-reduced VGG features. In: MediaEval 2016 Workshop. vol. 1739 (2016). http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_13.pdf
Krippendorff, K.: Computing krippendorff’s alpha-reliability (2011). https://repository.upenn.edu/asc_papers/43
Kulkarni, S., Jain, S., Hosseini, M.J., Baldridge, J., Ie, E., Zhang, L.: Multi-level gazetteer-free geocoding. In: International Workshop on Spatial Language Understanding and Grounded Communication for Robotics, pp. 79–88 (2021)
Google Scholar
Larson, M.A., Soleymani, M., Gravier, G., Ionescu, B., Jones, G.J.F.: The benchmarking initiative for multimedia evaluation: Mediaeval 2016. IEEE MultiMedia 24(1), 93–96 (2017). https://doi.org/10.1109/MMUL.2017.9
Article Google Scholar
Lieberman, M.D., Samet, H., Sankaranarayanan, J.: Geotagging with local lexicons to build indexes for textually-specified spatial data. In: International Conference on Data Engineering (ICDE), pp. 201–212 (2010). https://doi.org/10.1109/ICDE.2010.5447903
Lin, T., Belongie, S.J., Hays, J.: Cross-view image geolocalization. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 891–898 (2013). https://doi.org/10.1109/CVPR.2013.120
Müller-Budack, E., Pustu-Iren, K., Ewerth, R.: Geolocation estimation of photos using a hierarchical model and scene classification. In: European Conference on Computer Vision (ECCV), pp. 575–592 (2018). https://doi.org/10.1007/978-3-030-01258-8_35
Müller-Budack, E., Theiner, J., Diering, S., Idahl, M., Hakimov, S., Ewerth, R.: Multimodal news analytics using measures of cross-modal entity and context consistency. Int. J. Multimedia Inf. Retrieval 10(2), 111–125 (2021). https://doi.org/10.1007/s13735-021-00207-4
Article Google Scholar
Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning (ICML), pp. 8748–8763 (2021). http://proceedings.mlr.press/v139/radford21a.html
Ramisa, A., Yan, F., Moreno-Noguer, F., Mikolajczyk, K.: BreakingNews: article annotation by image and text processing. IEEE Trans. Pattern Anal. Mach. Intell. 40(5), 1072–1085 (2018). https://doi.org/10.1109/TPAMI.2017.2721945
Article Google Scholar
Seo, P.H., Weyand, T., Sim, J., Han, B.: CPlaNet: enhancing image geolocalization by combinatorial partitioning of maps. In: European Conference on Computer Vision (ECCV), pp. 544–560 (2018). https://doi.org/10.1007/978-3-030-01249-6_33
Serdyukov, P., Murdock, V., van Zwol, R.: Placing flickr photos on a map. In: SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 484–491 (2009). https://doi.org/10.1145/1571941.1572025
Ulfelder, J., Schrodt, P.: Political Instability Task Force Worldwide Atrocities Event Data Collection Codebook. version 1.0 b2 (2009)
Google Scholar
Uzkent, B., et al.: Learning to interpret satellite images using wikipedia. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 3620–3626 (2019). https://doi.org/10.24963/ijcai.2019/502
Vrandecic, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014). https://doi.org/10.1145/2629489
Article Google Scholar
Weyand, T., Araujo, A., Cao, B., Sim, J.: Google landmarks dataset v2 - A large-scale benchmark for instance-level recognition and retrieval. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2572–2581 (2020)
Google Scholar
Zhou, B., Lapedriza, À., Khosla, A., Oliva, A., Torralba, A.: Places: a 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 40(6), 1452–1464 (2018). https://doi.org/10.1109/TPAMI.2017.2723009
Article Google Scholar

Download references

Acknowledgements

This work was partially funded by the EU Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 812997 (CLEOPATRA ITN), and by the Ministry of Lower Saxony for Science and Culture (Responsible AI in digital society, project no. 51171145).

Author information

Authors and Affiliations

TIB–Leibniz Information Centre for Science and Technology, Hannover, Germany
Golsa Tahmasebzadeh, Eric Müller-Budack & Ralph Ewerth
L3S Research Center, Leibniz University Hannover, Hannover, Germany
Golsa Tahmasebzadeh, Eric Müller-Budack & Ralph Ewerth
Computational Linguistics, University of Postdam, Potsdam, Germany
Sherzod Hakimov

Authors

Golsa Tahmasebzadeh
View author publications
You can also search for this author in PubMed Google Scholar
Eric Müller-Budack
View author publications
You can also search for this author in PubMed Google Scholar
Sherzod Hakimov
View author publications
You can also search for this author in PubMed Google Scholar
Ralph Ewerth
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Golsa Tahmasebzadeh .

Editor information

Editors and Affiliations

University of Bergen, Bergen, Norway
Duc-Tien Dang-Nguyen
Dublin City University, Dublin, Ireland
Cathal Gurrin
Radboud University Nijmegen, Nijmegen, The Netherlands
Martha Larson
Dublin City University, Dublin, Ireland
Alan F. Smeaton
University of Amsterdam, Amsterdam, The Netherlands
Stevan Rudinac
National Institute of Information and Communications Technology, Tokyo, Japan
Minh-Son Dao
Department of Information Science and Media Studies, University of Bergen, Bergen, Norway
Christoph Trattner
La Trobe University, Melbourne, VIC, Australia
Phoebe Chen

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Tahmasebzadeh, G., Müller-Budack, E., Hakimov, S., Ewerth, R. (2023). MM-Locate-News: Multimodal Focus Location Estimation in News. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_16

Download citation

DOI: https://doi.org/10.1007/978-3-031-27077-2_16
Published: 29 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27076-5
Online ISBN: 978-3-031-27077-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

MM-Locate-News: Multimodal Focus Location Estimation in News

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Multimodal Geolocation Estimation of News Photos

MCLS: A Large-Scale Multimodal Cross-Lingual Summarization Dataset

Multimodal news analytics using measures of cross-modal entity and context consistency

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

MM-Locate-News: Multimodal Focus Location Estimation in News

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Multimodal Geolocation Estimation of News Photos

MCLS: A Large-Scale Multimodal Cross-Lingual Summarization Dataset

Multimodal news analytics using measures of cross-modal entity and context consistency

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation