A lightweight and multilingual framework for crisis information extraction from Twitter data

Interdonato, Roberto; Guillaume, Jean-Loup; Doucet, Antoine

doi:10.1007/s13278-019-0608-4

Original Article
Published: 01 November 2019

Volume 9, article number 65 (2019)

Social Network Analysis and Mining

Roberto Interdonato ORCID: orcid.org/0000-0002-0536-6277¹,
Jean-Loup Guillaume² &
Antoine Doucet²

515 Accesses
15 Citations

Abstract

Obtaining relevant, timely information during crisis events is a challenging task that can be fundamental to handling the consequences of both unexpected events (e.g., terrorist attacks) and partially predictable ones (e.g., natural disasters). Even though microblogging-based online social networks (e.g., Twitter) have become an attractive data source in these emergency situations, overcoming the information overload generated by mass events is not trivial. The aim of this work was to enable unsupervised extraction of relevant information from Twitter data during a crisis event, offering a lightweight alternative to learning-based approaches. The proposed lightweight crisis management framework integrates natural language processing and clustering techniques to produce a ranking of tweets relevant to a crisis situation based on their informativeness. Experiments carried out on six Twitter collections in two languages (English and French) demonstrated the significance and flexibility of our approach.
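The abstract describes the pipeline only at a high level (NLP preprocessing, clustering, informativeness ranking). As a rough illustration, here is a minimal, standard-library-only sketch under our own assumptions; it is not the authors' method. Tweets are tokenized with a hypothetical regex tokenizer, represented as TF-IDF vectors (a simple stand-in for the ESA and word2vec representations listed in the notes), clustered with k-means using k-means++ seeding (Arthur and Vassilvitskii 2007), and ranked by a hypothetical informativeness proxy: cosine similarity to the tweet's own cluster centroid. The names `tokenize`, `tfidf_vectors`, `kmeans_pp` and `rank_tweets` are illustrative only.

```python
import math
import random
import re
from collections import Counter

# Letters only (Unicode-aware), so accented French words such as "évacuation" survive.
TOKEN_RE = re.compile(r"[^\W\d_]+")
NOISE_RE = re.compile(r"https?://\S+|@\w+")  # URLs and @mentions


def tokenize(text):
    """Lowercase, strip URLs/@mentions, keep alphabetic tokens (hypothetical preprocessing)."""
    return TOKEN_RE.findall(NOISE_RE.sub(" ", text.lower()))


def tfidf_vectors(docs):
    """One L2-normalised sparse TF-IDF vector (dict term -> weight) per tokenised doc."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF: log((1 + n) / (1 + df)) + 1.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def kmeans_pp(vectors, k, iters=20, seed=0):
    """k-means on cosine distance with k-means++ (D^2-weighted) seeding."""
    rng = random.Random(seed)
    centroids = [dict(rng.choice(vectors))]
    while len(centroids) < k:
        # Squared distance of each point to its nearest already-chosen centroid.
        d2 = [max(0.0, 1.0 - max(cosine(v, c) for c in centroids)) ** 2 for v in vectors]
        if sum(d2) == 0:
            break  # fewer distinct points than k
        centroids.append(dict(rng.choices(vectors, weights=d2)[0]))
    labels = None
    for _ in range(iters):
        new_labels = [max(range(len(centroids)), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                acc = Counter()
                for m in members:
                    acc.update(m)  # sums float weights term by term
                centroids[j] = {t: w / len(members) for t, w in acc.items()}
    return labels, centroids


def rank_tweets(tweets, k=2, seed=0):
    """Rank tweets by a hypothetical informativeness proxy: similarity to own cluster centroid."""
    vectors = tfidf_vectors([tokenize(t) for t in tweets])
    labels, centroids = kmeans_pp(vectors, k, seed=seed)
    scores = [cosine(v, centroids[lab]) for v, lab in zip(vectors, labels)]
    order = sorted(range(len(tweets)), key=lambda i: scores[i], reverse=True)
    return [(tweets[i], labels[i], scores[i]) for i in order]


if __name__ == "__main__":
    sample = [
        "Flooding on Main Street, road closed http://t.co/x",
        "Road closed on Main Street because of flooding",
        "Évacuation en cours près du pont, restez chez vous",
        "Évacuation du quartier près du pont",
        "lol what a day",
    ]
    for tweet, cluster, score in rank_tweets(sample, k=2):
        print(f"{score:.3f}  cluster={cluster}  {tweet}")
```

The informativeness proxy is purely illustrative, since the abstract does not specify how the framework scores informativeness; swapping in a different text representation would only require replacing `tfidf_vectors`.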

Notes

http://www.internetlivestats.com/twitter-statistics/.
http://docs.oasis-open.org/emergency/edxl-sitrep/v1.0/csprd03/edxl-sitrep-v1.0-csprd03.html.
https://en.oxforddictionaries.com/definition/informative.
https://en.oxforddictionaries.com/definition/credibility.
Although the approach generalizes to other OSNs, we focus on Twitter, so we use the terms microblog posts and tweets interchangeably.
https://www.wordreference.com/definition/lexicon.
https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object.
https://pypi.python.org/pypi/langdetect.
http://scikit-learn.org.
https://github.com/pvoosten/explicit-semantic-analysis.
https://radimrehurek.com/gensim/models/word2vec.html.
https://code.google.com/archive/p/word2vec/.
https://git2017.univ-lr.fr/rinterdo/UnsupervisedFieldObservationIdentification.

References

Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms (SODA), pp 1027–1035
Bakshy E, Hofman JM, Mason WA, Watts DJ (2011) Everyone’s an influencer: quantifying influence on Twitter. In: Proceedings of the ACM international conference on web search and data mining (WSDM), pp 65–74
Basu M, Ghosh K, Das S, Dey R, Bandyopadhyay S, Ghosh S (2017) Identifying post-disaster resource needs and availabilities from microblogs. In: Proceedings of IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM), pp 427–430
Berlingerio M, Calabrese F, Di Lorenzo G, Dong X, Gkoufas Y, Mavroeidis D (2013) SaferCity: a system for detecting and analyzing incidents from social media. In: Proceedings of international conference on data mining workshops (ICDMW), pp 1077–1080
Bizid I, Nayef N, Boursier P, Faïz S, Doucet A (2015a) Identification of microblogs prominent users during events by learning temporal sequences of features. In: Proceedings ACM conference on information and knowledge management (CIKM), pp 1715–1718
Bizid I, Nayef N, Boursier P, Faïz S, Morcos J (2015b) Prominent users detection during specific events by learning on- and off-topic features of user activities. In: Proceedings IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM), pp 500–503
Bizid I, Boursier P, Morcos J, Faïz S (2015c) MASIR: a multi-agent system for real-time information retrieval from microblogs during unexpected events. In: Proceedings of international conference agent and multi-agent systems: technologies and applications (KES-AMSTA), pp 3–13
Bizid I (2016) Prominent microblog users prediction during crisis events: using phase-aware and temporal modeling of users behavior. PhD thesis
Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3(4–5):993–1022
Burel G, Saif H, Alani H (2017) Semantic wide and deep learning for detecting crisis-information categories on social media. In: Proceedings of international semantic web conference (ISWC), pp 138–155
Francisco M, Alves-Souza SN, Campos EGL, De Souza LS (2017) Total data quality management and total information quality management applied to costumer relationship management. In: Proceedings of the 9th international conference on information management and engineering, ICIME 2017, pp 40–45
Gabrilovich E, Markovitch S (2007) Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In: Proceedings international joint conference on artificial intelligence (IJCAI), pp 1606–1611
Goel R, Soni S, Goyal N, Paparrizos J, Wallach HM, Diaz F, Eisenstein J (2016) The social dynamics of language change in online networks. In: Proceedings of international conference social informatics (SocInfo), pp 41–57
Gupta A, Kumaraguru P (2012) Credibility ranking of tweets during high impact events. In: Proceedings of the 1st workshop on privacy and security in online social media (PSOSM), pp 2–8
Gupta A, Kumaraguru P, Castillo C, Meier P (2014) TweetCred: real-time credibility assessment of content on Twitter. In: Proceedings of international conference social informatics (SocInfo), pp 228–243
Huang B, Carley KM (2017) On predicting geolocation of tweets using convolutional neural networks. In: International conference on social, cultural, and behavioral modeling (SBP-BRiMS), pp 281–291
Hung K-C, Kalantari M, Rajabifard A (2017) An integrated method for assessing the text content quality of volunteered geographic information in disaster management. IJISCRAM 9(2):1–17
Imran M, Castillo C, Diaz F, Vieweg S (2015) Processing social media messages in mass emergency. ACM Comput Surv 47(4):1–38
Imran M, Mitra P, Srivastava J (2016) Enabling rapid classification of social media communications during crises. IJISCRAM 8(3):1–17
Imran M, Elbassuoni S, Castillo C, Diaz F, Meier P (2013) Extracting information nuggets from disaster-related messages in social media. In: Proceedings of the 10th international conference on information systems for crisis response and management (ISCRAM), Baden-Baden, Germany, May 12–15, 2013
Interdonato R, Doucet A, Guillaume J-L (2018) Unsupervised crisis information extraction from Twitter data. In: Proceedings of the IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM), Barcelona, Spain, August 28–31, 2018, pp 579–580
Ito J, Song J, Toda H, Koike Y, Oyama S (2015) Assessment of tweet credibility with LDA features. In: Proceedings of international conference on world wide web—companion, pp 953–958
Kwak H, Lee C, Park H, Moon SB (2010) What is Twitter, a social network or a news media? In: Proceedings of ACM conference on world wide web (WWW), pp 591–600
Lee D, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401:788–791
Letsios M, Balalau OD, Danisch M, Orsini E, Sozio M (2016) Finding heaviest k-subgraphs and events in social media. In: Proceedings IEEE international conference on data mining (ICDM), pp 113–120
Ghasemaghaei M, Hassanein K (2015) Online information quality and consumer satisfaction: the moderating roles of contextual factors—a meta-analysis. Inf Manag 52(8):965–981
Mendoza M, Poblete B, Castillo C (2010) Twitter under crisis: can we trust what we RT? In: Proceedings of the first workshop on social media analytics (SOMA), pp 71–79
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Annual conference on neural information processing systems (NIPS), pp 3111–3119
Nazer TH, Morstatter F, Dani H, Liu H (2016) Finding requests in social media for disaster relief. In: Proceedings of IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM), pp 1410–1413
Olteanu A, Castillo C, Diaz F, Vieweg S (2014) CrisisLex: a lexicon for collecting and filtering microblogged communications in crises. In: Proceedings of the international AAAI conference on weblogs and social media (ICWSM)
Qu Y, Huang C, Zhang P, Zhang J (2011) Microblogging after a major disaster in China. In: Proceedings of international conference computer supported cooperative work (CSCW). ACM Press, p 25
Rogstadius J, Vukovic M, Teixeira CA, Kostakos V, Karapanos E, Laredo JA (2013) CrisisTracker: crowdsourced social media curation for disaster awareness. IBM J Res Dev 57(5):4:1–4:13
Seppänen H, Mäkelä J, Luokkala P, Virrantaus K (2013) Developing shared situational awareness for emergency management. Saf Sci 55:1–9
Seppänen H, Virrantaus K (2015) Shared situational awareness and information quality in disaster management. Saf Sci 77:112–122
Shamala P, Ahmad R, Ali HZ, Sedek M (2017) Integrating information quality dimensions into information security risk management (ISRM). J Inf Secur Appl 36:1–10
Shao M, Li J, Chen F, Huang H, Zhang S, Chen X (2017) An efficient approach to event detection and forecasting in dynamic multivariate social media networks. In: Proceedings of ACM conference on world wide web (WWW), pp 1631–1639
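Thomson R, Ito N, Suda H, Lin F, Liu Y, Hayasaka R, Isochi R, Wang Z (2012) Trusting tweets: the Fukushima disaster and information source credibility on Twitter. In: Proceedings of the international conference on information systems for crisis response and management (ISCRAM), pp 1–10
Varga I, Sano M, Torisawa K, Hashimoto C, Ohtake K, Kawai T, Oh J-H, De Saeger S (2013) Aid is out there: looking for help from tweets during a large scale disaster. In: Proceedings of the annual meeting of the Association for Computational Linguistics (ACL), pp 1619–1629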
Vieweg S, Hughes AL, Starbird K, Palen L (2010) Microblogging during two natural hazards events: what Twitter may contribute to situational awareness. In: Proceedings of the 28th international conference on human factors in computing systems, CHI 2010, Atlanta, Georgia, USA, April 10–15, 2010, pp 1079–1088
Xia X, Yang X, Wu C, Li S, Bao L (2012) Information credibility on Twitter in emergency situation. In: Intelligence and security informatics: Pacific Asia workshop (PAISI), LNCS, vol 7299, pp 45–59
Yagci IA, Das S (2018) Measuring design-level information quality in online reviews. Electron Commer Res Appl 30:102–110
Zadeh PA, Wang G, Cavka HB, Staub-French S, Pottinger R (2017) Information quality assessment for facility management. Adv Eng Inform 33:181–205

Author information

Authors and Affiliations

CIRAD, TETIS, Univ. of Montpellier, APT, Cirad, CNRS, Irstea, Montpellier, France
Roberto Interdonato
L3I, Université de La Rochelle, La Rochelle, France
Jean-Loup Guillaume & Antoine Doucet

Corresponding author

Correspondence to Roberto Interdonato.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

An abridged version of this paper appeared in Interdonato et al. (2018).

About this article

Cite this article

Interdonato, R., Guillaume, JL. & Doucet, A. A lightweight and multilingual framework for crisis information extraction from Twitter data. Soc. Netw. Anal. Min. 9, 65 (2019). https://doi.org/10.1007/s13278-019-0608-4

Received: 20 December 2018
Revised: 10 October 2019
Accepted: 14 October 2019
Published: 01 November 2019
DOI: https://doi.org/10.1007/s13278-019-0608-4
