Detecting evolutionary stages of events on social media: A graph-kernel-based approach

doi:10.1016/j.future.2021.05.006

Future Generation Computer Systems

Volume 123, October 2021, Pages 219-232

https://doi.org/10.1016/j.future.2021.05.006

Highlights

• We propose a new KPIG graph to represent events on social media.
• We propose a graph-kernel method to detect events’ evolutionary stages.
• We report experimental results on a real dataset and present a case study.

Abstract

Detecting the evolutionary stages of events on social media platforms such as Twitter and Sina Weibo helps enterprises and governments take necessary actions before emergent phenomena become uncontrollable. Prior work on event extraction from microblogs mostly focused on extracting event summaries. However, the evolutionary stages of an event provide more detail about how the event evolves, which is more critical for predicting its future trend. Moreover, many events have a lifecycle-like evolutionary property, i.e., beginning, developing, climax, descending, and disappearing, and this property is helpful for detecting the evolutionary stages of events. Based on this consideration, we propose a graph-based approach to represent and extract the evolutionary stages of events from microblogs. The contributions of this study are threefold. First, unlike existing methods that use a keyword set or a microblog set to represent an event, we propose a Keyword Popularity Information Graph (KPIG) that represents the keywords and the statistical information of an event as a graph. With this mechanism, we can capture both the textual information and the statistical information of an event. Second, based on the KPIG graph, we present a graph-kernel-based approach to measure the similarity among events. Third, we conduct extensive experiments on a real dataset and compare our proposal with several competitor algorithms. The results show that our approach outperforms the competitors in terms of various metrics.

Introduction

Social media analysis has become a hot research topic in recent years [1]. Social media characteristics, such as timeliness, high user participation, and rapid diffusion, provide new opportunities for the early detection and evolution prediction of events. The evolution process of an event usually follows a specific lifecycle. However, existing research lacks effective methods to accurately extract the evolutionary lifecycle of social media events, which may also reduce the accuracy of evolution prediction. Detecting the evolution of events helps predict their future trend. It is also useful for sentiment analysis over social media because public sentiment toward an event also evolves over time.

Many events in the real world have evolutionary stages, i.e., from birth to death, which is similar to people’s lifecycle. An event’s lifecycle can be defined as a process that includes several stages such as beginning, developing, climax, descending, and disappearing, as shown in Fig. 1(a). In addition, some events involve multiple peaks, as shown in Fig. 1(b). The multi-climax event evolution shows an iterative process. However, extracting the evolutionary stages of events from social media is not a trivial task. There are two challenges:

Challenge 1. The evolution of an event is hard to capture using only statistical information such as the number of tweets or comments related to the event. For example, if an earthquake occurred a few days ago, the number of tweets talking about the earthquake will reach a high value. If this number gradually decreases with time and we only consider the number of tweets related to the event, we may conclude that the event has passed its climax and started to decline. However, this is not always true in the real world. For instance, an earthquake usually incurs many other issues like rescue, casualties, and city rebuilding. In this case, the event is still at the developing stage. As a result, we need to consider more information, such as textual information, in addition to statistical event information.

Challenge 2. It is challenging to represent the statistical and textual information of an event effectively. Previous work mainly used keywords or microblogs to describe an event [2], [3], which ignored the linkage information among events and microblogs. Although some works performed evolutionary analysis on events by arranging and linking events according to the timeline [4], [5], they were not enough to detect the evolutionary stages of events because they could not measure the similarity among events effectively.

To address the two challenges in extracting the evolutionary stages of events from microblogs, in this paper, we propose a Keyword Popularity Information Graph (KPIG) to represent events and a graph-kernel [6], [7], [8] based approach to detect the evolutionary stages of events. Notably, we make the following three contributions in this paper:

(1) We propose a KPIG graph to integrate textual, statistical, and linkage information for events. Compared to the previous keyword-set-based or microblog-set-based models, the KPIG graph can provide richer information about events.

(2) We propose a graph-kernel-based method that runs on KPIG graphs to detect events’ evolutionary stages. Specifically, we employ a shortest-path-based graph kernel to measure the similarity and changes between KPIG graphs.

(3) We report extensive experimental results on a real microblog dataset and present a case study to show the effectiveness of our proposal. The results show that our proposal outperforms existing approaches in terms of various metrics.
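To make the idea in contribution (2) concrete, the following is a minimal sketch of a shortest-path graph kernel in the spirit of the shortest-path kernels cited in the references. It is not the authors' implementation: it assumes unweighted, undirected graphs whose nodes are unique keyword strings, it compares node pairs with a simple exact-match (delta) kernel on the two labels and the hop distance, and it ignores the popularity statistics that a KPIG graph also carries. The function names and the toy earthquake graphs are hypothetical.

```python
from collections import Counter, deque
from math import sqrt


def shortest_path_features(adj):
    """Count (label_u, label_v, distance) triples over all connected node pairs.

    `adj` maps each node label (assumed here to be a keyword string) to the
    set of its neighbours; every neighbour must also appear as a key.
    """
    features = Counter()
    for source in adj:
        # Breadth-first search yields hop distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for target, d in dist.items():
            if source < target:  # count each unordered pair once
                features[(source, target, d)] += 1
    return features


def sp_kernel(adj_a, adj_b):
    """Delta-kernel shortest-path kernel: number of matching triples."""
    fa = shortest_path_features(adj_a)
    fb = shortest_path_features(adj_b)
    return sum(count * fb[key] for key, count in fa.items())


def sp_similarity(adj_a, adj_b):
    """Kernel value normalised to [0, 1] (cosine-style normalisation)."""
    denom = sqrt(sp_kernel(adj_a, adj_a) * sp_kernel(adj_b, adj_b))
    return sp_kernel(adj_a, adj_b) / denom if denom else 0.0


# Two hypothetical keyword graphs for consecutive time intervals.
g1 = {"earthquake": {"rescue", "casualties"},
      "rescue": {"earthquake"}, "casualties": {"earthquake"}}
g2 = {"earthquake": {"rescue", "rebuilding"},
      "rescue": {"earthquake"}, "rebuilding": {"earthquake"}}
print(sp_similarity(g1, g2))
```

Under these assumptions, a low similarity between the graphs of two consecutive intervals suggests that the discussion has shifted to new keywords, even when the raw tweet count alone would suggest a decline.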

The rest of this paper is organized as follows. Section 2 provides an overview of the related work. In Section 3, we introduce the KPIG graph. Section 4 presents the graph-kernel-based method to compute the similarity and changes of KPIG graphs. In Section 5, we report the experimental results on a real dataset. Finally, we conclude our work in Section 6.

Section snippets

Related work

This section reviews some of the literature related to our algorithm, including topic detection and tracking, graph kernels, and graph-based text similarity. Event detection for microblogs (or microblog-like data) has a rich literature. However, in this study, we focus on detecting events’ evolutionary stages rather than events themselves. Therefore, in this section, we do not cover the existing work on event extraction.

Keyword popularity information graph

In this section, we present the KPIG graph. Table 3 shows the notations used throughout the entire paper. Below, we first present the basic definitions for events in Section 3.1 and then detail the KPIG graph in Section 3.2.

Overview of evolutionary stage detection

Fig. 4 shows the framework for detecting the evolutionary stages of events. First, we preprocess the raw tweet stream dataset, including word segmentation and removing stop words. Second, we divide the tweet stream dataset into sub-sets; each sub-set contains the tweet stream published in the same time interval (e.g., one hour or one day), which is a sub-event. We process these sub-events in chronological order. Third, we run the term clustering method to cluster different expressions that
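The second step of the framework, dividing the tweet stream into time-interval sub-sets and processing them chronologically, can be sketched as follows. This is an illustrative sketch only: the function name, the `(timestamp, text)` input format, and the choice to start the first interval at the earliest timestamp (rather than at a clock boundary) are assumptions, not details taken from the paper.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_into_sub_events(tweets, interval=timedelta(hours=1)):
    """Group (timestamp, text) pairs into fixed-width time buckets.

    Returns a list of (bucket_start, texts) in chronological order.
    Intervals that contain no tweets are omitted from the result.
    """
    if not tweets:
        return []
    # Assumption: the first interval starts at the earliest timestamp.
    origin = min(ts for ts, _ in tweets)
    buckets = defaultdict(list)
    for ts, text in sorted(tweets, key=lambda pair: pair[0]):
        index = (ts - origin) // interval  # timedelta // timedelta is an int
        buckets[index].append(text)
    return [(origin + i * interval, buckets[i]) for i in sorted(buckets)]


# Hypothetical stream, deliberately out of order.
stream = [
    (datetime(2021, 5, 1, 10, 50), "c"),
    (datetime(2021, 5, 1, 9, 5), "a"),
    (datetime(2021, 5, 1, 9, 40), "b"),
    (datetime(2021, 5, 1, 12, 10), "d"),
]
for start, texts in split_into_sub_events(stream):
    print(start, texts)
```

Passing `interval=timedelta(days=1)` gives the one-day granularity mentioned above; each returned bucket corresponds to one sub-event.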

Performance evaluation

In this section, we compare our proposal with other methods to evaluate effectiveness. We prepare the dataset by crawling posts from Sina Weibo (http://weibo.com). There are several ways to obtain the input dataset of the algorithm. The first way is to use the algorithm proposed in [8] to extract events from the microblog dataset and then link the sub-events that belong to the same event in different periods. Finally, it gets the evolution of texts and events in different periods. When crawling

Conclusions and future work

Detecting the evolutionary stages of events on social media is a new and important issue in social computing. In this paper, we have made unique contributions to this research direction. In summary, the contributions of this study are threefold. First, we proposed a new KPIG to represent social media events. We demonstrated that the KPIG approach can represent the keywords of an event and its statistical information. Second, based on the KPIG structure, we proposed a new approach based on SPGK

CRediT authorship contribution statement

Lin Mu: Methodology, Software, Writing - original draft. Peiquan Jin: Conceptualization, Methodology, Writing - review & editing. Jie Zhao: Conceptualization, Writing - review & editing, Funding acquisition. Enhong Chen: Supervision, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the National Science Foundation of China (no. 62072419 and no. 71273010) and the National Statistical Science Research Project of China (no. 2019LY66). Peiquan Jin and Jie Zhao are joint corresponding authors.

Lin Mu is currently a Ph.D. student in the School of Computer Science and Technology at the University of Science and Technology of China (USTC). His research interests include social network analysis and information extraction.

References (45)

Camacho D. et al., The four dimensions of social network analysis: An overview of research methods, applications, and software tools, Inf. Fusion (2020)
Hussain A. et al., Semi-supervised learning for big social data analysis, Neurocomputing (2018)
Visvizi A. et al., Tweeting and mining OECD-related microcontent in the post-truth era: A cloud-based app, Comput. Hum. Behav. (2020)
Zhao M. et al., ALG: Adaptive low-rank graph regularization for scalable semi-supervised and unsupervised learning, Neurocomputing (2019)
Zhang Z. et al., Adaptive non-negative projective semi-supervised learning for inductive classification, Neural Netw. (2018)
P. Jin, L. Mu, L. Zheng, J. Zhao, L. Yue, News feature extraction for events on social network platforms, in:...
A. Ritter, Mausam, O. Etzioni, S. Clark, Open domain event extraction from Twitter, in: Proceedings of the 18th ACM...
M. Osborne, S. Moran, R. McCreadie, A. Von Lunen, M. Sykora, E. Cano, et al. Real-time detection, tracking, and...
Cai H. et al., Indexing evolving events from tweet streams, IEEE Trans. Knowl. Data Eng. (2015)
Sugiyama M. et al., Halting in random walk kernels
G. Nikolentzos, P. Meladianos, M. Vazirgiannis, Matching node embeddings for graph similarity, in: Proceedings of the...
Shervashidze N. et al., Weisfeiler-Lehman graph kernels, J. Mach. Learn. Res. (2011)
Allan J., Introduction to topic detection and tracking
Yang C.C. et al., Discovering event evolution graphs from news corpora, IEEE Trans. Syst. Man Cybern. A (2009)
P. Lee, L.V. Lakshmanan, E.E. Milios, Incremental cluster evolution tracking from highly dynamic network data, in:...
C. Wu, B. Wu, B. Wang, Event evolution model based on random walk model with hot topic extraction, in: Proceedings of...
Huang J. et al., A probabilistic method for emerging topic tracking in microblog stream, World Wide Web (2017)
M. Fedoryszak, B. Frederick, V. Rajaram, C. Zhong, Real-time event detection on social data streams, in: Proceedings of...
Haussler D., Convolution Kernels on Discrete Structures, Technical Report (1999)
J. Ramon, T. Gärtner, Expressivity versus efficiency of graph kernels, in: Proceedings of the 1st International...
T. Horváth, T. Gärtner, S. Wrobel, Cyclic pattern kernels for predictive graph mining, in: Proceedings of the 10th ACM...
K.M. Borgwardt, H.P. Kriegel, Shortest-path kernels on graphs, in: Proceedings of the Fifth IEEE International...


Peiquan Jin is an associate professor in the School of Computer Science and Technology at the University of Science and Technology of China (USTC). He is currently a member of IEEE and ACM. His research interests include database systems, Web information extraction, and information retrieval. He has published more than 80 papers in peer-reviewed journals and conferences.

Jie Zhao is a professor in the School of Business and the head of the Department of Electronic Business at Anhui University. Her research interests include social network analysis, web information extraction, and business intelligence. She has published more than 50 papers in peer-reviewed journals and conferences.

Enhong Chen is a professor and vice dean of the School of Computer Science and Technology at the University of Science and Technology of China (USTC). He is currently a CCF Fellow and an IEEE Senior Member (since 2007). His current research interests are data mining and machine learning, especially social network analysis and recommender systems. He has published more than 200 papers at venues such as KDD, ICDM, and AAAI. He won the Best Application Paper Award at KDD 2008 and the Best Research Paper Award at ICDM 2011.
