Detecting evolutionary stages of events on social media: A graph-kernel-based approach

https://doi.org/10.1016/j.future.2021.05.006

Highlights

  • We propose a new KPIG graph to represent events on social media.

  • We propose a graph-kernel method to detect events’ evolutionary stages.

  • We report experimental results on a real dataset and present a case study.

Abstract

Detecting the evolutionary stages of events on social media platforms such as Twitter and Sina Weibo helps enterprises and governments take necessary actions before emergent phenomena become uncontrollable. Prior work on event extraction from microblogs mostly focused on extracting event summaries. However, the evolutionary stages of an event provide more details about how the event evolves, which is more critical for predicting its future trend. Moreover, many events have a lifecycle-like evolutionary property, i.e., they pass through beginning, developing, climax, descending, and disappearing stages. This property helps detect the evolutionary stages of events. Based on this consideration, we propose a graph-based approach to represent and extract the evolutionary stages of events from microblogs. The contributions of this study are threefold. First, unlike existing methods that represent an event by a keyword set or a microblog set, we propose a Keyword Popularity Information Graph (KPIG) that represents the keywords and the statistical information of an event as a graph. With this mechanism, we capture both the textual and the statistical information of an event. Second, based on the KPIG graph, we present a graph-kernel-based approach to measure the similarity among events. Third, we conduct extensive experiments on a real dataset and compare our proposal with several competitor algorithms. The results show that our approach outperforms the competitors in terms of various metrics.

Introduction

Social media analysis has become a hot research topic in recent years [1]. The characteristics of social media, such as timeliness, high user participation, and rapid diffusion, provide new opportunities for the early detection of events and the prediction of their evolution. The evolution of an event usually follows a specific lifecycle. However, existing research lacks effective methods to accurately extract the evolutionary lifecycle of social media events, which can also reduce the accuracy of evolution prediction. Detecting how events evolve helps predict their future trends. It is also useful for sentiment analysis on social media, because public sentiment toward an event evolves as well.

Many real-world events go through evolutionary stages from birth to death, much like a human lifecycle. An event's lifecycle can be defined as a process consisting of several stages, such as beginning, developing, climax, descending, and disappearing, as shown in Fig. 1(a). Some events also have multiple peaks, as shown in Fig. 1(b); the evolution of such multi-climax events is an iterative process. However, extracting the evolutionary stages of events from social media is not a trivial task. There are two challenges:

Challenge 1. The evolution of an event is hard to capture using only statistical information, such as the number of tweets or comments related to the event. For example, a few days after an earthquake, the number of tweets about it will be high. If this number then gradually decreases and we consider only the tweet count, we may conclude that the event has passed its climax and started to decline. However, this is not always true in the real world. An earthquake usually triggers many follow-up issues, such as rescue, casualties, and city rebuilding, so the event may still be in the developing stage. As a result, we need to consider textual information in addition to the statistical information of an event.
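Challenge 1 can be made concrete with a volume-only baseline. The sketch below is hypothetical and is not the paper's method: `label_stages_by_volume` labels each time interval with a lifecycle stage from tweet counts alone, using an assumed threshold `low_ratio` relative to the peak count. Given an earthquake-like series that peaks and then falls, it labels every post-peak interval as descending or disappearing, no matter whether new sub-topics such as rescue or rebuilding are still emerging, which is the blind spot described above.

```python
from typing import List


def label_stages_by_volume(counts: List[int], low_ratio: float = 0.2) -> List[str]:
    """Hypothetical count-only stage labeler (illustration, not the paper's method).

    Before the peak: 'beginning' while the count is below low_ratio * peak,
    otherwise 'developing'. The peak itself is 'climax'. After the peak:
    'descending' while the count stays at or above the threshold, otherwise
    'disappearing'. Textual content is ignored entirely.
    """
    if not counts:
        return []
    peak_idx = max(range(len(counts)), key=lambda i: counts[i])  # first maximum
    threshold = low_ratio * counts[peak_idx]
    stages = []
    for i, count in enumerate(counts):
        if i < peak_idx:
            stages.append("beginning" if count < threshold else "developing")
        elif i == peak_idx:
            stages.append("climax")
        else:
            stages.append("descending" if count >= threshold else "disappearing")
    return stages


print(label_stages_by_volume([50, 900, 1000, 700, 400, 150]))
# ['beginning', 'developing', 'climax', 'descending', 'descending', 'disappearing']
```

Because the labeler never looks at what the posts say, the fourth and fifth intervals are marked "descending" even if their content shows the event still developing; capturing that requires a representation that includes textual information.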

Challenge 2. Representing the statistical and textual information of an event effectively is challenging. Previous work mainly used keywords or microblogs to describe an event [2], [3], which ignored the linkage information among events and microblogs. Although some works analyzed event evolution by arranging and linking events along a timeline [4], [5], these methods are not sufficient to detect the evolutionary stages of events because they cannot measure the similarity among events effectively.

To address these two challenges in extracting the evolutionary stages of events from microblogs, we propose a Keyword Popularity Information Graph (KPIG) to represent events and a graph-kernel-based approach [6], [7], [8] to detect their evolutionary stages. Specifically, we make the following three contributions in this paper:

(1) We propose a KPIG graph to integrate textual, statistical, and linkage information for events. Compared to the previous keyword-set-based or microblog-set-based models, the KPIG graph provides richer information about events.

(2) We propose a graph-kernel-based method that runs on KPIG graphs to detect events' evolutionary stages. Specifically, we employ a shortest-path graph kernel to measure the similarity and changes between KPIG graphs.

(3) We report extensive experimental results on a real microblog dataset and present a case study to show the effectiveness of our proposal. The results show that our proposal outperforms existing approaches in terms of various metrics.
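Contributions (1) and (2) can be illustrated with a minimal sketch under stated assumptions. The KPIG definition itself is given in Section 3.2, which is not reproduced here, so `build_keyword_graph` is a hypothetical stand-in: nodes are keywords annotated with their popularity (the number of posts in an interval that contain them), and edges connect keywords that co-occur in a post, weighted by the co-occurrence count. The kernel follows the general shortest-path kernel idea of Borgwardt and Kriegel: it compares all pairs of shortest paths in two graphs, here with a delta kernel on (keyword, keyword, hop length) triples, and cosine-normalizes the result. Popularity and edge weights are kept on the graph but are not used by this kernel; how the paper folds statistical information into the comparison is not stated in this section.

```python
from collections import Counter, deque
from itertools import combinations
from math import sqrt
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def build_keyword_graph(posts: Iterable[List[str]], min_popularity: int = 1):
    """Hypothetical KPIG-like graph for one time interval (not the paper's definition).

    Nodes: keyword -> popularity (number of posts containing it).
    Edges: (keyword_a, keyword_b) with a < b -> number of posts containing both.
    """
    popularity: Counter = Counter()
    cooccur: Counter = Counter()
    for tokens in posts:
        keywords = sorted(set(tokens))  # count a keyword once per post
        popularity.update(keywords)
        for a, b in combinations(keywords, 2):
            cooccur[(a, b)] += 1
    nodes = {k: c for k, c in popularity.items() if c >= min_popularity}
    edges = {(a, b): w for (a, b), w in cooccur.items() if a in nodes and b in nodes}
    return nodes, edges


def sp_features(nodes: Iterable[str], edges: Iterable[Pair]) -> Counter:
    """Histogram of (keyword_u, keyword_v, hop length) over connected node pairs.

    Assumes node labels are unique keywords, so a node's label is its id.
    """
    adj: Dict[str, List[str]] = {n: [] for n in nodes}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    feats: Counter = Counter()
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # BFS gives shortest hop counts in an unweighted graph
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for dst, d in dist.items():
            if src < dst:  # each unordered pair once; self-pairs skipped
                feats[(src, dst, d)] += 1
    return feats


def sp_kernel(f1: Counter, f2: Counter) -> int:
    """Delta kernel on path triples, i.e. the dot product of the two histograms."""
    return sum(c * f2[key] for key, c in f1.items())


def normalized_sp_kernel(g1, g2) -> float:
    """Cosine-normalized kernel in [0, 1]; each graph is a (nodes, edges) pair."""
    f1, f2 = sp_features(*g1), sp_features(*g2)
    denom = sqrt(sp_kernel(f1, f1) * sp_kernel(f2, f2))
    return sp_kernel(f1, f2) / denom if denom else 0.0


path = (["a", "b", "c"], [("a", "b"), ("b", "c")])
triangle = (["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")])
print(normalized_sp_kernel(path, triangle))  # 2 shared triples out of 3 each
```

With one graph per sub-event, applying `normalized_sp_kernel` to consecutive graphs yields a similarity series, and one minus that value gives a crude change signal. This only illustrates the idea of measuring similarity and change between graphs; it is not the paper's stage-assignment rule.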

The rest of this paper is organized as follows. Section 2 provides an overview of the related work. In Section 3, we introduce the KPIG graph. Section 4 presents the graph-kernel-based method to compute the similarity and changes of KPIG graphs. In Section 5, we report the experimental results on a real dataset. Finally, we conclude our work in Section 6.


Related work

This section reviews the literature related to our algorithm, including topic detection and tracking, graph kernels, and graph-based text similarity. Event detection for microblogs (or microblog-like data) has a rich literature. However, this study focuses on detecting the evolutionary stages of events rather than the events themselves. Therefore, we do not cover existing work on event extraction in this section.

Keyword popularity information graph

In this section, we present the KPIG graph. Table 3 shows the notations used throughout the entire paper. Below, we first present the basic definitions for events in Section 3.1 and then detail the KPIG graph in Section 3.2.

Overview of evolutionary stage detection

Fig. 4 shows the framework for detecting the evolutionary stages of events. First, we preprocess the raw tweet stream, including word segmentation and stop-word removal. Second, we divide the tweet stream into sub-sets, each containing the tweets published in the same time interval (e.g., one hour or one day); each sub-set forms a sub-event. We process these sub-events in chronological order. Third, we run the term clustering method to cluster different expressions that
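The first two steps of this pipeline can be sketched as follows. This is a minimal illustration, not the paper's preprocessing: `split_into_subevents` and its parameters are hypothetical, whitespace splitting stands in for real word segmentation (Sina Weibo text is Chinese and would need a dedicated segmenter), and the term-clustering step is omitted because its details are cut off in this section. Each returned bucket holds the token lists of one time interval, i.e. one sub-event, in chronological order.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set, Tuple


def split_into_subevents(
    posts: Iterable[Tuple[datetime, str]],
    start: datetime,
    interval: timedelta,
    stop_words: Set[str],
) -> List[Tuple[int, List[List[str]]]]:
    """Group preprocessed posts into per-interval sub-sets (sub-events).

    Hypothetical helper: whitespace splitting is a placeholder for real word
    segmentation. Returns (interval_index, token_lists) pairs sorted by index.
    """
    buckets: Dict[int, List[List[str]]] = defaultdict(list)
    for ts, text in posts:
        tokens = [t for t in text.lower().split() if t not in stop_words]
        if tokens:  # drop posts that are empty after stop-word removal
            # timedelta // timedelta yields the integer interval index
            buckets[(ts - start) // interval].append(tokens)
    return sorted(buckets.items())


posts = [
    (datetime(2021, 5, 1, 9), "Earthquake hits the city"),
    (datetime(2021, 5, 1, 20), "the rescue begins"),
    (datetime(2021, 5, 2, 8), "casualties rise"),
]
for index, sub_event in split_into_subevents(posts, datetime(2021, 5, 1), timedelta(days=1), {"the"}):
    print(index, sub_event)
```

Switching `interval` between one hour and one day only changes the granularity of the sub-events; the downstream steps then consume the buckets in order.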

Performance evaluation

In this section, we compare our proposal with other methods to evaluate its effectiveness. We prepared the dataset by crawling posts from Sina Weibo (http://weibo.com). There are several ways to obtain the input dataset of the algorithm. The first way is to use the algorithm proposed in [8] to extract events from the microblog dataset and then link the sub-events that belong to the same event across different periods. This yields the evolution of texts and events over different periods. When crawling

Conclusions and future work

Detecting the evolutionary stages of events on social media is a new and important issue in social computing. This paper makes unique contributions to this research direction, which are threefold. First, we proposed a new KPIG graph to represent social media events and demonstrated that it can represent both the keywords of an event and its statistical information. Second, based on the KPIG structure, we proposed a new approach based on SPGK

CRediT authorship contribution statement

Lin Mu: Methodology, Software, Writing - original draft. Peiquan Jin: Conceptualization, Methodology, Writing - review & editing. Jie Zhao: Conceptualization, Writing - review & editing, Funding acquisition. Enhong Chen: Supervision, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the National Science Foundation of China (no. 62072419 and no. 71273010) and the National Statistical Science Research Project of China (no. 2019LY66). Peiquan Jin and Jie Zhao are joint corresponding authors.

Lin Mu is currently a Ph.D. student in the School of Computer Science and Technology at the University of Science and Technology of China (USTC). His research interests include social network analysis and information extraction.

References (45)

  • G. Nikolentzos, P. Meladianos, M. Vazirgiannis, Matching node embeddings for graph similarity, in: Proceedings of the...
  • N. Shervashidze, et al., Weisfeiler-Lehman graph kernels, J. Mach. Learn. Res. (2011)
  • J. Allan, Introduction to topic detection and tracking
  • C.C. Yang, et al., Discovering event evolution graphs from news corpora, IEEE Trans. Syst. Man Cybern. A (2009)
  • P. Lee, L.V. Lakshmanan, E.E. Milios, Incremental cluster evolution tracking from highly dynamic network data, in:...
  • C. Wu, B. Wu, B. Wang, Event evolution model based on random walk model with hot topic extraction, in: Proceedings of...
  • J. Huang, et al., A probabilistic method for emerging topic tracking in microblog stream, World Wide Web (2017)
  • M. Fedoryszak, B. Frederick, V. Rajaram, C. Zhong, Real-time event detection on social data streams, in: Proceedings of...
  • D. Haussler, Convolution kernels on discrete structures, Technical Report (1999)
  • J. Ramon, T. Gärtner, Expressivity versus efficiency of graph kernels, in: Proceedings of the 1st International...
  • T. Horváth, T. Gärtner, S. Wrobel, Cyclic pattern kernels for predictive graph mining, in: Proceedings of the 10th ACM...
  • K.M. Borgwardt, H.P. Kriegel, Shortest-path kernels on graphs, in: Proceedings of the Fifth IEEE International...

Peiquan Jin is an associate professor in the School of Computer Science and Technology at the University of Science and Technology of China (USTC). He is currently a member of IEEE and ACM. His research interests include database systems, Web information extraction, and information retrieval. He has published more than 80 papers in peer-reviewed journals and conferences.

Jie Zhao is a professor in the School of Business and the head of the Department of Electronic Business at Anhui University. Her research interests include social network analysis, web information extraction, and business intelligence. She has published more than 50 papers in peer-reviewed journals and conferences.

Enhong Chen is a professor and vice dean of the School of Computer Science and Technology at the University of Science and Technology of China (USTC). He is currently a CCF Fellow and an IEEE Senior Member (since 2007). His current research interests are data mining and machine learning, especially social network analysis and recommender systems. He has published more than 200 papers in venues such as KDD, ICDM, and AAAI. He won the Best Application Paper Award at KDD 2008 and the Best Research Paper Award at ICDM 2011.
