Evolutionary game theoretical on-line event detection over tweet streams
Introduction
Over the last decade, On-line Social Networks (OSNs) have become increasingly popular and are now part of everyday life. Every day, millions of people use OSNs to inform friends about important events or to share information within social communities. In particular, microblogging platforms (e.g., Twitter and Snapchat) are nowadays the most natural environments where people can continuously report real-life events and share information, personal feelings and sentiments about public or private facts. In addition, the combined use of multimedia (still images, audio, videos) has opened a new way of spreading such information, in line with the old idiom that “a picture is worth a thousand words”.
OSNs and microblogs are an important source of information about events happening in a given location/country during a certain time period. Reports of those events can offer perspectives on news items that differ from those of traditional media, because users re-forward what they read in other posts while adding personal opinions [1]. They can also provide valuable user sentiment about companies/products, or a means for alert diffusion and emergency management in a geo-referenced area.
We are witnessing a vast amount of data being exchanged on OSNs, much of it replicated information with only slight differences, and individual users have difficulty following all of it, at the risk of missing valuable data. It is of pivotal importance to reduce this redundancy by grouping similar tweets/posts and reporting to users their salient features (i.e., their most recurrent or important words), which express the events that originated them. Such an operation can be done off-line, by collecting a vast amount of data exchanged within OSNs and processing it to obtain a report to be presented to the user, or on-line, so as to continuously show events to users as the stream of tweets/posts unfolds. From the user perspective, the latter approach is more useful, as it lowers the latency in becoming aware of something happening. As an example, OSNs have recently been proposed as a means to detect natural disasters before any public announcement from the authorities, so as to quickly put in place escape plans for the affected population [2].
The basic assumption is that some related words show an increase in usage when an event is happening. An event can thus be conventionally represented by a number of keywords showing a burst in their appearance count over a tweet/post stream [3]. This is because events cause information to spread over a social community, as users start to share a given tweet or post describing the event itself, adding some personal opinion or rephrasing the original content. Thus, it is easy to infer when an event may be happening, as it is possible to observe a sudden and unpredictable massive spreading of semantically related tweets/posts over the network within a limited lifetime, most of which contain the same or similar words, or their synonyms. Therefore, event detection is realized by clustering, or by linguistic pattern recognition over, the collected/streamed posts/tweets based on the equal/similar words and/or syntactical expressions they contain, and by returning to the user the clusters with a considerable number of elements (i.e., over a given threshold). There is a relationship between event detection and tweet categorization [4], as tweets need to be clustered and classified, and the content shared among the cluster members can be extracted as representative of the event described by all the clustered tweets. It is therefore possible to conduct the categorization and classification first and then realize the event detection, but in this work we preferred to perform both operations at the same time, by driving the classification based on the content similarity of the tweets. Although in other works, such as [4], [5], tweet classification is done by combining content and structural knowledge, we have preferred to focus only on the textual representation of tweets.
This is because, for event detection, the underlying topological information of tweets does not represent key knowledge we can exploit, in contrast to other applications of tweet classification, such as opinion propagation, where such structural knowledge is of utmost importance to study the phenomenon of interest. However, our approach is generic and can be integrated with a structural content pipeline as done in [4].
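The bursty-keyword assumption stated above can be illustrated with a minimal sketch. It is not the detection method proposed in this paper: it only compares word counts between two consecutive windows of tweets, and the function name `bursty_keywords`, the thresholds `min_count` and `ratio`, and the sample tweets are all hypothetical choices made for illustration.

```python
from collections import Counter


def bursty_keywords(previous_window, current_window, min_count=3, ratio=2.0):
    """Return the words whose usage jumps between two windows of tweets.

    A word is flagged when it occurs at least `min_count` times in the
    current window and its count is at least `ratio` times its add-one
    smoothed count in the previous window (both thresholds are assumptions).
    """
    prev = Counter(w for tweet in previous_window for w in tweet.lower().split())
    curr = Counter(w for tweet in current_window for w in tweet.lower().split())
    return sorted(
        word
        for word, count in curr.items()
        if count >= min_count and count / (prev[word] + 1) >= ratio
    )


# Hypothetical sample data: a quiet window followed by a burst.
previous = ["nice weather today", "coffee time", "weather is fine"]
current = [
    "earthquake in naples",
    "strong earthquake felt now",
    "earthquake shaking naples",
    "coffee time",
    "naples earthquake right now",
]

print(bursty_keywords(previous, current))  # ['earthquake', 'naples']
```

A real detector would also need stemming, stop-word removal and synonym handling, which this sketch deliberately omits.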
Even when the approach is restricted to nouns and verbs, stemmed to obtain their roots, it is hard to return clusters that are meaningful for users, and the overhead of the task may be unsuitable for on-line execution. The current literature has leveraged Artificial Intelligence, Statistics, Natural Language Processing, and Big Data Analytics for event detection, proposing approaches based on the detection of pre-defined events or of arbitrary events (alternatively defined as supervised and unsupervised detection). Although pre-defined event detection is simpler to realize, its concrete applications are limited. Arbitrary event detection is more appealing, as the approach autonomously learns the events happening within the OSNs, without any pre-configuration. Another classification distinguishes off-line from on-line approaches. On the one hand, the first kind of approach stores tweets/posts within the cloud and processes them afterwards, for instance at the end of the day or month. On the other hand, the second class of approaches typically defines a short collection period, lasting only a few seconds and depending on the spreading rate of the streams, followed by fast processing for event detection.
The off-line approaches are easy and straightforward to implement with widely known big data analytics solutions, but imply a considerable delay between the occurrence of an event and its detection. The on-line approaches provide timely results, but may exhibit lower accuracy than the off-line ones, and are harder to implement. Typically, off-line approaches are unsupervised, while on-line ones are supervised. This is because off-line approaches hold an overall view of the exchanged tweets/posts, are equipped with suitable computing resources and have no time restrictions, so they are able to look for arbitrary events. On the contrary, on-line approaches need to return a result quickly and have a limited view of the streams (only within the batch), so they are typically pre-configured with the events of interest to be detected. This state of the art limits the applicability and potential impact of the technology, so it is important to achieve on-line approaches able to detect arbitrary events.
In this paper, we propose a novel solution for real-time detection of events in Twitter, based on information filtering and clustering techniques, adapting the evolutionary clustering in [6] to meet the requirements of on-line tweet clustering. Basically, a series of rules to filter the tweets is proposed so as to pre-process them and remove possible noise factors that can compromise the event detection, such as hashtags, retweets and mere syntactic sugar. Afterwards, the tweets are used to fill a micro-batch, on which clustering is executed. The literature on clustering is extremely vast, and its application to tweets and/or posts has been extensive in the last decade. However, the works addressing arbitrary event detection in real time are limited and exhibit inadequate performance. Our driving idea is to model the overall clustering task as an evolutionary non-cooperative game [6], where players pick an item from the micro-batch or release one based on the reward they can achieve, taken as a measure of item similarity, and the equilibrium is obtained when all the players have the same item selection, which expresses a cluster. The evolutionary operators are integrated so as to study how strategies spread within the population of players and to reach a stable equilibrium state within it (no external strategy can affect the population by inducing one of its individuals to replace its strategy with the external one). Such an approach has been favored because the application of evolution brings into the game an indirect form of cooperative behavior, resulting in higher quality levels, a lower price of anarchy and a smaller distance to the Pareto front of the optimization problem underlying the clustering [7]. 
The proposed approach has been implemented on a typical Big Data analytics platform, based on Apache Cassandra and the Kappa architectural model, and assessed with two realistic data sets obtained from Twitter, exhibiting better accuracy and latency than competing related works.
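To make the idea of clustering as an evolutionary game more concrete, the following is a minimal sketch under simplifying assumptions. It is not the algorithm proposed in this paper: it replaces the hypergraph game of [6] and the dynamics of [7] with plain discrete-time replicator dynamics over a pairwise Jaccard similarity matrix, and it assumes the tweets in the micro-batch have already been filtered. The names `jaccard` and `replicator_cluster`, the parameters `iterations` and `support_eps`, and the sample tweets are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between the word sets of two tweets."""
    words_a, words_b = set(a.split()), set(b.split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def replicator_cluster(tweets, iterations=200, support_eps=1e-4):
    """Extract one cluster as the support of a replicator-dynamics equilibrium.

    Each tweet is a pure strategy, the payoff of playing tweet i against
    tweet j is their similarity, and the population state x is a probability
    distribution over tweets. Strategies with above-average payoff grow.
    """
    n = len(tweets)
    # Payoff matrix: pairwise similarity, zero on the diagonal.
    payoff_matrix = [
        [0.0 if i == j else jaccard(tweets[i], tweets[j]) for j in range(n)]
        for i in range(n)
    ]
    x = [1.0 / n] * n  # start from the uniform population state
    for _ in range(iterations):
        payoff = [sum(payoff_matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        average = sum(x[i] * payoff[i] for i in range(n))
        if average == 0.0:  # no pair of tweets is similar: nothing to cluster
            break
        x = [x[i] * payoff[i] / average for i in range(n)]
    # The cluster is the set of tweets that keep a non-negligible share.
    return [i for i in range(n) if x[i] > support_eps]


# Hypothetical micro-batch: three tweets about one event plus two unrelated ones.
tweets = [
    "earthquake hits naples tonight",
    "strong earthquake hits naples",
    "earthquake naples tonight scary",
    "new coffee shop opens downtown",
    "best pizza recipe ever",
]

print(replicator_cluster(tweets))  # [0, 1, 2]
```

In this sketch, further clusters could be obtained by removing the extracted tweets from the micro-batch and running the dynamics again on the remainder; whether the paper proceeds this way is not stated in the text above.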
The paper is organized as follows. Section 2 describes and analyzes the existing literature on event detection within the context of OSNs, indicating its drawbacks. Section 3 presents in detail the proposed pre-filtering and clustering approach, while Section 4 illustrates the realized proof of concept and its assessment, highlighting the achievable performance. We conclude the paper in Section 5 with some final remarks and a plan for future work.
Section snippets
Related works
In the last decade, a plethora of event detection approaches have been proposed in the literature to discover real-world events from social data streams. Recent surveys, such as [8], [9], have already provided a detailed review of the most widespread techniques. In particular, event detection approaches can be divided into two distinct classes [10]: Arbitrary Event Detection and Pre-defined Event Detection. The first group encompasses Clustering,
Methodology
The proposed approach, whose process overview is depicted in Fig. 1, aims at unveiling events from a social stream, using an on-line clustering technique based on game theory.
To this aim, the first subsection introduces a series of definitions supporting the description of our on-line clustering solution. The second subsection describes a pre-processing phase to be done before performing the clustering, which is described in the following two subsections. Specifically, the third subsection is then
Experimental results
In this section, we present the proof of concept of our proposed approach and its assessment, reporting experimental results on the efficiency and efficacy of the approach on the Twitter social data stream.
We have also empirically measured the latency of our approach, which in theory has a complexity estimated by the work in [6] to be , if pure strategies are considered, or equal to in case of mixed strategies, where is the
Conclusions and future work
In order to cope with the massive amount of tweets/posts a given event can produce, users demand proper means to detect events and to be informed about them without having to go through all the exchanged data and clean it of syntactic sugar. To this aim, this study has analyzed the problems and research challenges underlying on-line detection of arbitrary events within OSNs, and proposed a rule-based approach for filtering tweets and a game theoretical clustering to group tweets and
CRediT authorship contribution statement
Rocco Di Girolamo: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Writing - original draft, Writing - review & editing. Christian Esposito: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Writing - original draft, Writing - review & editing. Vincenzo Moscato: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Writing - original draft, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
All authors approved the version of the manuscript to be published.
References (38)
- et al., A review on trust propagation and opinion dynamics in social networks and group decision making frameworks, Inform. Sci. (2019)
- et al., Tweet categorization by combining content and structural knowledge, Inf. Fusion (2016)
- et al., A social-aware online short-text feature selection technique for social media, Inf. Fusion (2018)
- et al., I-TWEC: Interactive clustering tool for Twitter, Expert Syst. Appl. (2018)
- et al., An automated text categorization framework based on hyperparameter optimization, Knowl.-Based Syst. (2018)
- et al., Emerging topic detection in twitter stream based on high utility pattern mining, Expert Syst. Appl. (2019)
- et al., Imbalanced text sentiment classification using universal and domain-specific knowledge, Knowl.-Based Syst. (2018)
- et al., Enhanced Heartbeat Graph for emerging event detection on Twitter using time series networks, Expert Syst. Appl. (2019)
- et al., Developing a twitter-based traffic event detection model using deep learning architectures, Expert Syst. Appl. (2019)
- et al., Geoburst+: effective and real-time local event detection in geo-tagged tweet streams, ACM Trans. Intell. Syst. Technol. (TIST) (2018)
- Detecting life events from twitter based on temporal semantic features, Knowl.-Based Syst.
- Extreme events management using multimedia social networks, Future Gener. Comput. Syst.
- Natural disasters detection in social media and satellite imagery: a survey, Multimedia Tools Appl.
- Bursty and hierarchical structure in streams, Data Min. Knowl. Discov.
- A game-theoretic approach to hypergraph clustering, IEEE Trans. Pattern Anal. Mach. Intell.
- Infection and immunization: A new class of evolutionary game dynamics, Games Econom. Behav.
- A survey of techniques for event detection in twitter, Comput. Intell.
- Survey and experimental analysis of event detection techniques for twitter, Comput. J.
Rocco Di Girolamo is a Big Data Engineer at the Microlise company. He received his master’s degree in computer engineering from the University of Naples “Federico II” in 2018. His research interests include real-time architectures for big data and the application of data mining and artificial intelligence techniques.
Christian Esposito is currently a Tenured Assistant Professor at the University of Salerno, and received the National Qualification in Italy as Associate Professor in Computer Engineering and Computer Science, in May 2017 and July 2018 respectively. He was an Assistant Professor at the University of Naples “Federico II”, and a two-year Research Fellow and short-term Researcher at the Institute of High Performance Computing and Networking (ICAR) of the Italian National Research Council (CNR) from 2011 to 2015. He graduated in Computer Engineering in 2006 and got his PhD in 2009, both at the University of Naples “Federico II”, in Italy. He has published about 100 papers in international journals and conferences, and has been a PC member or involved in the organization of about 60 international conferences/workshops. He regularly serves as a reviewer for journals and conferences in the field of distributed and dependable systems and is a member of the editorial board of the International Journal of Computational Science and Engineering and the International Journal of High Performance Computing and Networking, both by Inderscience. He is an Associate Editor of IEEE Access, and has served as guest editor for various special issues of international journals. His interests include positioning systems, reliable and secure communications, game theory and multi-objective optimization.
Vincenzo Moscato is an Associate Professor at the Electrical Engineering and Information Technology Department of the University of Naples “Federico II”. He received the Ph.D. degree in Computer Science from the same University by defending the thesis: “Indexing Techniques for Image and Video Databases: an approach based on Animate Vision Paradigm”. He is one of the leaders of the PICUS (Pattern and Intelligence Computation for mUltimedia Systems) departmental research group and a member of the Big Data and Artificial Intelligence national laboratories within the Consorzio Interuniversitario Nazionale per l’Informatica (CINI). His research activities lie in the areas of Multimedia, Big Data, Artificial Intelligence and Social Network Analysis. He has been involved in many national and international research projects, some of which he coordinated as principal investigator. He has served on the program committees of numerous international conferences and on the editorial boards of several important journals. Finally, he has authored about 160 publications in international journals, conference proceedings and book chapters.
Giancarlo Sperlí is a Research Fellow at the Department of Electrical and Computer Engineering of the University of Naples “Federico II”. He obtained his PhD in Information Technology and Electrical Engineering at the same University defending his thesis: “Multimedia Social Networks”. He is a member of the PICUS (Pattern and Intelligence Computation for mUltimedia Systems) departmental research group. His main research interests are in the area of Cybersecurity, Semantic Analysis of Multimedia Data and Social Networks Analysis. Finally, he has authored about 50 publications in international journals, conference proceedings and book chapters.