research-article

Analysing political events on Twitter: topic modelling and user community classification

Author:

Anjie FangAuthors Info & Claims

ACM SIGIR Forum, Volume 53, Issue 1

Pages 38 - 39

https://doi.org/10.1145/3458537.3458542

Published: 23 March 2021 Publication History

Get Access

Abstract

Recently, political events, such as elections, have raised a lot of discussions on social media networks, in particular, Twitter. This brings new opportunities for social scientists to address social science tasks, such as understanding what communities said or identifying whether a community has an influence on another. However, identifying these communities and extracting what they said from social media data are challenging and non-trivial tasks.

We aim to make progress towards understanding 'who' (i.e. communities) said 'what' (i.e. discussed topics) and 'when' (i.e. time) during political events on Twitter. While identifying the 'who' can benefit from Twitter user community classification approaches, 'what' they said and 'when' can be effectively addressed on Twitter by extracting their discussed topics using topic modelling approaches that also account for the importance of time on Twitter. To evaluate the quality of these topics, it is necessary to investigate how coherent these topics are to humans. Accordingly, we propose a series of approaches in this thesis.

First, we investigate how to effectively evaluate the coherence of the topics generated using a topic modelling approach. The topic coherence metric evaluates the topical coherence by examining the semantic similarity among words in a topic. We argue that the semantic similarity of words in tweets can be effectively captured by using word embeddings trained using a Twitter background dataset. Through a user study, we demonstrate that our proposed word embedding-based topic coherence metric can assess the coherence of topics like humans [1, 2]. In addition, inspired by the precision at k metric, we propose to evaluate the coherence of a topic model (containing many topics) by averaging the top-ranked topics within the topic model [3]. Our proposed metrics can not only evaluate the coherence of topics and topic models, but also can help users to choose the most coherent topics.

Second, we aim to extract topics with a high coherence from Twitter data. Such topics can be easily interpreted by humans and they can assist to examine 'what' has been discussed and 'when'. Indeed, we argue that topics can be discussed in different time periods (see [4]) and therefore can be effectively identified and distinguished by considering their time periods. Hence, we propose an effective time-sensitive topic modelling approach by integrating the time dimension of tweets (i.e. 'when') [5]. We show that the time dimension helps to generate topics with a high coherence. Hence, we argue that 'what' has been discussed and 'when' can be effectively addressed by our proposed time-sensitive topic modelling approach.

Next, to identify 'who' participated in the topic discussions, we propose approaches to identify the community affiliations of Twitter users, including automatic ground-truth generation approaches and a user community classification approach. We show that the mentioned hashtags and entities in the users' tweets can indicate which community a Twitter user belongs to. Hence, we argue that they can be used to generate the ground-truth data for classifying users into communities. On the other hand, we argue that different communities favour different topic discussions and their community affiliations can be identified by leveraging the discussed topics. Accordingly, we propose a Topic-Based Naive Bayes (TBNB) classification approach to classify Twitter users based on their words and discussed topics [6]. We demonstrate that our TBNB classifier together with the ground-truth generation approaches can effectively identify the community affiliations of Twitter users.

Finally, to show the generalisation of our approaches, we apply our approaches to analyse 3.6 million tweets related to US Election 2016 on Twitter [7]. We show that our TBNB approach can effectively identify the 'who', i.e. classify Twitter users into communities. To investigate 'what' these communities have discussed, we apply our time-sensitive topic modelling approach to extract coherent topics. We finally analyse the community-related topics evaluated and selected using our proposed topic coherence metrics.

Overall, we contribute to provide effective approaches to assist social scientists towards analysing political events on Twitter. These approaches include topic coherence metrics, a time-sensitive topic modelling approach and approaches for classifying the community affiliations of Twitter users. Together they make progress to study and understand the connections and dynamics among communities on Twitter.

Supervisors: Iadh Ounis, Craig Macdonald, Philip Habel

The thesis is available at http://theses.gla.ac.uk/41135/

References

[1]

Anjie Fang, Craig Macdonald, Iadh Ounis, and Philip Habel. Topics in tweets: A user study of topic coherence metrics for Twitter data. In Proc. of ECIR, 2016.

Crossref

Google Scholar

[2]

Anjie Fang, Craig Macdonald, Iadh Ounis, and Philip Habel. Using word embedding to evaluate the coherence of topics from Twitter data. In Proc. of SIGIR, 2016.

Digital Library

Google Scholar

[3]

Anjie Fang, Craig Macdonald, Iadh Ounis, and Philip Habel. Examining the coherence of the top ranked tweet topics. In Proc. of SIGIR, 2016.

Digital Library

Google Scholar

[4]

Anjie Fang, Iadh Ounis, Craig MacDonald, Philip Habel, Xiaoyu Xiong, and Hai-Tao Yu. An effective approach for modelling time features for classifying bursty topics on Twitter. In Proc. of CIKM, 2018.

Digital Library

Google Scholar

[5]

Anjie Fang, Craig Macdonald, Iadh Ounis, Philip Habel, and Xiao Yang. Exploring time-sensitive variational bayesian inference LDA for social media data. In Proc. of ECIR, 2017.

Crossref

Google Scholar

[6]

Anjie Fang, Iadh Ounis, Philip Habel, Craig Macdonald, and Nut Limsopatham. Topiccentric classification of Twitter user's political orientation. In Proc. of SIGIR, 2015.

Digital Library

Google Scholar

[7]

Anjie Fang, Philip Habel, Iadh Ounis, and Craig MacDonald. Votes on twitter: Assessing candidate preferences and topics of discussion during the 2016 us presidential election. SAGE Open, 9(1):1--17, 2019.

Crossref

Google Scholar

Cited By

View all

Yıldırım AUskudarli S(2020)Microblog topic identification using Linked Open DataPLOS ONE10.1371/journal.pone.023686315:8(e0236863)Online publication date: 11-Aug-2020
https://doi.org/10.1371/journal.pone.0236863

Recommendations

Brand-Related Events Detection, Classification and Summarization on Twitter
WI-IAT '12: Proceedings of the The 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01

The huge and ever increasing amount of text generated by Twitter users everyday embeds a wealth of information, in particular, about themes that become suddenly relevant to many users as well as about the sentiment polarity that users tend to associate ...
Linguistic Factors Associated With Propagation of Political Opinions in Twitter

In the last decade, social networks have emerged as a significant media platform for dissemination of political information. In this article, we characterize linguistic factors that affect the dissemination of a political message on Twitter. This is ...
Political Opinion Mining from Twitter

Twitter is one of the most popular micro-blogging platform for people to express their political views in and around the elections. Hence during pre-elections twitter becomes a rich resource of data to understand the changing tenor of political leaders ...

Comments

Information & Contributors

Information

Published In

ACM SIGIR Forum Volume 53, Issue 1

June 2019

43 pages

ISSN:0163-5840

DOI:10.1145/3458537

Issue’s Table of Contents

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 March 2021

Published in SIGIR Volume 53, Issue 1

Check for updates

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
143
Total Downloads

Downloads (Last 12 months)20
Downloads (Last 6 weeks)0

Reflects downloads up to 17 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

View all

Yıldırım AUskudarli S(2020)Microblog topic identification using Linked Open DataPLOS ONE10.1371/journal.pone.023686315:8(e0236863)Online publication date: 11-Aug-2020
https://doi.org/10.1371/journal.pone.0236863

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Cited By

Recommendations

Brand-Related Events Detection, Classification and Summarization on Twitter

Linguistic Factors Associated With Propagation of Political Opinions in Twitter

Political Opinion Mining from Twitter

Comments

Published In

Publisher

Publication History

Check for updates

Qualifiers

Other Metrics

Article Metrics

Other Metrics

Cited By

Login options

Full Access

PDF

eReader

Abstract

References

Cited By

Recommendations

Brand-Related Events Detection, Classification and Summarization on Twitter

Linguistic Factors Associated With Propagation of Political Opinions in Twitter

Political Opinion Mining from Twitter

Comments

Information

Published In

Publisher

Publication History

Check for updates

Qualifiers

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

Login options

Full Access

View options

PDF

eReader

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations