Twitter trends: A ranking algorithm analysis on real time data

https://doi.org/10.1016/j.eswa.2020.113990

Highlights

  • Proposes a trend detection method for a stream of tweets.

  • Precise prediction and detection of real-time Twitter trends, along with term ranking.

  • Discusses applications of the proposed research for businesses, industries and advertisers.

Abstract

Social media has recently become popular due to its vast range of applications. People all over the world use its diverse channels to express personal views, experiences and opinions on diverse topics. Social media has revolutionized the way people interact and communicate with each other and, overall, it has changed the methods and approaches in almost all aspects of life, such as social issues, business, education, health, etc. Thus, sales and marketing departments of multinational industries are focusing on social media to analyze current trends and predict future trends from user-generated content on Facebook, Flickr, Twitter, etc. However, the prediction process is challenging because a multiplicity of factors affects which elements of social media content become popular. This research paper works on Twitter trend analysis and proposes a trend detection process over streams of tweets. The proposed approach detects the trending topics in real-time Twitter data and ranks the top terms and hashtags. The paper further discusses the motivation for trend prediction over social media. In addition to exploratory data analysis, the research paper explores the Term Frequency-Inverse Document Frequency (TF-IDF), Combined Component Approach (CCA) and Biterm Topic Model (BTM) approaches for finding the topics and the terms within given topics. In the modern competitive world, this research provides investors, advertisers, industries and all other stakeholders with a detailed and comprehensive data analysis that may help them focus their investment, area of work, marketing, and product.

Introduction

User content generation has turned the conventional World Wide Web (WWW) into the social web. Social web platforms have redefined how people interact and live. Now, people all over the world can easily communicate with each other and share their views on topics such as politics, education, health, etc. The transmission of news is the best example of this rapid exchange of information around the world. It has freed people from depending on government-run or large-organization media, such as television channels, for the instant global spread of news and information. Devices like mobile phones, tablets, and smartphones are used to create a plethora of data and spread it through the internet every day on blogs, websites and social network services (SNSs) like Twitter, YouTube, Facebook, etc. These online networks contain information related to the personal experiences, opinions, ideas, and thoughts of people in various modes (Ye, Law, Gu, & Chen, 2011). One can predict opinions and behaviors based on the opinions and experiences of these people through the content they post on these social networks (Figueiredo, Almeida, Gonçalves, & Benevenuto, 2016). Zhang et al. (2012) are of the view that real-time user-generated content related to vote casting can be collected and used to foresee election results. An SNS is a platform which uses user-generated data to create relationships and social networks. These networks operate on the Internet and had 2.46 billion users worldwide in 2017 (Pesonen, 2018). They have not only become a part of everyday modern life, but are now an important area of research. They allow researchers to work on various topics of inquiry: the circulation of information in SNS (Scanfeld et al., 2010, Naveed et al., 2011, Weeks et al., 2013), social network structures (Krumm et al., 2008, Kwak et al., 2010, Leskovec et al., 2009), predictions/speculations (Paul and Dredze, 2011, Becker et al., 2010, Figueiredo et al., 2016, Papacharissi et al., 2012) and the influence they have on other resources (Hermida and Thurman, 2008, Ye et al., 2011, Zhang et al., 2012, Papacharissi et al., 2012, Bruns and Burgess, 2012, Lee and Ma, 2012).

SNSs are researched so intensively because of the importance of the information they can provide. Many academic researchers believe that it is important to understand the relation between SNS and news and how they influence each other (Weeks et al., 2013, Lerman and Ghosh, 2010, Lee and Ma, 2012, Nielsen and Schrøder, 2014, Tsagkias et al., 2011), but no one has researched the characteristics of the content in each field. Tsagkias, Weerkamp, and De Rijke (2009) state that it is important to study and analyze the characteristics of multiple closely related datasets to understand the hidden informative patterns.

The language of SNS is informal and so is its grammar; its keywords are often unrelated to the content, and its topics are personal and range far beyond fixed topic boundaries. This makes it difficult to capture any specific event. Text analysis and mining algorithms extract current trends from various text sources, and these trends depict the prevailing topics in a particular society. This paper performs trend analysis using a framework that consists of four steps. The first step is data extraction, which is done using the Twitter streaming API (Application Programming Interface). The second step is preprocessing, which includes data cleaning, data integration, and data reduction. The third step is to calculate Term Frequency-Inverse Document Frequency (TF-IDF); it consists of calculating Term Frequency (TF), Inverse Document Frequency (IDF), and the Combined Component Approach (CCA). The fourth and last step is the ranking of top topics. This paper analyzes Twitter data using ranking algorithms. It further extracts useful characteristic features of SNS which can also be used in further research. Overall, the major research contributions of this work are as follows:

  • To find the relevant trends, we use the TF-IDF and CCA models to find the relevant topics for the most frequently occurring keywords in the collection of tweets.

  • We use BTM to find the topics in the tweet collection on the basis of different categories. The BTM model is well suited to short texts, especially tweets, as it learns topics from word co-occurrence patterns aggregated over the whole corpus.

  • We also analyze the frequent keywords of the trending topics from different perspectives and then discuss their characteristics in detail.

  • We perform a detailed analysis of how Twitter trends change over a particular period of time using real-time data. In this regard, we analyze the data of trending Twitter topics that change continuously over time.
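The TF-IDF ranking and biterm ideas named in the contributions above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the tiny stop-word list, the smoothed IDF variant, and the function names `tokenize`, `rank_terms` and `biterms` are all assumptions made for illustration. CCA and the BTM inference step are not covered; `biterms` only shows the unordered word pairs that a biterm topic model takes as input.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "is", "in", "on", "of", "to", "and", "for", "rt"}


def tokenize(tweet):
    """Lowercase a tweet, drop URLs, keep words, hashtags and mentions."""
    text = re.sub(r"https?://\S+", " ", tweet.lower())
    tokens = re.findall(r"[#@]?\w+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def rank_terms(tweets, top_k=5):
    """Rank terms by corpus-wide term frequency times a smoothed IDF.

    Assumed IDF variant: ln((1 + N) / (1 + df)) + 1, where N is the number
    of tweets and df the number of tweets containing the term.
    """
    docs = [tokenize(t) for t in tweets]
    n_docs = len(docs)
    tf = Counter(tok for doc in docs for tok in doc)
    df = Counter(tok for doc in docs for tok in set(doc))
    scores = {
        term: tf[term] * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term in tf
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


def biterms(tokens):
    """Return unordered pairs of distinct words co-occurring in one short text."""
    return [tuple(sorted(p)) for p in combinations(tokens, 2) if p[0] != p[1]]


if __name__ == "__main__":
    sample = [
        "#python is trending",
        "learning #python today",
        "coffee today",
    ]
    for term, score in rank_terms(sample, top_k=3):
        print(f"{term}\t{score:.3f}")
    print(biterms(tokenize(sample[1])))
```

The smoothed IDF is chosen here so that a term appearing in every tweet still receives a non-zero score; with the unsmoothed form log(N/df), such a term would score zero, which is awkward when the goal is to surface widely used trending terms.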

The rest of the paper is organized as follows: Section 2 discusses the related work. Section 3 presents the problem statement and research contributions. Section 4 discusses the research methodology, while Section 5 discusses the experimental setup. Section 6 presents the analysis of the datasets, covering the common trends of the SNS data and the efficiency of the ranking algorithm. Section 7 discusses the research implications, and the final section concludes the paper.

Section snippets

Related work

Numerous academic researchers have studied the relationship between SNS datasets and the topics discussed in the real world (Paul and Dredze, 2011, Wu et al., 2017, Wu et al., 2017). They have used SNS data to explore the causes and effects of particular aspects and to understand public behavioral patterns in certain circumstances.

Ye et al. (2011) explored the connection between hotel reservations and traveler reviews of the hotels. Their study revealed that if positive reviews about

Problem statement and research objectives

This section discusses the problem statement and major objective of this research.

Research methodology

The approach used in this research builds on a similar model developed for news and social media services (Jang and Yoon, 2018). Fig. 1 illustrates the proposed framework, showing the steps of the research methodology carried out in this study. First, the data extraction module collects the data using the Twitter streaming API and stores the results in an .xlsx file. Then, the next step depicts the data preprocessing steps such as data

Experimental setup

This section introduces the method used to select the dataset for the collected corpus, and then presents the general characteristics of the SNS data. Table 1 summarizes the sources of the collected Social Network Services (SNS) data. Data is collected using the Twitter application programming interface (API), which extracts data from Twitter sources. Datasets were collected for 15 days from November 13 to November 28, 2018. The Twitter API collects data of 63,538 tweets with total volume of

Results and discussions

This section of the paper analyzes the characteristics of social network service data and scrutinizes the efficiency of the ranking algorithm. Salton and Buckley (1988) summarized the gains in automatic term weighting and also provided different term indexing models with which the content analysis procedures can be compared. Robertson and Walker (1999) presented a Basic Search System (BSS) that uses weighting functions and term ranking for selection. They compared the significance of

Research implication

The main research implications of Twitter trend detection include event detection, which helps in finding out what is going on in the country and around the world; top news views, as most Twitter users talk about news; influential user detection, as particular users are sometimes behind the trends; and many more. Other applications include timeline ranking and search query expansion.

Conclusion and future work

This study was conducted to analyze the social trends of people regarding different aspects of life. Activity on social media networks, blogs and microblogs (such as Twitter) is mainly carried out through texts, links, informative content and images. Some content attracts more attention, especially content involving visual images, tagging, commenting, links and sharing. Content that is popular among a particular set of people indicates current trends, high-alert news, interests or information.

CRediT authorship contribution statement

Hikmat Ullah Khan: Conceptualization, Validation, Formal analysis, Investigation, Writing - review & editing, Resources, Supervision, Project administration. Shumaila Nasir: Methodology, Formal analysis, Data curation, Writing - original draft. Kishwar Nasim: Software, Data curation, Formal analysis. Danial Shabbir: Methodology, Visualization. Ahsan Mahmood: Methodology, Software, Writing - review & editing, Visualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (53)

  • Bruns, A. & Burgess, J. (2012). Researching news discussion on Twitter: New methodologies. 13(5-6),...
  • de Almeida, H. M., Gonçalves, M. A., Cristo, M. & Calado, P. (2007). A combined component approach for finding...
  • del Pilar Salas-Zárate, M., Medina-Moreira, J., Álvarez-Sagubay, P. J., Lagos-Ortiz, K., Paredes-Valverde, M. A. &...
  • Dhir, A. et al. (2013). Tweeters on campus: Twitter a learning tool in classroom? Journal of Universal Computer Science.
  • Figueiredo, F., Almeida, J. M., Gonçalves, M. A. & Benevenuto, F. (2016). Trendlearner: Early prediction of...
  • Fox, E. A. & Shaw, J. A. (1994). Combination of multiple searches....
  • Hansen, L. K. et al. Good friends, bad news-affect and virality in Twitter.
  • Hermida, A. & Thurman, N. (2008). A clash of cultures: The integration of user-generated content within...
  • Jang, B. et al. (2018). Characteristics analysis of data from news and social network services. IEEE Access.
  • Kassens-Noor, E. (2012). Twitter as a teaching practice to enhance active and informal learning in higher education: The case of sustainable tweets. Active Learning in Higher Education.
  • Krumm, J., Davies, N. & Narayanaswami, C. (2008). User-generated content. 7(4),...
  • Kwak, H., Lee, C., Park, H. & Moon, S. (2010). What is Twitter, a social network or a news media? Paper presented at...
  • Lee, C. S. & Ma, L. (2012). News sharing in social media: The effect of gratifications and prior...
  • Lerman, K. & Ghosh, R. (2010). Information contagion: An empirical study of the spread of news on Digg and...
  • Leskovec, J., Backstrom, L. & Kleinberg, J. (2009). Meme-tracking and the dynamics of the news cycle. Paper presented...
  • Mahmood, A. et al. (2020). On Modelling for Bias-Aware Sentiment Analysis and Its Impact in Twitter. Journal of Web Engineering.