Time-aware adaptive tweets ranking through deep learning

https://doi.org/10.1016/j.future.2017.07.039

Highlights

  • Time-aware adaptive and personalized learning to rank algorithm for tweets.

  • Comparative multilayer perceptron feed-forward neural network to train comparison among tweets.

  • Tweet content wikification to semantically categorize the posts by linking tweet text to Wikipedia articles.

  • Ranking model evaluation by measuring how many re-tweeted tweets are included in the top-ranked tweet list.

Abstract

Tweets about brands, news, and so forth are generally delivered to the Twitter user in reverse chronological order, chosen among those posted by the users he/she follows. Recently, Twitter has been tackling information overload by introducing new filtering features, such as “while you are away”, that show only a few tweets summarizing the posted ones and rank tweets by quality in addition to timeliness. We argue that there is no unique strategy for ranking tweets so as to maximize user engagement and, possibly, increase tweet and re-tweet rates. Several dimensions affect the ranking, such as time, location, semantics, publisher authority, and quality. We point out that the tweet ranking model should vary according to the user’s context and interests, and according to how these change along the timeline: cyclically, weekly, or at the specific date and time when the user logs in.

In this work, we introduce a deep learning method that re-adapts the ranking of tweets by preferring those most likely to interest the user. The user’s interests are extracted mainly from his/her previous re-tweets and replies, together with the time at which they occurred.

We evaluate the ranking model by measuring how many of the tweets that will be re-tweeted in the near future are included in the top-ranked tweet list. The proposed ranking model performs well, outperforming methods that consider only reverse chronological order or the user’s interest score. In addition, we show that, in our dataset, the features with the greatest impact on the performance of the proposed ranking model are publisher authority, tweet content measures, and time-awareness.

Introduction

Context. Nowadays, we are witnessing a social data explosion. Facebook and Twitter are very popular communication platforms and play an important role in cultural, social, and political events. Social networking is a core part of the online experience [1]. Nevertheless, a huge number of tweets are posted daily, thousands every second, and people are overwhelmed by the incoming information. Posts can be authored by anyone, anywhere in the world, so Twitter and Facebook have become attractive to spammers [2], which also compromises the value of the information source. Delivering the valuable tweet at the right time requires tackling the information overload problem with filtering and ranking methods that consider the user’s interests, the activity he/she is performing, the quality and relevance of the content, and so on.

In general, tweets are delivered to the user in reverse chronological order, drawn from those published by the followed users. Recently, Twitter has been tackling information overload by proposing a new version of its timeline that ranks tweets by also considering quality and relevance, in addition to timeliness, as stated in the official blog. New Twitter features show a list of relevant tweets “in case you missed it”, giving you a subset of tweets based on their popularity and on how you interact with the tweet publisher. From the research point of view, several works deal with information overload on Twitter by defining tweet recommendation algorithms [3], [4], personalized ranking [5], filtering, and summarization [6], [7], [8], [9], [10], [11], customized according to several criteria.

In this paper, we emphasize that there is no unique and optimal criterion for ranking tweets so as to maximize user engagement and, possibly, increase tweet and re-tweet rates, creating more live commentary and conversations. Several dimensions affect the ranking, such as time, location, semantics, interestingness, and publisher authority. Their impact on the ranking algorithm changes according to the user’s context, the day of the week, the period of the year, and so forth. Indeed, preferences differ not only between users but also for the same user, depending on the context in which he/she accesses social media (e.g., Twitter). For instance, the same user may prefer to catch up on breaking news from social media when he/she is having a break or watching TV. Some users may prefer tweets related to a sporting event, but only in the hours following football matches. Conversely, they may want to know that something important is happening nearby whenever it happens, even if they are searching for something else.

Problem definition. Formally, given a time-stamped finite tweet stream $TW = \{tw_1, tw_2, \ldots, tw_n\}$, with some related information about publisher authority, and a user $u$, the goal is to identify a function that ranks the tweets in $TW$, starting from those most relevant to $u$ given his/her own history (tweets, re-tweets, follows, etc.). The resulting ranking model should be adaptive, personalized, and time-aware, since the user’s interests may change along the timeline and depend on the current context when the user logs in to Twitter.

Proposed solution. To achieve this goal, we define a learning to rank algorithm to sort a set of tweets (sketched in Fig. 1). Learning to rank is an intensively investigated research area: many algorithms have been proposed and used in several fields, including information retrieval tasks and focused search engines, and more recently they have also been adopted for tweet ranking and recommendation [12]. The literature distinguishes three main supervised approaches [13]: pointwise, pairwise, and listwise. The main limitation of these algorithms is that supervised learning relies on the availability of user feedback about the ranking of items, which is not easy to collect. In this sense, the most promising and natural approach is the pairwise one, which requires user feedback only to determine the user’s preference between pairs of items (i.e., tweets) rather than complete ranked lists. We adopt a pairwise approach in which the user’s preferences are implicitly expressed by re-tweets and replies, which we interpret as pairwise comparisons with respect to other tweets that the user has not re-tweeted or replied to, for example those shown in reverse chronological order. Pairwise algorithms such as RankNet [14] and its deep version [15] have performed well in ranking web pages to improve the web search experience. In this work, we adopt an algorithm inspired by SortNet [16], a ranking algorithm based on a deep neural network, to rank tweets using several features that represent the user, the content, the publisher, and so on. Given a pair of tweets $tw_i, tw_j \in TW$, the aim is to learn a preference function $P: TW \times TW \to \{>, <\}$ that evaluates the user’s interest with respect to the pair: $tw_i > tw_j$ if $tw_i$ should be preferred to $tw_j$, and $tw_i < tw_j$ otherwise.
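To make the implicit-feedback idea concrete, the following minimal sketch shows one way labelled pairs could be derived from a user's timeline. It is an illustration under stated assumptions, not the authors' implementation: the `Tweet` class, its `engaged` flag, and the function `build_preference_pairs` are hypothetical names, and pairing every engaged tweet with every ignored one is just one possible sampling strategy.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Tweet:
    """Hypothetical minimal tweet record used only for this illustration."""
    tweet_id: str
    engaged: bool  # True if the user re-tweeted or replied to this tweet


def build_preference_pairs(timeline: List[Tweet]) -> List[Tuple[Tweet, Tweet, str]]:
    """Turn implicit feedback into labelled pairs (tw_i, tw_j, label).

    Assumption: every engaged tweet is preferred to every ignored tweet
    shown in the same timeline. Each pair is emitted in both orders so
    that both labels '>' and '<' appear in the training sample.
    """
    engaged = [tw for tw in timeline if tw.engaged]
    ignored = [tw for tw in timeline if not tw.engaged]
    pairs: List[Tuple[Tweet, Tweet, str]] = []
    for pos in engaged:
        for neg in ignored:
            pairs.append((pos, neg, ">"))  # pos should be preferred to neg
            pairs.append((neg, pos, "<"))  # same preference, reversed order
    return pairs


# One re-tweeted tweet ("b") among two ignored ones yields four labelled pairs.
timeline = [Tweet("a", False), Tweet("b", True), Tweet("c", False)]
pairs = build_preference_pairs(timeline)
```

In practice each tweet would be replaced by its feature vector before training; the sketch only shows how the labels follow from re-tweets and replies.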

Contributions. Unlike other application domains, such as web search, where learning to rank algorithms have already been widely applied, the strongly dynamic nature of microblogging makes model re-adaptation especially important. This work introduces a deep learning method for tweet ranking that can re-adapt itself along the timeline and account for different tweets and user interests. Time-awareness is implemented by using the date-time of the tweets during the training of the ranking model. More precisely, the main contributions of the proposed research are:

  • Definition of a learning to rank algorithm for tweets; in particular, we use a pairwise algorithm, assuming that each re-tweet and/or reply represents user feedback expressing a preference for the tweet’s topic, the publisher’s authority, and so forth;

  • Integration of the date-time of the tweet, re-tweet, or reply during the training phase, in order to provide different ranking results depending on the moment when the user logs in to Twitter; in fact, a user’s interest may recur cyclically in a given time slot (e.g., weekend, evening, etc.);

  • Implementation of continuous learning, in which new sample items are given as input tuples for training the ranking model each time the user expresses a preference by replying to or re-tweeting something;

  • Adoption of tweet content wikification to semantically categorize the posts by linking tweet text to Wikipedia articles; this practice enables us to use corresponding Wikipedia entities to characterize the user’s topics of interest.
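As an illustration of the time-awareness contribution, the sketch below encodes a timestamp as a day-of-week and time-slot feature vector. The slot boundaries (four six-hour blocks), the slot names, the one-hot encoding, and the function name `time_features` are all assumptions made for this example; the text above does not specify them.

```python
from datetime import datetime
from typing import List

# Assumed slot layout: four six-hour blocks starting at midnight.
SLOT_NAMES = ["night", "morning", "afternoon", "evening"]


def time_features(ts: datetime) -> List[int]:
    """Encode a timestamp as a one-hot day of week (7) plus one-hot slot (4)."""
    day = [0] * 7
    day[ts.weekday()] = 1        # Monday is index 0, Sunday is index 6
    slot = [0] * len(SLOT_NAMES)
    slot[ts.hour // 6] = 1       # 0-5 night, 6-11 morning, 12-17 afternoon, 18-23 evening
    return day + slot


# A re-tweet made on a Saturday evening.
features = time_features(datetime(2017, 7, 29, 21, 30))
```

A vector of this kind could be attached to every training tuple, so that the same pair of tweets may be ordered differently depending on when the user logs in.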

Experimental results. Starting from the collected tweet stream, we use our framework to produce a personalized tweet ranking, simulating different access time slots, and we evaluate its precision with the Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG) metrics. Performance has also been evaluated with some significant features omitted (i.e., the tweet publisher’s authority, the social relation between tweet author and user, and time-awareness) in order to estimate their impact on the method. We evaluate the improvement in tweet ranking by counting how many top-ranked tweets are re-tweeted or replied to in the near future, compared with the ignored ones. The experimental results reveal promising performance and confirm the unsuitability of a simple reverse chronological order. In addition, we point out that time features play an important role: ranking improves when time features are included in the learning phase and the time slot in which the user logs in to Twitter is taken into account.
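For readers unfamiliar with the two metrics, the following sketch computes them in their common textbook form for binary relevance, where 1 marks a ranked tweet that the user later re-tweeted or replied to. The function names are hypothetical, and the exact variants used in the paper (cut-off, gain function) are not stated here, so this should be read as a generic reference implementation only.

```python
import math
from typing import List, Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """Average of precision@rank taken at each relevant position."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings: List[Sequence[int]]) -> float:
    """Mean of average precision over several ranked lists."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


def dcg_at_k(relevance: Sequence[int], k: int) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevance[:k], start=1))


def ndcg_at_k(relevance: Sequence[int], k: int) -> float:
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


# Relevant tweets at ranks 1 and 3 of a four-item ranking.
example = [1, 0, 1, 0]
ap = average_precision(example)
ndcg = ndcg_at_k(example, 4)
```

Both scores reach 1.0 only when every tweet the user later engaged with is ranked above every ignored one.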

Outline. The paper is structured as follows: Section 2 describes related work; Section 3 discusses the deep neural network architecture used for the ranking model; Section 4 details the features selected to train the model and illustrates how the tweet components are modeled; Section 5 discusses the evaluation results; finally, the conclusion and future work close the paper.

Section snippets

Related works

This section covers the two main areas of related work: (1) ranking and recommendation on Twitter, and (2) learning to rank with deep learning.

Deep neural network architecture for adaptive tweet ranking

The proposed method implements pairwise preference learning in which the preference function relies on a multilayer perceptron feed-forward neural network, sketched in Fig. 2. Inspired by the Comparative Neural Network (CmpNN) introduced in [16], the network takes as input a pair of tweets ($tw_i$ and $tw_j$) together with their temporal and user information, and outputs the ranking relation between them, i.e., $tw_i > tw_j$ (or $tw_i < tw_j$), as shown in Fig. 2. Temporal features allow us to train the system considering the
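The comparator described in this section can be illustrated with a small forward-pass sketch. It follows the weight-sharing idea of CmpNN as commonly described for SortNet, in which hidden units come in dual pairs so that swapping the two tweets swaps the two outputs. Everything else is an assumption made for illustration: the class name `SymmetricComparator`, the single hidden layer, the sigmoid activations, the random untrained weights, and the way the shared context (time and user features) is fed to both halves.

```python
import math
import random
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


class SymmetricComparator:
    """Untrained CmpNN-style comparator with dual hidden units (illustrative)."""

    def __init__(self, n_ctx: int, n_tweet: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)

        def vec(n: int) -> List[float]:
            return [rng.uniform(-1.0, 1.0) for _ in range(n)]

        self.w_ctx = [vec(n_ctx) for _ in range(n_hidden)]       # context weights
        self.w_first = [vec(n_tweet) for _ in range(n_hidden)]   # first-slot weights
        self.w_second = [vec(n_tweet) for _ in range(n_hidden)]  # second-slot weights
        self.bias = vec(n_hidden)
        self.v_a = vec(n_hidden)  # output weights, exchanged between the two outputs
        self.v_b = vec(n_hidden)
        self.out_bias = rng.uniform(-1.0, 1.0)

    def forward(self, ctx, tw_i, tw_j) -> Tuple[float, float]:
        """Return (score for tw_i > tw_j, score for tw_i < tw_j)."""
        h, h_dual = [], []
        for k in range(len(self.bias)):
            shared = _dot(self.w_ctx[k], ctx) + self.bias[k]
            # The dual unit sees the same inputs with the tweet weights exchanged.
            h.append(_sigmoid(shared + _dot(self.w_first[k], tw_i)
                              + _dot(self.w_second[k], tw_j)))
            h_dual.append(_sigmoid(shared + _dot(self.w_second[k], tw_i)
                                   + _dot(self.w_first[k], tw_j)))
        n_gt = _sigmoid(_dot(self.v_a, h) + _dot(self.v_b, h_dual) + self.out_bias)
        n_lt = _sigmoid(_dot(self.v_b, h) + _dot(self.v_a, h_dual) + self.out_bias)
        return n_gt, n_lt

    def prefers(self, ctx, tw_i, tw_j) -> bool:
        """True if the network scores tw_i > tw_j higher than tw_i < tw_j."""
        n_gt, n_lt = self.forward(ctx, tw_i, tw_j)
        return n_gt > n_lt


net = SymmetricComparator(n_ctx=3, n_tweet=2, n_hidden=4)
ctx, a, b = [1.0, 0.0, 1.0], [0.2, 0.9], [0.7, 0.1]
gt_ab, lt_ab = net.forward(ctx, a, b)
gt_ba, lt_ba = net.forward(ctx, b, a)
```

Because of the shared weights, the score for "a preferred to b" equals the score for "b less preferred than a" up to floating-point rounding, so the comparator cannot contradict itself when the input order is reversed.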

Feature selection for adaptive tweet ranking

This section describes the set of features selected to represent each component of the 4-tuples $\langle t, u, tw_i, tw_j \rangle$ used in the defined deep neural network architecture.

The representation of the date-time, $t$ (i.e., the re-tweet timestamp, see Section 3), consists of the day of the week and the time slot. The date-time component, as well as the granularity of the time slot, is important for discovering regularities in the dataset about the moments when the user interacts on Twitter. We opted to consider four time

Evaluation

To evaluate the proposed ranking method, we collected a tweet stream and computed the selected features described in Section 4 in order to prepare the training samples used to build the ranking model. The test set is composed of the tweets that are adjacent to the stream used for training the model. Given a specific user, we tested the resulting ranking model by evaluating the top-ranked tweets obtained while varying the input time slot. The input time slot represents the moment when the user logs in to Twitter.

Conclusion and future works

This work proposed a personalized, adaptive, and time-aware tweet ranking scheme that implements a learning to rank algorithm by means of a deep neural network. The ranking model is time-aware because its input features include, among others, the date-time of the re-tweets or replies, in order to achieve better performance when the user’s interests change along the timeline. Adaptivity is achieved by continuously training over the incoming tweet stream.

References (49)

  • Chen K. et al. Collaborative personalized tweet recommendation.

  • Alahmadi D.H. et al. Twitter-based recommender system to address cold-start: A genetic algorithm based trust modelling and probabilistic sentiment analysis.

  • Zhao Y. et al. Personalized re-ranking of tweets.

  • C. De Maio, G. Fenza, V. Loia, M. Parente, Online query-focused twitter summarizer through fuzzy lattice, in: 2015 IEEE...

  • De Francisci Morales G. et al. From chatter to headlines: harnessing the real-time web for personalized news recommendation.

  • Pennacchiotti M. et al. Making your interests follow you on twitter.

  • Ye M. et al. Exploring social influence for recommendation: a generative model approach.

  • Duan Y. et al. An empirical study on learning to rank of tweets.

  • Liu T.-Y. Learning to rank for information retrieval. Found. Trends® Inf. Retr. (2009)

  • Burges C. et al. Learning to rank using gradient descent.

  • Song Y. et al. Adapting deep ranknet for personalized search.

  • Rigutini L. et al. SortNet: Learning to rank by a neural preference function. IEEE Trans. Neural Netw. (2011)

  • Gurini D.F. et al. Temporal people-to-people recommendation on social networks with sentiment-based matrix factorization. Future Gener. Comput. Syst. (2017)

  • Hsu T.-Y. et al. Variable social vector clocks for exploring user interactions in social communication networks. Int. J. Space-Based Situated Comput. (2015)
    Carmen De Maio graduated and received the Ph.D. degree in Computer Sciences, both from the University of Salerno, Italy, in 2007 and 2011, respectively. Since 2007, she has collaborated on several research initiatives mainly focused on Knowledge Extraction and Management from structured and unstructured data, defining intelligent systems based on the combination of techniques from Soft Computing and Semantic Web, areas in which she has many publications. Specifically, she has been deeply involved in several research projects and has published extensively on Fuzzy Decision Making, Ontology Elicitation, Situation and Context Awareness, and Semantic Information Retrieval. Recently, she has been working in the field of Social Media Analytics and Semantic Web to define intelligent features such as microblog summarization and context-aware information retrieval. Since 2014, she has been an Assistant Professor of Computer Science at the Department of Information Eng., Electrical Eng. and Applied Mathematics, University of Salerno, Italy.

    Giuseppe Fenza graduated and received the Ph.D. degree in Computer Sciences, both from the University of Salerno, Italy, in 2004 and 2009, respectively. Since 2009, he has collaborated on several research initiatives mainly focused on Knowledge Extraction and Management from structured and unstructured data, defining intelligent systems based on the combination of techniques from Soft Computing and Semantic Web, areas in which he has many publications. He has been deeply involved in several EU and Italian Research and Development projects on ICT and, in particular, on Situation Awareness, Service Discovery, Enterprise Information Management and e-Commerce: ARISTOTELE (EU FP7), SENSETIONAL, SIRET, HSEPGEST (PON 2007–2013 Research and Competitiveness), MI Food Exploitation, INVIMALL INtelligent VIrtual MALL, TINAPICA (Industria 2015 - Made in Italy). He has published extensively on Fuzzy Decision Making, Ontology Elicitation, Situation and Context Awareness, and Semantic Information Retrieval. Recently, he has been working in the field of Time Aware Knowledge Extraction, Process Mining, Social Media Analytics and Semantic Web to define intelligent features such as microblog summarization, time-aware collaborative filtering, context-aware information retrieval, and so on. He is currently an Assistant Professor in Computer Science at the Department of Management and Innovation Systems, University of Salerno, Italy.

    Mariacristina Gallo received her Master Degree in Computer Science at the University of Salerno, Italy, in 2009. Since 2009, she has collaborated on several research initiatives and projects mainly focused on Computational Intelligence, Data Mining, Ontology Learning and Semantic Information Retrieval in different domains, such as health, e-commerce, and enterprise. Recently, she has been working in the field of Social Media Analytics and Semantic Web to study users’ interests, the characteristics of their posts, and the potential cyclic nature of both.

    Vincenzo Loia (SM’08) received the master’s degree in computer science from the University of Salerno (Italy) in 1984 and the Ph.D. degree in computer science from the University of Paris VI (France) in 1989. Since 1989, he has been a faculty member at the University of Salerno, where he is currently the head of the Department of Management and Innovation Systems. He was the Principal Investigator in a number of industrial research and development projects and in academic research projects. He has authored over 350 original research papers in international journals, book chapters, and international conference proceedings. He has edited four research books about agent technology, Internet, and soft computing methodologies. His current research interests include merging soft computing and agent technology to design technologically complex environments, with particular interest in Web Intelligence applications. Dr. Loia is the Co-Editor-in-Chief of Soft Computing and an Editor-in-Chief of Ambient Intelligence and Humanized Computing, both published by Springer-Verlag. He serves as an Associate Editor of several international journals, such as the IEEE Transactions on Industrial Informatics, the IEEE Transactions on Systems, Man, and Cybernetics: Systems, the IEEE Transactions on Fuzzy Systems, and the IEEE Transactions on Cognitive and Developmental Systems. He holds several roles in the IEEE, in particular in the Computational Intelligence Society (Chair of Emergent Technologies Technical Committee, IEEE CIS European Representative, Vice-Chair of Intelligent System Applications Technical Committee).

    Mimmo Parente received the laurea degree in computer science from Università degli Studi di Salerno, Italy, in 1987 and the Ph.D. in Applied Mathematics and Computer Science from Università degli Studi di Napoli “Federico II”, Italy, in 1992. Since 1991 he has been a faculty member of the Università degli Studi di Salerno, Italy, and since 2005 he has been a full professor of Computer Science. His research interests have been in theoretical computer science, in particular algorithms and formal languages, and more recently in soft computing, with particular interest in social networks and in practical applications of automatic verification. He is the co-founder of the conference GandALF (Games, Automata, Logics and Formal Verification) and leads the laboratory of the same name at Università di Salerno. He is a member of the governing council of the Italian Chapter of Theoretical Computer Science. He is currently the Scientific Director of a research consortium on System and Methods for Competitive IT Companies.
