Top-k followee recommendation over microblogging systems by exploiting diverse information sources

doi:10.1016/j.future.2014.05.002

Future Generation Computer Systems

Volume 55, February 2016, Pages 534-543

https://doi.org/10.1016/j.future.2014.05.002

Highlights

• A novel CF approach for recommending high-quality top-$k$ microblogging followees.
• A latent factor model that exploits both tweet content and social relations.
• Comprehensive experiments on large-scale traces and Amazon Mechanical Turk.

Abstract

Followee recommendation plays an important role in information sharing over microblogging platforms. We frame this problem as top-$k$ ranking in collaborative filtering (CF). The difficulty is that explicit user-to-user ratings are not available on microblogging systems, so existing CF schemes are not directly applicable to followee recommendation. To solve this problem, we propose a novel followee ranking scheme using a variation of the latent factor model (LFM), which leverages implicit user feedback, including both tweet content and social relation information. To achieve good top-$k$ recommendation, we introduce a rank-based criterion into the LFM. The main obstacle to training the model parameters is the non-smoothness of the resulting objective function, which makes traditional parameter optimization methods infeasible. To tackle this problem, we further design a smooth version of the objective function. We conduct comprehensive experiments on a large-scale dataset collected from Sina Weibo, the most popular microblogging system in China, and a real-world experiment on the Amazon Mechanical Turk CrowdSourcing platform to evaluate the performance of our design. The results show that our scheme outperforms existing schemes in terms of precision and top-$k$ ranking by 46.8% and 32.8%, respectively.

Introduction

Since the emergence of microblogging systems such as Twitter and Sina Weibo, hundreds of millions of users have come to use microblogging services as a tool to propagate and share information on the Internet [1]. For instance, as the most prevalent microblogging system in China, Sina Weibo now has more than three hundred million active users [2]. Moreover, the Sina Weibo community continues to grow rapidly, with more than 16 million newly registered users per month. In microblogging systems, users follow or are followed by each other. Formally, if user $a$ follows user $b$, we refer to $a$ as $b$’s follower, and $b$ as $a$’s followee. By leveraging the follower–followee network, a microblogging platform provides information to a consumer by gathering the update messages from his followees. In such an information sharing paradigm, it is essential for a user to seek and select followees with potential content of interest. However, due to the large populations of major commercial microblogging systems, finding relevant and reliable followees is a challenging task for a user.

Carefully designed recommendation schemes aim at accurately finding relevant content for a user in traditional large-scale data collections [3], [4], [5], [6], [7]. Among existing schemes, collaborative filtering (CF) is the most popular method for exploiting user-specific preferences. The motivation for collaborative filtering comes from the idea that people often get good recommendations from someone with similar taste. Collaborative filtering explores techniques for matching people with similar past interests and making recommendations on this basis. The taste of users can be quantified by explicit rating information, such as user-to-item, user-to-user or item-to-item relevance. By computing the similarity of taste between users, a CF scheme deduces the target user’s personalized interest. Unfortunately, such explicit ratings that represent the strength of interest among users are not available on microblogging systems. As a result, existing followee recommendation schemes commonly make use of implicit information to detect the target user’s interest.
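To make the similarity-of-taste idea concrete, the following is a minimal sketch of user-based CF on explicit ratings. It is a generic textbook formulation, not the scheme proposed in this paper; the rating table, the user and item names, and the helper functions `cosine_similarity` and `predict` are all hypothetical and chosen for illustration only.

```python
import math

# Hypothetical explicit ratings: user -> {item: rating}. Illustrative data only.
ratings = {
    "alice": {"a": 5.0, "b": 3.0, "c": 4.0},
    "bob":   {"a": 4.0, "b": 3.0, "d": 5.0},
    "carol": {"a": 1.0, "b": 5.0, "d": 2.0},
}

def cosine_similarity(u, v):
    """Cosine similarity restricted to the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(target, item, ratings):
    """Similarity-weighted average of the ratings other users gave to `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == target or item not in their:
            continue
        sim = cosine_similarity(ratings[target], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

# Alice has not rated item "d"; her estimate leans toward Bob, whose taste is closer.
print(predict("alice", "d", ratings))
```

The sketch depends entirely on the explicit rating table, which is exactly the information that microblogging systems do not provide.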

Existing followee recommendation schemes over microblogging systems can be classified mainly into two categories: content-based methods and topology-based methods. Content-based methods recommend followees for a user using the similarity of users’ content. For example, Hannon et al. [8] generate a profile for a user from his microblogging history and make followee recommendations according to the similarity of user profiles. The problem with such a scheme is that microblogging systems suffer from data scarcity [9], which results in poor precision of recommendation results [10]. To solve the problem, Armentano et al. [10] instead use social relation factors, including user popularity, number of common friends, etc. They use the social topology information to measure the relevance between a target user and a candidate followee.

A recent study by Kwak et al. [11], based on a trace of the entire Twittersphere, shows that microblogging systems deviate significantly from known characteristics of traditional online social networks. Their study indicates that a microblogging system is a platform of both social network and news media. Existing followee recommendation schemes ignore the coexistence of these two features in microblogging systems, which may lead to poor performance. Based on this observation, in this work we propose a variant of the state-of-the-art CF-based latent factor model, which considers both content relevance and users’ social relations.

Recently, the latent factor model (LFM) has proven effective for leveraging implicit information to make recommendations when information is scarce [12], [13]. A common approach is to learn a latent feature vector for each user and item in a dataset such that the inner products of these features minimize or maximize an objective function. Existing latent factor models typically focus on minimizing global rating prediction errors such as root mean square error (RMSE) and mean absolute error (MAE) [12], [14]. However, in microblogging followee recommendation, users care much more about the quality of the results at the top of the ranked recommendation list than about the quality of the whole list [8], [15]. Without considering this requirement, traditional latent factor model schemes perform poorly for top-$k$ recommendation [15]. The problem becomes more acute as more and more users of microblogging services move to mobile platforms with small screens. Thus, the quality of the top-$k$ followee candidates is particularly important in microblogging followee recommendation.
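As an illustration of the conventional, error-minimizing latent factor model described above, the sketch below fits user and item vectors by stochastic gradient descent on a regularized squared error and then reports RMSE and MAE. This is a generic matrix factorization example, not the NDCG-LFM model of this paper; the rating triples, the dimension `dim`, the learning rate `lr` and the regularization weight `reg` are assumed values.

```python
import math
import random

# Hypothetical (user, item, rating) triples; not data from the paper.
observed = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
            (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
n_users, n_items, dim = 3, 3, 2

rng = random.Random(0)
# One latent vector per user (rows of P) and per item (rows of Q).
P = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
Q = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_items)]

def predict(u, i):
    """Predicted rating: inner product of the user and item latent vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))

def rmse():
    return math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in observed) / len(observed))

def mae():
    return sum(abs(r - predict(u, i)) for u, i, r in observed) / len(observed)

rmse_before = rmse()
lr, reg = 0.05, 0.01  # assumed learning rate and L2 weight
for _ in range(500):
    for u, i, r in observed:
        err = r - predict(u, i)
        for f in range(dim):
            pu, qi = P[u][f], Q[i][f]
            # Gradient step on the regularized squared error for this rating.
            P[u][f] += lr * (err * qi - reg * pu)
            Q[i][f] += lr * (err * pu - reg * qi)
rmse_after = rmse()

print(rmse_before, rmse_after, mae())
```

Both RMSE and MAE average the error over every observed pair, so a model tuned this way gives an error at the bottom of a ranked list the same weight as an error at the top.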

However, optimizing the top-$k$ results in followee recommendation is not trivial. In information retrieval (IR) research, there are several rank-dependent metrics, including R-Precision (RPrec), Mean Average Precision (MAP), etc. A promising scheme is to use these metrics as the objective function of the LFM [16]. Typically, parameter optimization is performed by the Stochastic Gradient Ascent/Descent (SGA/SGD) method [12], [13], [17] or the Alternating Least Squares (ALS) method [18], both of which require the objective function to be smooth and continuous. However, existing IR metrics depend on the ranks of documents rather than directly on the predicted relevance scores. If changes are made to the model parameters, the predicted scores change smoothly, while the ranks of documents do not change until one document’s score passes another, incurring a discontinuous change. That is to say, traditional IR metrics are non-smooth with respect to model parameters [19]. Such non-smoothness makes traditional parameter optimization methods infeasible. The key to addressing this problem is to bridge the gap between changes in users’ predicted scores and their rank values. In this work we use normalized discounted cumulative gain (NDCG) to judge the performance of top-$k$ recommendation results. Under this list-wise measure, mistakes at the top of the recommendation list incur a higher penalty than mistakes at the bottom of the list. We propose NDCG-LFM, a recommendation model for the implicit feedback domain that directly optimizes a smoothed approximation of NDCG and considers both tweet content information and social relation information.
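The non-smoothness argument can be checked numerically with a small sketch. The code below assumes one common formulation of NDCG@$k$ (gain $2^{rel}-1$ with a $\log_2$ position discount), which may differ in detail from the definition used in the full paper. The function `soft_ranks` is a generic sigmoid-based rank approximation included only to show what a smooth surrogate looks like; it is not claimed to be the smoothing used by NDCG-LFM. The scores, labels and the temperature `tau` are hypothetical.

```python
import math

def dcg_at_k(ranked_relevance, k):
    """DCG@k with gain 2^rel - 1 and a log2 position discount (assumed form)."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(ranked_relevance[:k]))

def ndcg_at_k(scores, relevance, k):
    """NDCG@k of the ranking obtained by sorting candidates by predicted score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k([relevance[i] for i in order], k) / ideal if ideal > 0 else 0.0

def soft_ranks(scores, tau=0.1):
    """Generic smooth rank: 1 + sum over j != i of sigmoid((s_j - s_i) / tau)."""
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))
    return [1.0 + sum(sigmoid((sj - si) / tau)
                      for j, sj in enumerate(scores) if j != i)
            for i, si in enumerate(scores)]

relevance = [1, 0, 1, 0]              # hypothetical labels (1 = relevant followee)
base    = [0.90, 0.80, 0.30, 0.10]    # predicted scores
nudged  = [0.90, 0.80, 0.79, 0.10]    # candidate 2 improves, but no swap yet
swapped = [0.90, 0.80, 0.81, 0.10]    # candidate 2 overtakes candidate 1

for name, scores in [("base", base), ("nudged", nudged), ("swapped", swapped)]:
    print(name, ndcg_at_k(scores, relevance, k=2), soft_ranks(scores)[2])
```

NDCG@2 is identical for `base` and `nudged` and then jumps once the scores cross in `swapped`, so its gradient with respect to the scores is zero almost everywhere. The soft rank of candidate 2, in contrast, decreases gradually as its score rises, which is the property a gradient-based optimizer needs.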

We conduct both offline and online experiments based on real-user traces to evaluate our recommender model. The results show that our recommendation model greatly outperforms existing schemes and several baseline methods.

The main contributions of our scheme are threefold:

• We propose a novel CF approach, NDCG-LFM, for recommending high-quality top-$k$ followees over microblogging systems.
• Based on the unique features of microblogging systems, we introduce a latent factor model to exploit implicit information from both users’ tweet content and social relationships.
• We evaluate the performance of our followee recommendation model using experiments on large-scale traces from major commercial microblogging systems and real users’ feedback through Amazon Mechanical Turk, a real-world CrowdSourcing platform. The results demonstrate that our scheme significantly outperforms existing schemes in terms of top-$k$ ranking of followee recommendation.

The rest of the paper is structured as follows. In Section 2, we discuss related work. Section 3 presents the followee recommendation model we propose. In Section 4, we evaluate the performance of our design, present the results compared to existing schemes and discuss the complexity of our algorithm. We conduct a CrowdSourcing experiment in Section 5. Section 6 concludes the paper with possible future work.

Section snippets

Recommender systems on microblogging platforms

As microblogging systems, such as Twitter and Sina Weibo, become popular in our daily lives, personalized content recommendation has attracted a lot of attention in the research field of microblogging systems. For example, Wu et al. [20] generate personalized tags for Twitter users to label their interest by extracting keywords from tweets they post. Michelson et al. [21] propose an entity-based profiling approach, which aims at discovering the topics of interest for Twitter users by

Overview

In this section, we give an overview description of our user recommendation model. Borrowing ideas from social networking [32] and SMS messaging, a microblogging system leverages the social network for information sharing. Users follow or are followed by each other. A microblogging system serves a consumer mainly by polling all his/her followees for gathering all the updates of the messages [33]. Thus it is important for a user to seek and select followees with potential content of interest,

Parameter estimation

The goal of our design is that the stronger the target user’s preference for a candidate, the higher the probability that the candidate appears near the top of the recommendation list. We frame this goal as the maximization of NDCG@$k$. It can be achieved by finding a local maximum of the objective function shown in Eq. (7) using the stochastic gradient ascent method. Moreover, in order to avoid overfitting, we add L2 regularization. Hence, the resulting NDCG objective function is

Dataset

In this section, we evaluate our design through experiments. We use the large-scale WISE challenge (T2) dataset [37] crawled from Sina Weibo, the most popular microblogging system in China. The dataset contains 51.4 million users’ social link information and 465 million tweets. The dataset includes both tweets and user relations:

(1) Tweets: It includes basic information about tweets (time, user ID, message ID etc.), mentions (user IDs appearing in tweets), retweet paths, and whether containing

CrowdSourcing experiments

In the evaluation of precision and NDCG in the above offline experiments, precision is calculated as the percentage overlap between the recommended followees in the recommendation list and the target user’s existing followee list in the test dataset. It is not difficult to see that this method regards the non-overlapping recommendations as not relevant to the target user. However, they are “not relevant” only in the sense that they are not already followed by the target

Conclusion and future work

In this paper, we propose NDCG-LFM, a novel top-$k$ followee recommendation scheme over microblogging systems based on the latent factor model. By designing a smooth version of the objective function, the quality of top-$k$ recommendation results can be directly optimized in our model. Based on the dual roles of microblogging systems, we consider both the tweet content factor and the social relation factor when modeling inter-user preference. We conduct experiments using large-scale traces from

Acknowledgments

This paper is supported by NSFC fund (No. 61370233), Foundation for the Author of National Excellent Doctoral Dissertation of PR China (No. 201345), Ministry of Education and China Mobile Communications Corporation (MoE-CMCC) Research Founding (No. MCM20130382), Research Fund for the Doctoral Program of Higher Education of China (No. 20110142120080), and Fundamental Research Funds for the Central Universities (No. 2014YQ014).

References (42)

[1] D. Yuan et al., A data placement strategy in scientific cloud workflows, Future Gener. Comput. Syst. (2010)
[2] L.L. Yu et al., What trends in Chinese social media, Comput. Res. Repository (2011)
[3] 2013....
[4] H.C. Chen, A.L. Chen, A music recommendation system based on music data grouping and user interests, in: Proceedings of...
[5] X. Zhang et al., A privacy leakage upper bound constraint-based approach for cost-effective privacy preserving of intermediate data sets in cloud, IEEE Trans. Parallel Distrib. Syst. (2013)
[6] Z. Guan et al., Document recommendation in social tagging services
[7] C. Lu et al., Post-based collaborative filtering for personalized tag recommendation
[8] J. Hannon et al., Recommending twitter users to follow using content and collaborative filtering approaches
[9] J. Chen et al., Short and tweet: experiments on recommending content from information streams
[10] M. Armentano, D. Godoy, A. Amandi, A topology-based approach for followees recommendation in twitter, in: Proceedings...
[11] H. Kwak et al., What is twitter, a social network or a news media?
[12] Y. Koren, Factorization meets the neighborhood: a multifaceted collaborative filtering model
[13] K. Chen et al., Collaborative personalized tweet recommendation
[14] Y. Koren et al., Matrix factorization techniques for recommender systems, Computer (2009)
[15] P. Cremonesi et al., Performance of recommender algorithms on top-$n$ recommendation tasks
[16] Y. Shi et al., Tfmap: optimizing map for top-n context-aware recommendation
[17] L. Zhang et al., O(logt) projections for stochastic optimization of smooth and strongly convex functions, Comput. Res. Repository (2013)
[18] Y. Zhou, D. Wilkinson, R. Schreiber, R. Pan, Large-scale parallel collaborative filtering for the netflix prize, 2008,...
[19] M. Wu et al., Smoothing DCG for learning to rank: a novel approach using smoothed hinge functions
[20] W. Wu et al., Automatic generation of personalized annotation tags for twitter users
[21] M. Michelson et al., Discovering users’ topics of interest on twitter: a first look


Hanhua Chen received his Ph.D. degree in Computer Science and Engineering from Huazhong University of Science and Technology in 2010, where he is now working as an associate professor. His research interests include distributed systems, online social networks, peer-to-peer systems and wireless sensor networks. He received the National Excellent Doctoral Dissertation Award of PR China in 2012 and the Intel Early Career Faculty Honor Program Award in 2013. He is the TPC co-chair of the eighth Asia-Pacific Services Computing Conference (APSCC 2014). He is an editorial board member of the International Journal of Distributed Sensor Networks (IJDSN) and a young associate editor of Frontiers of Computer Science (FCS). He is a member of the IEEE and ACM.

Xiaolong Cui is a master’s student in the School of Computer Science and Technology at Huazhong University of Science and Technology. His research interests include online social networks.

Hai Jin received the Ph.D. degree in Computer Engineering from the Huazhong University of Science and Technology (HUST) in 1994. He is a Cheung Kung Scholars chair professor of Computer Science and Engineering at the HUST in China. He is now the dean of the School of Computer Science and Technology at HUST. He worked at The University of Hong Kong between 1998 and 2000, and as a visiting scholar at the University of Southern California between 1999 and 2000. He is the chief scientist of ChinaGrid, the largest grid computing project in China, and the chief scientist of National 973 Basic Research Program Project of Virtualization Technology of Computing System. He is a member of the Grid Forum Steering Group. He has coauthored 15 books and published more than 400 research papers. His research interests include computer architecture, virtualization technology, cluster computing and grid computing, peer-to-peer computing, network storage, and network security. He is the steering committee chair of the International Conference on Grid and Pervasive Computing, the Asia-Pacific Services Computing Conference, the International Conference on Frontier of Computer Science and Technology, and the Annual ChinaGrid Conference. He is a member of the steering committee of the IEEE/ACM International Symposium on Cluster Computing and the Grid, the IFIP International Conference on Network and Parallel Computing, the International Conference on Grid and Cooperative Computing, the International Conference on Autonomic and Trusted Computing, and the International Conference on Ubiquitous Intelligence and Computing. In 1996, he was awarded a German Academic Exchange Service fellowship to visit the Technical University of Chemnitz in Germany. He was awarded the Excellent Youth Award from the National Science Foundation of China in 2001. He is a senior member of the IEEE and a member of the ACM.
