Top-k followee recommendation over microblogging systems by exploiting diverse information sources
Introduction
Since the emergence of microblogging systems, such as Twitter and Sina Weibo, hundreds of millions of users have begun to use microblogging services as a tool to propagate and share information on the Internet [1]. For instance, as the most prevalent microblogging system in China, Sina Weibo now has more than three hundred million active users [2]. Moreover, the Sina Weibo community continues to grow sharply, with more than 16 million newly registered users per month. In microblogging systems, users follow or are followed by each other. Formally, if user u follows user v, we refer to u as v’s follower, and to v as u’s followee. By leveraging the follower–followee network, a microblogging platform provides information to a consumer by gathering the update messages from his followees. In such an information sharing paradigm, it is of foremost importance for a user to seek and select followees whose content is likely to be of interest. However, due to the large populations in major commercial microblogging systems, finding relevant and reliable followees is a challenging task for a user.
Elaborate recommendation schemes have been designed to accurately find relevant content for a user in traditional large-scale data collections [3], [4], [5], [6], [7]. Among existing schemes, collaborative filtering (CF) is the most popular method for exploiting user-specific preferences. The motivation for collaborative filtering comes from the idea that people often get good recommendations from someone with similar taste. Collaborative filtering explores techniques for matching people who have shown similar interests in the past and making recommendations on this basis. The taste of users can be quantified by explicit rating information, such as user-to-item, user-to-user or item-to-item relevance. By computing the similarity of taste between users, the CF scheme deduces the target user’s personalized interest. Unfortunately, such explicit ratings, which represent the strength of interest among users, are not available on microblogging systems. As a result, existing followee recommendation schemes commonly make use of implicit information to detect the target user’s interest.
Existing followee recommendation schemes over microblogging systems can be classified mainly into two categories: content-based methods and topology-based methods. Content-based methods recommend followees for a user using the similarity of users’ content. For example, Hannon et al. [8] generate a profile for each user from his microblogging history and make followee recommendations according to the similarity of user profiles. The problem with such a scheme is that microblogging systems suffer from data scarcity [9], which results in poor recommendation precision [10]. To solve the problem, Armentano et al. [10] instead consider using social relation factors, including user popularity, number of common friends, etc. They use the social topology information to measure the relevance between a target user and a candidate followee.
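The two families of signals described above can be illustrated with a minimal sketch. It is not the method of [8] or [10]: the bag-of-words profile, the cosine similarity, the common-followee count and all names and toy data below are assumptions chosen only to make the idea concrete.

```python
import math
from collections import Counter

def profile(tweets):
    """Bag-of-words term-frequency profile built from a user's tweets."""
    return Counter(word for t in tweets for word in t.lower().split())

def cosine(p, q):
    """Cosine similarity between two term-frequency profiles."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def common_followees(followees, a, b):
    """Number of accounts that both a and b follow (a simple topology signal)."""
    return len(followees.get(a, set()) & followees.get(b, set()))

# Hypothetical toy data: tweets per user and the set of accounts each user follows.
tweets = {
    "alice": ["new phone camera review", "camera lens test"],
    "bob": ["camera review of the new phone"],
    "carol": ["football match tonight"],
}
followees = {"alice": {"x", "y"}, "bob": {"y", "z"}, "carol": {"w"}}

profiles = {user: profile(ts) for user, ts in tweets.items()}
sim_ab = cosine(profiles["alice"], profiles["bob"])    # content signal, overlapping topics
sim_ac = cosine(profiles["alice"], profiles["carol"])  # content signal, no shared words
shared_ab = common_followees(followees, "alice", "bob")
```

In this toy data, alice and bob share vocabulary and one followee, while alice and carol share neither. A user with very few tweets yields a nearly empty profile, which is the data scarcity problem that motivates the topology-based signals.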
A recent study by Kwak et al. [11], based on a trace of the entire Twittersphere, shows that microblogging systems deviate significantly from the known characteristics of traditional online social networks. Their study indicates that a microblogging system is a platform that serves as both a social network and a news medium. Existing followee recommendation schemes ignore the coexistence of these two features in microblogging systems, which may lead to poor performance. Based on this observation, in this work we propose a variant of the state-of-the-art CF-based latent factor model that considers both content relevance and users’ social relations.
Recently, the latent factor model (LFM) has been shown to be effective at leveraging implicit information for efficient recommendation when information is scarce [12], [13]. A common approach to the latent factor model is to learn a latent feature vector for each user and item in a dataset such that the inner products of these vectors minimize or maximize an objective function. Existing latent factor models typically focus on minimizing global rating prediction errors such as root mean square error (RMSE) and mean absolute error (MAE) [12], [14]. However, in microblogging followee recommendation, users care much more about the quality of the results at the top of the ranked recommendation list than about the quality of the list as a whole [8], [15]. Because they do not take this requirement into account, traditional latent factor model schemes perform poorly for top-k recommendation [15]. The problem becomes more acute as more and more users of microblogging services move to mobile platforms with small screens. Thus, the quality of the top-k followee candidates is particularly important in microblogging followee recommendation.
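As an illustration of the conventional approach described above, the following minimal sketch learns latent vectors by stochastic gradient descent on a squared rating error with L2 regularization, scoring each user-item pair by an inner product. It is a generic rating-error LFM, not the model proposed in this paper; the function names, hyperparameters and toy ratings are assumptions.

```python
import math
import random

def predict(p_u, q_i):
    """Predicted score: inner product of user and item latent vectors."""
    return sum(pf * qf for pf, qf in zip(p_u, q_i))

def train_lfm(ratings, n_users, n_items, dim=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """SGD on squared error plus L2 regularization (hyperparameters are assumptions)."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P[u], Q[i])
            for f in range(dim):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(ratings, P, Q):
    """Root mean square error over the observed ratings."""
    return math.sqrt(sum((r - predict(P[u], Q[i])) ** 2 for u, i, r in ratings) / len(ratings))

def top_k(u, P, Q, k):
    """Indices of the k items with the highest predicted score for user u."""
    scored = [(predict(P[u], Q[i]), i) for i in range(len(Q))]
    return [i for _, i in sorted(scored, reverse=True)[:k]]

# Hypothetical toy data: (user, item, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 5.0), (1, 2, 1.0), (2, 1, 4.0), (2, 2, 1.0)]
P0, Q0 = train_lfm(ratings, 3, 3, epochs=0)  # untrained factors, for comparison
P, Q = train_lfm(ratings, 3, 3)
before, after = rmse(ratings, P0, Q0), rmse(ratings, P, Q)
recommended = top_k(0, P, Q, k=2)
```

Note that the training loop treats every observed rating equally: an error on an item that would land at the bottom of the ranked list costs as much as an error on an item at the top, which is the mismatch with top-k recommendation discussed above.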
However, optimizing the top-k results in followee recommendation is not trivial. Information retrieval (IR) research offers several rank-dependent metrics, including R-Precision (RPrec), Mean Average Precision (MAP), etc. A promising scheme is to use such a metric as the objective function of the LFM [16]. Typically, parameter optimization is carried out by the Stochastic Gradient Ascent/Descent (SGA/SGD) method [12], [13], [17] or the Alternating Least Squares (ALS) method [18], both of which require the objective function to be smooth and continuous. However, existing IR metrics depend on the rank positions of documents rather than directly on the predicted relevance scores. When the model parameters change, the predicted scores change smoothly, while the ranks of documents do not change until one document’s score passes another’s, at which point the metric changes discontinuously. That is to say, traditional IR metrics are non-smooth with respect to the model parameters [19]. Such non-smoothness makes traditional parameter optimization methods infeasible. The key to addressing this problem is to bridge the gap between changes in users’ predicted scores and changes in their rank values. In this work we use normalized discounted cumulative gain (NDCG) to judge the performance of top-k recommendation results. Under this listwise measure, mistakes at the top of the recommendation list incur a higher penalty than mistakes at the bottom of the list. We propose NDCG-LFM, a recommendation model for the implicit feedback domain that directly optimizes a smoothed approximation of NDCG and considers both tweet content information and social relation information.
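The non-smoothness argument can be made concrete with a small sketch. The code below computes NDCG@k with the common gain 2^rel − 1 and log2 position discount, and then a smoothed DCG in which each hard rank is replaced by a sum of sigmoids over score differences. The sigmoid-based rank and the temperature parameter are assumptions used for illustration; the smoothing actually used by NDCG-LFM is not reproduced here.

```python
import math

def dcg_at_k(rels_in_rank_order, k):
    """DCG@k: gain (2^rel - 1) discounted by log2(1 + rank), rank starting at 1."""
    return sum((2 ** r - 1) / math.log2(pos + 2)
               for pos, r in enumerate(rels_in_rank_order[:k]))

def ndcg_at_k(scores, rels, k):
    """NDCG@k of the ranking induced by sorting items by predicted score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k([rels[i] for i in order], k) / ideal if ideal else 0.0

def soft_rank(scores, i, temp=0.1):
    """Smooth rank of item i: 1 + sum over j != i of sigmoid((s_j - s_i) / temp)."""
    return 1.0 + sum(1.0 / (1.0 + math.exp(-(s - scores[i]) / temp))
                     for j, s in enumerate(scores) if j != i)

def smooth_dcg(scores, rels, temp=0.1):
    """DCG with hard ranks replaced by soft ranks, differentiable in the scores."""
    return sum((2 ** r - 1) / math.log2(1.0 + soft_rank(scores, i, temp))
               for i, r in enumerate(rels))

rels = [1, 0]  # item 0 is relevant, item 1 is not (toy data)

# Hard metric: flat while the order is unchanged, then jumps when the scores cross.
hard_far = ndcg_at_k([0.40, 0.50], rels, k=1)
hard_near = ndcg_at_k([0.49, 0.50], rels, k=1)
hard_crossed = ndcg_at_k([0.51, 0.50], rels, k=1)

# Smoothed metric: changes gradually over the same small change in scores.
smooth_gap = smooth_dcg([0.51, 0.50], rels) - smooth_dcg([0.49, 0.50], rels)
```

Raising the relevant item’s score from 0.40 to 0.49 leaves NDCG@1 unchanged, so the gradient carries no information, while the step from 0.49 to 0.51 flips it from 0 to 1. The smoothed version moves by a small positive amount over the same step, which is what gradient-based optimization requires.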
We conduct both offline and online experiments based on real-user traces to evaluate our recommender model. The results show that our recommendation model greatly outperforms existing schemes and several baseline methods.
The main contributions of our scheme are threefold:
- We propose a novel CF approach, NDCG-LFM, for recommending high-quality top-k followees over microblogging systems.
- Based on the unique features of microblogging systems, we introduce a latent factor model that exploits the implicit information in both users’ tweet content and their social relationships.
- We evaluate the performance of our followee recommendation model using experiments on large-scale traces from major commercial microblogging systems and real users’ feedback collected through Amazon Mechanical Turk, a real-world crowdsourcing platform. The results demonstrate that our scheme significantly outperforms existing schemes in terms of the top-k ranking of followee recommendations.
Section snippets
Recommender systems on microblogging platforms
As microblogging systems, such as Twitter and Sina Weibo, become popular in our daily lives, personalized content recommendation has attracted a lot of attention in the research field of microblogging systems. For example, Wu et al. [20] generate personalized tags for Twitter users to label their interest by extracting keywords from tweets they post. Michelson et al. [21] propose an entity-based profiling approach, which aims at discovering the topics of interest for Twitter users by
Overview
In this section, we give an overview description of our user recommendation model. Borrowing ideas from social networking [32] and SMS messaging, a microblogging system leverages the social network for information sharing. Users follow or are followed by each other. A microblogging system serves a consumer mainly by polling all his/her followees for gathering all the updates of the messages [33]. Thus it is important for a user to seek and select followees with potential content of interest,
Parameter estimation
The goal of our design is that the stronger the target user’s preference for a certain candidate, the more likely the candidate is to appear near the top of the recommendation list. We frame this issue as the maximization of NDCG@k. This can be achieved by finding a local maximum of the objective function shown in Eq. (7) using the stochastic gradient ascent method. Moreover, in order to avoid overfitting, we add L2 regularization. Hence, the resulting NDCG objective function is
Dataset
In this section, we evaluate our design experimentally. We use the large-scale WISE challenge (T2) dataset [37] crawled from Sina Weibo, the most popular microblogging system in China. The dataset contains the social link information of 51.4 million users and 465 million tweets. The dataset includes both tweets and user relations,
(1) Tweets: It includes basic information about tweets (time, user ID, message ID etc.), mentions (user IDs appearing in tweets), retweet paths, and whether containing
CrowdSourcing experiments
In the evaluation of the precision and NDCG in the above offline experiments, the precision is calculated as the percentage overlap between the recommended followees in the recommendation list and the target user’s existing followee list in the test dataset. It is not difficult to see that this experimental method regards the non-overlapping recommendations as not relevant to the target user. However, they are “not relevant” only in the sense that they are not already followed by the target
Conclusion and future work
In this paper, we propose NDCG-LFM, a novel top-k followee recommendation scheme over microblogging systems based on the latent factor model. By using a smoothed version of the objective function, the quality of the top-k recommendation results can be directly optimized in our model. Based on the unique dual role of microblogging systems, we consider both the tweet content factor and the social relation factor when modeling inter-user preference. We conduct experiments using large-scale traces from
Acknowledgments
This paper is supported by the NSFC fund (No. 61370233), the Foundation for the Author of National Excellent Doctoral Dissertation of PR China (No. 201345), the Ministry of Education and China Mobile Communications Corporation (MoE-CMCC) Research Funding (No. MCM20130382), the Research Fund for the Doctoral Program of Higher Education of China (No. 20110142120080), and the Fundamental Research Funds for the Central Universities (No. 2014YQ014).
References (42)
- et al., A data placement strategy in scientific cloud workflows, Future Gener. Comput. Syst. (2010)
- et al., What trends in Chinese social media, Comput. Res. Repository (2011)
- 2013....
- H.C. Chen, A.L. Chen, A music recommendation system based on music data grouping and user interests, in: Proceedings of...
- et al., A privacy leakage upper bound constraint-based approach for cost-effective privacy preserving of intermediate data sets in cloud, IEEE Trans. Parallel Distrib. Syst. (2013)
- et al., Document recommendation in social tagging services
- et al., Post-based collaborative filtering for personalized tag recommendation
- et al., Recommending twitter users to follow using content and collaborative filtering approaches
- et al., Short and tweet: experiments on recommending content from information streams
- M. Armentano, D. Godoy, A. Amandi, A topology-based approach for followees recommendation in twitter, in: Proceedings...
- What is twitter, a social network or a news media?
- Factorization meets the neighborhood: a multifaceted collaborative filtering model
- Collaborative personalized tweet recommendation
- Matrix factorization techniques for recommender systems, Computer
- Performance of recommender algorithms on top-N recommendation tasks
- Tfmap: optimizing map for top-n context-aware recommendation
- O(logt) projections for stochastic optimization of smooth and strongly convex functions, Comput. Res. Repository
- Smoothing DCG for learning to rank: a novel approach using smoothed hinge functions
- Automatic generation of personalized annotation tags for twitter users
- Discovering users’ topics of interest on twitter: a first look
Hanhua Chen received his Ph.D. degree in Computer Science and Engineering from Huazhong University of Science and Technology in 2010, where he is now working as an associate professor. His research interests include distributed systems, online social networks, peer-to-peer systems and wireless sensor networks. He received the National Excellent Doctoral Dissertation Award of PR China in 2012 and the Intel Early Career Faculty Honor Program Award in 2013. He is the TPC co-chair of the eighth Asia-Pacific Services Computing Conference (APSCC 2014). He is an editorial board member of the International Journal of Distributed Sensor Networks (IJDSN) and a young associate editor of Frontiers of Computer Science (FCS). He is a member of the IEEE and ACM.
Xiaolong Cui is a master’s student in the School of Computer Science and Technology at Huazhong University of Science and Technology. His research interests include online social networks.
Hai Jin received the Ph.D. degree in Computer Engineering from the Huazhong University of Science and Technology (HUST) in 1994. He is a Cheung Kung Scholars chair professor of Computer Science and Engineering at HUST in China. He is now the dean of the School of Computer Science and Technology at HUST. He worked at The University of Hong Kong between 1998 and 2000, and as a visiting scholar at the University of Southern California between 1999 and 2000. He is the chief scientist of ChinaGrid, the largest grid computing project in China, and the chief scientist of the National 973 Basic Research Program Project of Virtualization Technology of Computing System. He is a member of the Grid Forum Steering Group. He has coauthored 15 books and published more than 400 research papers. His research interests include computer architecture, virtualization technology, cluster computing and grid computing, peer-to-peer computing, network storage, and network security. He is the steering committee chair of the International Conference on Grid and Pervasive Computing, the Asia-Pacific Services Computing Conference, the International Conference on Frontier of Computer Science and Technology, and the Annual ChinaGrid Conference. He is a member of the steering committees of the IEEE/ACM International Symposium on Cluster Computing and the Grid, the IFIP International Conference on Network and Parallel Computing, the International Conference on Grid and Cooperative Computing, the International Conference on Autonomic and Trusted Computing, and the International Conference on Ubiquitous Intelligence and Computing. In 1996, he was awarded a German Academic Exchange Service fellowship to visit the Technical University of Chemnitz in Germany. He was awarded the Excellent Youth Award from the National Science Foundation of China in 2001. He is a senior member of the IEEE and a member of the ACM.