Elsevier

Neurocomputing

Volume 210, 19 October 2016, Pages 116-129

FriendBurst: Ranking people who get friends fast in a short time

https://doi.org/10.1016/j.neucom.2015.11.124

Abstract

The number of friends (or followers) is an important factor in social networks. Attracting friends (or followers) in a short time is a strong indicator that a person is quickly becoming an influential user. Existing studies mainly focus on analyzing the formation of relationships between users; however, the factors that contribute to the growth of users' friend (or follower) counts are still unidentified and unquantified. Along this line, based on users' different friend (or follower) increasing speeds, we first derive a number of interesting observations on a microblog system (Weibo) and an academic network (Arnetminer) by analyzing their structural and content characteristics from the diversity and density angles. We then define attribute factors and correlation factors based on our observations. Finally, we propose a partially labeled ranking factor graph model (PLR-FGM) which combines these two kinds of factors to infer a ranking list of users by friend (or follower) increasing speed. Experimental results show that the proposed PLR-FGM model outperforms several alternative models in terms of normalized discounted cumulative gain (NDCG).

Introduction

With the success of the social Web, online social networks such as Facebook, Twitter and DBLP have significantly enlarged our social circles. Friends (or followers) in social networks are important resources not only for transferring messages but also for becoming popular, and their number can be considered a crucial indicator of social status. For example, in microblogs, an increase in a user's followers means that his/her published content has a larger audience and his/her actions could affect more people. If a person's follower number suddenly has a great “burst”, he/she would probably become a “new star”. Similarly, in academic social networks, an author who gains more co-authors in a short time is more active and can build a reputation in his/her research area quickly. Understanding the characteristics of users who attract friends fast is an important issue for social influence analysis, which can provide suggestions for users' behaviors and benefit many applications such as “virtual market” and recommendation systems.

Generally speaking, a quick increase in friends (or followers) means that the user forms new relationships in a short time. In the literature, there exist some studies on relationship analysis, for example, link prediction and unfollow behavior analysis. The goal of link prediction [1], [2] is to predict whether one user will follow another in the future, while unfollow behavior analysis [3], [4] mainly focuses on analyzing the reasons for unfollow behaviors. In sum, most of the existing literature studies the formation of friendship between users. From another perspective, however, the factors that affect users' friends increasing speed are still unidentified and unquantified. Although Hutto et al. [5] did a longitudinal study on follower growth, where they built a regression model for follower count prediction, the correlation between the magnitude of content-based and structural factors and the friends increasing speed is ignored; moreover, their dataset is small.

Different from these works, we propose a method to infer a ranking list of candidate users by friends increasing speed. There are several challenges for friends increasing speed ranking. First, how to capture the rich structural and content-based information for friend increment analysis? Second, how to construct an algorithm that models both the users' attributes and the relationships between users? Third, how to validate the proposed model in real large social networks?

To address the above challenges, we first perform statistical analysis on the correlations between users' friend (or follower) increasing speed and their structural and content-based properties. The analysis is conducted on two networks, namely, a microblog system (Weibo) and an academic network (Arnetminer). Both the structural and the content-based information are explored in depth. For the structure-based analysis, we use measures such as the diversity and density of circles and structures; for the content-based analysis, we also define diversity and density based on topic distribution and hashtags. We then propose a partially labeled ranking factor graph model (PLR-FGM) to infer the ranking list of friends increasing speed. The model can not only use the structural and content-based properties of individuals as attribute factors, but also model the relationships between users as correlation factors. The ranking list is obtained by sorting the marginal probabilities calculated by the model. Experimental results show that our PLR-FGM model significantly outperforms several alternative methods in terms of normalized discounted cumulative gain (NDCG), with improvements ranging from 6% to 51%.
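The ranking and evaluation step described above can be illustrated with a minimal sketch. It assumes the common exponential-gain form of NDCG (gain 2^rel − 1, log2 position discount); the excerpt does not state which NDCG variant the paper uses, and the function names, the marginal probabilities, and the burst states below are hypothetical values chosen for illustration only.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance grades given in ranked order."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances))


def ndcg_at_k(scores, true_states, k):
    """Rank users by predicted score (descending) and compare with the ideal order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [true_states[i] for i in order][:k]
    ideal = sorted(true_states, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical marginal probabilities (model output) and true burst states.
marginals = [0.9, 0.2, 0.6, 0.4]
states = [5, 1, 3, 4]
print(ndcg_at_k(marginals, states, k=4))
```

A ranking that orders users exactly by their true burst state scores 1.0; any misordering near the top of the list is penalized more heavily than one near the bottom.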

The contributions of this paper are summarized as follows:

  • Based on large datasets from two real social networks, a microblog system (Weibo) and an academic network (Arnetminer), we derive observations and analyze the correlation between users' friends increasing speed and the users' structures from the diversity and density angles. We also analyze the effects of their content properties (such as circle diversity and density) on the friends increasing speed.

  • We propose a ranking factor graph model which not only incorporates the structural and content-based properties of individual users but also models the relationships between them. We then use the model to detect the users who have a higher friends increasing speed in the social network.

  • We conduct experiments on the two real social networks. Experimental results verify the effectiveness of our observation factors and show that the proposed model achieves better performance than several alternative models.

The paper is organized as follows. Section 2 gives a brief description of the datasets we used and the preprocessing performed on them. Section 3 presents our observations on users' friends burst states with respect to attributes such as structure and content. Section 4 explains the proposed ranking factor graph model. Section 5 illustrates experimental results and validates the effectiveness of our methodology. Finally, Section 6 discusses related work and Section 7 concludes.

Data collection and preprocessing

The datasets we used in this paper are gathered from two different online social networks: a microblog system - Weibo, and an academic network - Arnetminer.

Weibo is a Twitter-style website and the most popular microblogging service in China. We collected a network of 88,626 users with 27,080,987 posts and 264,799 edges. We also crawled all the users' profiles, which contain gender and verification status.

Intuitively, there are two kinds of users for attracting

Observations

First, we define the friends (or followers) increasing speed as the increase in the number of friends (or followers) in a certain time interval. The speed is a continuous value; to facilitate our analysis and experiments, we divide the speeds into 5 burst states s={1,2,3,4,5}. Similar to topic burst detection [7], we assume that all the speeds are generated by five Poisson distributions corresponding to the five states, then which burst state a user is in depends on which Poisson
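As a rough illustration of mapping speeds to burst states, the sketch below assigns each observed count to the state whose Poisson distribution gives it the highest likelihood. This is only one possible reading: the excerpt is cut off before the assignment rule is stated, and the rates in `RATES`, as well as the function names, are hypothetical and not taken from the paper.

```python
import math


def poisson_log_pmf(count, rate):
    """Log-probability of a non-negative integer count under Poisson(rate)."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)


def burst_state(count, rates):
    """Return the 1-based state whose Poisson distribution best explains `count`."""
    log_liks = [poisson_log_pmf(count, r) for r in rates]
    return 1 + max(range(len(rates)), key=lambda i: log_liks[i])


# Hypothetical expected numbers of new friends per interval for states 1..5.
RATES = [1.0, 5.0, 20.0, 80.0, 300.0]

for new_friends in (0, 20, 1000):
    print(new_friends, burst_state(new_friends, RATES))
```

Working in log space avoids overflow in the factorial term for large counts. A sequence-based burst detector would additionally penalize transitions between states, which this per-count sketch does not attempt.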

Partially labeled ranking factor graph model

Based on the observations in Section 3, our goal is to design a model which can rank the friends increasing speed of users by incorporating the properties of structural and content-based information into the network. In this section, we describe the details of the proposed model.

Experimental results

In this section, we conduct several experiments based on the partially labeled ranking factor graph model to evaluate the effectiveness of the structural and content-based properties. First, we use one-way ANOVA (analysis of variance) [18] to test the significance of our observations; then we present the performance of the comparative methods and our model. For PLR-FGM, we also analyze the feature contributions and iteration performance. As the ranges of the features
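For readers unfamiliar with the significance test mentioned above, the following is a minimal sketch of the one-way ANOVA F statistic in plain Python. The grouping of a feature's values by burst state and the sample numbers are hypothetical; the paper's actual features and test setup are not shown in this excerpt.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of numbers."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical values of one feature, grouped by three burst states.
feature_by_state = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova_f(feature_by_state))
```

A large F value indicates that the feature's mean differs across burst states by more than the within-state scatter would explain. Converting F into a p-value requires the F-distribution; `scipy.stats.f_oneway` returns both the statistic and the p-value if SciPy is available.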

Related work

In recent years, a number of studies on social network analysis have been conducted [23], [24], [25], [26]. Moreover, there exist some analyses of relationships in online social networks [1], [2], [3], [4], [5]. Golder et al. [1] analyzed two structural characteristics, transitivity and mutuality, and proposed a hierarchical regression model to predict tie formation. Kwak et al. [3], [4] analyzed structural properties and actions, and studied unfollow behavior.

Conclusion

In this paper, we study a novel problem of identifying and quantifying which factors cause users' friend (or follower) numbers to increase fast. Focusing on the friends increasing speed, we analyze the properties of structure and content from the diversity and density angles and obtain some interesting observations from two typical social networks: a microblog system (Weibo) and an academic network (Arnetminer). We analyze the observations and conduct statistical evaluations. We formally define

Acknowledgments

This work is funded by the National Program on Key Basic Research Project (973 Program, Grant No. 2013CB329600), the National Natural Science Foundation of China (NSFC, Grant Nos. 61472040, 60873237, and 61300178), the Beijing Higher Education Young Elite Teacher Project (Grant No. YETP1198), and the Basic Research Foundation of BIT.

Li Liu is currently working toward the Ph.D. degree in the School of Computer Science and Technology, Beijing Institute of Technology. His research interests include social influence analysis and data mining.

References (47)

  • S.A. Golder, S. Yardi, Structural predictors of tie formation in twitter: transitivity and mutuality, in: 2010 IEEE...
  • D. Liben-Nowell et al., The link prediction problem for social networks, J. Am. Soc. Inf. Sci. Technol. (2007)
  • H. Kwak, H. Chun, S. Moon, Fragile online relationship: a first look at unfollow dynamics in twitter, in: Proceedings...
  • H. Kwak, S.B. Moon, W. Lee, More of a receiver than a giver: why do people unfollow in twitter?, in: ICWSM,...
  • C. Hutto, S. Yardi, E. Gilbert, A longitudinal study of follow predictors on twitter, in: Proceedings of the SIGCHI...
  • J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, Z. Su, Arnetminer: Extraction and mining of academic social networks, in:...
  • Q. Diao, J. Jiang, F. Zhu, E.-P. Lim, Finding bursty topics from microblogs, in: Proceedings of the 50th Annual Meeting...
  • J. Zhang, B. Liu, J. Tang, T. Chen, J. Li, Social influence locality for modeling retweeting behaviors, in: Proceedings...
  • D.-B. Chen et al., Identifying influential nodes in large-scale directed networks: the role of clustering, PloS One (2013)
  • C.E. Shannon, A mathematical theory of communication, ACM SIGMOBILE Mob. Comput. Commun. Rev. (2001)
  • D.M. Blei, A.Y. Ng, M.I. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res. 3 (2003)...
  • R. Yan, J. Tang, X. Liu, D. Shan, X. Li, Citation count prediction: learning to estimate future citations for...
  • Y.-C. Wang, R. Kraut, Twitter and the development of an audience: those who stay on topic thrive!, in: Proceedings of...
  • Y. Dong, J. Tang, S. Wu, J. Tian, N.V. Chawla, J. Rao, H. Cao, Link prediction and recommendation across heterogeneous...
  • J.M. Hammersley, P. Clifford, Markov field on finite graphs and lattices, 1971...
  • W. Wiegerinck, Variational approximations between mean field theory and the junction tree algorithm, in: Proceedings of...
  • K.P. Murphy, Y. Weiss, M.I. Jordan, Loopy belief propagation for approximate inference: An empirical study, in:...
  • B.G. Tabachnick, L.S. Fidell, et al., Using Multivariate Statistics,...
  • K. Järvelin et al., Cumulated gain-based evaluation of IR techniques, ACM Trans. Inf. Syst. (TOIS) (2002)
  • T. Joachims, Optimizing search engines using clickthrough data, in: Proceedings of the Eighth ACM SIGKDD International...
  • C.-P. Lee et al., Large-scale linear RankSVM, Neural Comput. (2014)
  • D. Metzler et al., Linear feature-based models for information retrieval, Inf. Retr. (2007)
  • M. Wang et al., Towards a relevant and diverse search of social images, IEEE Trans. Multimed. (2010)

Dandan Song received a B.E. degree and a Ph.D. degree from the Department of Computer Science and Technology, Tsinghua University, Beijing, China, in 2004 and 2009, respectively. She is currently an Associate Professor in the School of Computer Science and Technology, Beijing Institute of Technology. Her research interests include information retrieval, data mining and bioinformatics.

Jie Tang is an Associate Professor at the Department of Computer Science and Technology, Tsinghua University. His main research interests include data mining algorithms and social network theories. He has been a visiting scholar at Cornell University, Chinese University of Hong Kong, Hong Kong University of Science and Technology, and Leuven University. He has published over 100 research papers in major international journals and conferences including: KDD, IJCAI, WWW, SIGMOD, ACL, Machine Learning Journal, TKDD, TKDE, and JWS.

Lejian Liao received a Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences. He is currently a Professor in the School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China. He also served as Vice Dean of the school. With main research interests in machine learning, natural language processing and intelligent networks, Professor Lejian Liao has published numerous papers in several areas of computer science.

Xin Li received the B.Sc. and M.S. degrees in computer science from Jilin University of China and the Ph.D. degree in computer science in 2009 from Hong Kong Baptist University. She was a Postdoctoral Teaching Fellow in the Department of Computer Science, Hong Kong Baptist University, from 2009 to 2010. She is an Assistant Professor in the School of Computer Science and Technology, Beijing Institute of Technology. Her current research interests include artificial intelligence and machine learning, as well as their applications to social network analysis, wireless sensor networks, vehicular networks, and planning under uncertainty.

Jianguang Du received a B.E. degree from the School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China, in 2010. He is currently working toward the Ph.D. degree in the School of Computer Science and Technology, Beijing Institute of Technology. His research interests include topic modeling and data mining.
