Abstract
Sockpuppet detection is a valuable and challenging problem in social networks. Existing work detects sockpuppets using verbal, non-verbal, or network-structure features, but it does not consider the propagation characteristics and propagation structure of sockpuppets. We observe that the propagation trees of sockpuppets and ordinary accounts differ: a sockpuppet's propagation tree is evidently wider and deeper than that of an ordinary account. Based on these observations, we propose a propagation-structure-based method for sockpuppet detection. Experiments on two real-world datasets from Sina Weibo demonstrate that our method is more robust than previous methods and outperforms them.
1 Introduction
Social networks have become a preferred place for propagating information and opinions and for promoting ideas [12]. Malicious accounts on social networks pose serious risks [9]. When malicious accounts are detected and blocked, their owners register new accounts, called sockpuppets, to continue spreading information. Sockpuppets usually engage in malicious and deceptive behavior, such as fraud [11], cyberbullying [2], hate speech [6], and rumors [8]. Therefore, sockpuppet detection is a valuable and challenging research problem. We broadly define a puppetmaster as an individual who manipulates more than one account.
Prior work on automatic sockpuppet detection has tended to focus on verbal [9], non-verbal [10], and network-structure [7] features. Verbal methods identify the authorship of sockpuppets [3] by extracting features that capture the stylistic, grammatical, and formatting preferences of authors on 77 groups in Wikipedia and by comparing the writing styles of accounts [9]. They assume that sockpuppets share linguistic preferences, such as keywords and topic titles in online discussion forums [15]. The approach in [4] is based on byte-level n-grams, which are language independent. However, smart puppetmasters can disguise themselves by altering account profiles and writing style. Non-verbal methods therefore assume that non-verbal behavior reveals the intention of puppetmasters. For example, [13] extracts 11 features from the accounts' contribution behavior and applies a community detection algorithm to an action graph and a relationship graph to detect sockpuppet groups. But most non-verbal features do not transfer across platforms. Existing network-structure-based detection methods rely, somewhat subjectively, on similarity of user views or emotions. Bu et al. [1] proposed a sockpuppet detection algorithm based on authorship-identification techniques and relationship analysis, in which a relationship between two accounts is built if they have similar attitudes and similar writing styles. In addition, Kumar et al. [5] construct the reply network of a discussion community and observe that the nodes denoting sockpuppets are more central and highly active. Some community-detection-based methods have also been proposed that leverage network structure to detect sockpuppets. However, these existing methods largely ignore propagation characteristics and propagation structure.
In this work, we observe differences between the propagation trees of sockpuppets and ordinary accounts, which are unusual patterns that previous work has ignored. A sockpuppet's propagation tree contains more identical accounts and is unexpectedly wider and deeper than that of an ordinary account. In addition, sockpuppets tend to build similar propagation trees. To exploit these patterns, we construct propagation trees and extract a set of independent features from them to detect sockpuppets. To validate the effectiveness of our method, we collect two real-world datasets from Sina Weibo. The experiments demonstrate that our method outperforms previous methods.
2 Problem Formulation
Let \(G=(V,E)\) be a social network, where V is a set of accounts, \(E \subseteq V \times V\) is a set of repost relationships, and \(e_{vu}^i \in E\) denotes the repost relationship for message i between accounts v and u (\(v, u \in V\)), which reflects the propagation of information over G. We formally define the sockpuppet detection problem as follows: given a set of accounts \(U \subset V\), classify each account \(u_i \in U\) as a sockpuppet account or an ordinary account.
3 Observations
We investigate the differences between sockpuppets and ordinary accounts through two questions. (1) Difference between sockpuppet and ordinary account: how do sockpuppets and ordinary accounts differ along the dimensions of the propagation tree, including the number of identical nicknames? (2) Difference of pairwise accounts: is the propagation behavior of two sockpuppets in the same sockpuppet group more similar than that of a sockpuppet-ordinary account pair?
Difference Between Sockpuppet and Ordinary Account. As Fig. 1b and c show, sockpuppets tend to participate in the discussion of the same post more than once in order to maximize the post's influence. Structurally, the propagation tree of a sockpuppet is deeper (1.86 vs 1.75) and wider (4.15 vs 3.51), which indicates that a message reposted by a sockpuppet spreads farther and more broadly.
Difference of Pairwise Accounts. Figure 2 shows that sockpuppet pairs are more similar than other pairs along three dimensions: size, depth, and width. It is reasonable that paired sockpuppets behave similarly, and this indicates that it is hard for a puppetmaster to disguise their identity in propagation behavior.
To sum up, we find that sockpuppets tend to repost from other sockpuppets and that a message reposted by a sockpuppet has a wider propagation range than one reposted by an ordinary account. Paired sockpuppets tend to behave similarly to each other in order to enhance the influence of the sockpuppet group's opinion.
4 Methodology
4.1 Propagation Tree Construction
Similar to Twitter, there are two types of posts in Sina Weibo: original posts (tweets) and reposts (retweets). Each reposting log represents an information propagation process, such as “wow//@B:wonderful//@C:lol”. Based on the practice of referring to another account in a tweet via the “//@username” convention [14], we extract the usernames from the reposting log and construct propagation trees to represent the information propagation process of an account (Fig. 3).
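The extraction step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `parse_chain` and `build_tree`, the regular expression, and the choice to identify each tree node by its path prefix are all assumptions made for illustration. The sketch also assumes that the account nearest the reposter appears first in the log, and that the earliest account named in the log acts as the visible root.

```python
import re
from collections import defaultdict

# Hypothetical pattern for "//@username:" segments; it accepts both the
# ASCII colon and the full-width colon (an assumption about the log format).
REPOST_RE = re.compile(r"//@([^:：/]+)[:：]")

def parse_chain(account, log):
    """Return the repost path from the earliest named account down to `account`."""
    upstream = REPOST_RE.findall(log)          # nearest source comes first
    return list(reversed(upstream)) + [account]

def build_tree(chains):
    """Merge root-to-leaf paths into a parent -> children map.

    Each node is identified by its path prefix (a tuple of nicknames), so the
    same nickname may appear at several positions in one tree.
    """
    children = defaultdict(set)
    for chain in chains:
        for k in range(1, len(chain)):
            children[tuple(chain[:k])].add(tuple(chain[:k + 1]))
    return {parent: sorted(kids) for parent, kids in children.items()}

# Account A reposts B, who reposted C; account D reposts C directly.
chain = parse_chain("A", "wow//@B:wonderful//@C:lol")
tree = build_tree([chain, parse_chain("D", "nice//@C:lol")])
```

Identifying nodes by path prefix rather than by nickname keeps repeated participation by one account visible in the tree, which matters for the identical-account observation in Sect. 3.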
4.2 Sockpuppet Account Detection
Given an account u, we construct the propagation trees of u. Our method captures propagation-behavior features that fall into three types: average value, minimum value, and standard deviation. The average-value features are defined as follows:
Number of posts (\({Np}_u\)): We count the size of the set of propagation trees of account u (\(D_u\)). This is a typical feature that depicts the activity frequency of an account in the social network.
Average depth of propagation tree (\({Ad}_u\)): For this feature, we take the maximum depth \({dp}_i\) of \(d^u_i\), which reflects the delay in the propagation of message i for account u: \({Ad}_u=\sum _{i=1}^{{Nd}_u}\frac{{dp}_i}{{Nd}_u}\), where \({Nd}_u\) is the size of \(D_u\).
Average size of propagation tree (\({As}_u\)): We count the total number of accounts \({ds}_i\) in the propagation tree \(d^u_i\) of the original message i in which account u most recently participated. This feature captures the coverage of message i in which account u participated: \({As}_u=\sum _{i=1}^{{Nd}_u}\frac{{ds}_i}{{Nd}_u}\)
Average number of identical accounts in tree (\({Au}_u\)): The quantity \({dn}_i\), the number of occurrences of the same nickname in \(d^u_i\), models the participation rate of an account in \(d^u_i\). Some accounts prefer to interact with other accounts by reposting their posts: \({Au}_u=\sum _{i=1}^{{Nd}_u}\frac{{dn}_i}{{Nd}_u}\)
Average maximum depth and width (\({Ad}_u\), \({Aw}_u\)): The maximum depth \({dd}_i\) describes one dimension of \(d^u_i\): \({Ad}_u=\sum _{i=1}^{{Nd}_u}\frac{{dd}_i}{{Nd}_u}\). The maximum width \({dw}_i\) describes another dimension of \(d^u_i\): \({Aw}_u=\sum _{i=1}^{{Nd}_u}\frac{{dw}_i}{{Nd}_u}\)
Average depth of only one 1-hop repost of original post (\({Ah}_u\)): This feature represents the depth \({dh}_i\) of a tree \(d^u_i\) whose original post has only one child: \({Ah}_u=\sum _{i=1}^{{Nd}_u}\frac{{dh}_i}{{Nd}_u}\)
Average number of children of propagation tree (\({Ac}_u\)): We consider the number of children \({dc}_i\), which represents the diversity of \(d^u_i\). Propagation trees with a single child are included: \({Ac}_u=\sum _{i=1}^{{Nd}_u}\frac{{dc}_i}{{Nd}_u}\)
Average index of type of posts (\({Pm}_u\)): Each post \(p_t\) belongs to one of three types, each with a type index: posting (1), replying (2), and reposting (3). \({Pm}_u=\sum _{t=1}^{{Np}_u}\frac{{p_t}}{{Np}_u}\)
Average interval between interactions (\({Pi}_u\)): This is a normalized feature for which we compute the time difference between the t-th post \(p_t\) and the prior one \(p_{t-1}\). It reflects how frequently account u uses the social network: \({Pi}_u=\sum _{t=2}^{{Np}_u}\frac{p_t-p_{t-1}}{{Np}_u}\).
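As a rough illustration of how the structural features could be computed, the sketch below derives size, depth, width, and identical-account counts per tree and then aggregates them into the three feature types (average, minimum, standard deviation). Several choices here are assumptions rather than details from the text: each tree is given as a list of root-to-leaf nickname paths, depth is counted in edges from the root, the identical-account count is read as the number of nodes carrying account u's nickname, and the population standard deviation is used. The names `tree_dimensions` and `aggregate` are hypothetical.

```python
from statistics import mean, pstdev

def nodes_by_level(paths):
    """Group distinct nodes (identified by their path prefix) by depth level."""
    levels = {}
    for path in paths:
        for k in range(len(path)):
            levels.setdefault(k, set()).add(tuple(path[:k + 1]))
    return levels

def tree_dimensions(paths, account):
    """Compute per-tree dimensions for one propagation tree."""
    levels = nodes_by_level(paths)
    nodes = [node for level in levels.values() for node in level]
    return {
        "size": len(nodes),                                   # number of nodes
        "depth": max(levels),                                 # edges to deepest node
        "width": max(len(level) for level in levels.values()),
        # Assumed reading: nodes whose nickname equals the account itself.
        "identical": sum(1 for node in nodes if node[-1] == account),
    }

def aggregate(trees, account):
    """Average, minimum, and (population) standard deviation per dimension."""
    dims = [tree_dimensions(paths, account) for paths in trees]
    features = {}
    for key in ("size", "depth", "width", "identical"):
        values = [d[key] for d in dims]
        features[f"avg_{key}"] = mean(values)
        features[f"min_{key}"] = min(values)
        features[f"std_{key}"] = pstdev(values)
    return features

# Two toy trees in which account A participated (illustrative data only).
trees_of_a = [
    [["C", "B", "A"], ["C", "D"]],
    [["E", "A", "F", "A"]],
]
features = aggregate(trees_of_a, "A")
```

The resulting dictionary is a flat feature vector for one account, which is the form a classifier such as logistic regression would consume.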
5 Experiments
5.1 Experimental Setup
Datasets. We conduct experiments on two real-world datasets, \(\mathcal {D}_{\mathcal {S}}\) and \(\mathcal {D}_{\mathcal {T}}\), built from tweets that we crawled from Sina Weibo between 2017.01 and 2018.10. An account is identified as a sockpuppet when it matches a self-reported sentence pattern such as “This is a sockpuppet of Mix” or when other accounts identify it as being controlled by a puppetmaster. Ordinary accounts are randomly selected from the accounts that interact with sockpuppets but are not correlated with them.
Comparison Methods. We consider the following baselines for sockpuppet detection. Profile Attributes Features: The user profile is the basic information of each account, such as nickname and description, and it reflects the lexical preferences of the puppetmaster. We use the attributes on each account's homepage and the number of distinct login devices. Verbal Features (Verbal) [9]: This authorship-attribution approach to sockpuppet detection in Wikipedia tries to identify sockpuppet pairs by comparing writing styles. It extracts 245 verbal features from each comment of an account. Non-verbal Features (Non-verbal) [10]: This approach uses several variables to represent user behavior. The variables of online non-verbal behavior fall under time-independent behavior and time-dependent behavior. For all methods, 10-fold cross-validation is performed and the average results are reported.
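The 10-fold protocol can be sketched as follows. This is a generic illustration, not the authors' evaluation code: the text does not say whether the folds are shuffled or stratified, so the shuffling, the fixed seed, and the names `k_fold_indices` and `cross_validate` are assumptions.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)       # assumed: shuffled, unstratified
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def cross_validate(n, evaluate, k=10):
    """Average the score returned by `evaluate(train, test)` over k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n, k)]
    return sum(scores) / len(scores)

# Toy usage: the "score" here is just the test-fold fraction of the data.
splits = list(k_fold_indices(25))
avg_fraction = cross_validate(25, lambda train, test: len(test) / 25)
```

Each account appears in exactly one test fold, so every reported average covers the whole dataset once.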
5.2 Experimental Result and Discussion
We employ four widely used classification metrics for evaluation: precision (P), recall (R), F1-score (F1), and False Positive Rate (FPR). Table 1 compares several baseline methods and our proposed method over several machine learning algorithms: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Adaptive Boosting (ADA). It shows that our method obtains the best F1-score with the LR algorithm across the datasets and that LR appears to be the most robust of the algorithms.
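For reference, the four metrics follow the standard confusion-matrix definitions, sketched below with sockpuppets as the positive class. The function name `classification_metrics`, the convention of returning 0.0 when a denominator is zero, and the toy labels are assumptions for illustration and do not come from the experiments.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1, and false positive rate for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0   # ordinary accounts flagged
    return {"P": precision, "R": recall, "F1": f1, "FPR": fpr}

# Toy labels: 1 = sockpuppet, 0 = ordinary account (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

In this setting FPR is the share of ordinary accounts wrongly flagged as sockpuppets, so a lower value is better, while higher is better for the other three metrics.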
The Profile Attributes method performs worst, for two reasons: some malicious sockpuppets are blocked, so we cannot access their profiles, and some puppetmasters use diverse profile information within the same sockpuppet group. The Verbal method identifies sockpuppets through their linguistic traits, which assumes that sockpuppets have unique linguistic traits. This assumption is fragile because smart accounts can adopt different writing styles to express their ideas. The Non-verbal method outperforms the Verbal method. A plausible explanation is that non-verbal cues characterize accounts more powerfully than verbal cues. Our method achieves the best performance in sockpuppet detection, which indicates that propagation features can capture the intention of sockpuppets.
6 Conclusion
We investigate the differences between sockpuppets and ordinary accounts and extract several features from the propagation tree structure to detect sockpuppets. We then evaluate the proposed method on two real-world social network datasets over two subproblems. Compared with several existing methods, our model shows the best performance.
References
1. Bu, Z., Xia, Z., Wang, J.: A sock puppet detection algorithm on virtual spaces. Knowl.-Based Syst. 37, 366–377 (2013)
2. Chelmis, C., Zois, D.S., Yao, M.: Mining patterns of cyberbullying on Twitter. In: 2017 IEEE International Conference on Data Mining Workshops (ICDMW), pp. 126–133. IEEE (2017)
3. Hosseinia, M., Mukherjee, A.: Detecting sockpuppets in deceptive opinion spam. In: Gelbukh, A. (ed.) CICLing 2017. LNCS, vol. 10762, pp. 255–272. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77116-8_19
4. Kešelj, V., Peng, F., Cercone, N., Thomas, C.: N-gram-based author profiles for authorship attribution. In: Proceedings of the Conference Pacific Association for Computational Linguistics, PACLING, vol. 3, pp. 255–264 (2003)
5. Kumar, S., Cheng, J., Leskovec, J., Subrahmanian, V.: An army of me: sockpuppets in online discussion communities. In: Proceedings of the 26th International Conference on World Wide Web, pp. 857–866. International World Wide Web Conferences Steering Committee (2017)
6. Lekea, I.K., Karampelas, P.: Detecting hate speech within the terrorist argument: a Greek case. In: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 1084–1091. IEEE (2018)
7. Liu, D., Wu, Q., Han, W., Zhou, B.: Sockpuppet gang detection on social media sites. Front. Comput. Sci. 10(1), 124–135 (2016)
8. Ma, J., Gao, W., Wong, K.F.: Rumor detection on Twitter with tree-structured recursive neural networks. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 1980–1989 (2018)
9. Solorio, T., Hasan, R., Mizan, M.: A case study of sockpuppet detection in Wikipedia. In: Proceedings of the Workshop on Language Analysis in Social Media, pp. 59–68 (2013)
10. Tsikerdekis, M., Zeadally, S.: Multiple account identity deception detection in social media using nonverbal behavior. IEEE Trans. Inf. Forensics Secur. 9(8), 1311–1321 (2014)
11. Wang, B., Gong, N.Z., Fu, H.: GANG: detecting fraudulent users in online social networks via guilt-by-association on directed graphs. In: 2017 IEEE International Conference on Data Mining (ICDM), pp. 465–474. IEEE (2017)
12. Yamak, Z., Saunier, J., Vercouter, L.: Detection of multiple identity manipulation in collaborative projects. In: Proceedings of the 25th International Conference Companion on World Wide Web, pp. 955–960. International World Wide Web Conferences Steering Committee (2016)
13. Yamak, Z., Saunier, J., Vercouter, L.: SocksCatch: automatic detection and grouping of sockpuppets in social media. Knowl.-Based Syst. 149, 124–142 (2018)
14. Yang, J., Counts, S.: Predicting the speed, scale, and range of information diffusion in Twitter. In: ICWSM, vol. 10, pp. 355–358 (2010)
15. Zheng, X., Lai, Y.M., Chow, K.P., Hui, L.C., Yiu, S.M.: Sockpuppet detection in online discussion forums. In: 2011 Seventh International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP), pp. 374–377. IEEE (2011)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, J., Zhou, W., Han, J., Hu, S. (2019). Sockpuppet Detection in Social Network via Propagation Tree. In: Rodrigues, J., et al. Computational Science – ICCS 2019. ICCS 2019. Lecture Notes in Computer Science(), vol 11540. Springer, Cham. https://doi.org/10.1007/978-3-030-22750-0_44
DOI: https://doi.org/10.1007/978-3-030-22750-0_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22749-4
Online ISBN: 978-3-030-22750-0
eBook Packages: Computer Science, Computer Science (R0)