Discovering hidden suspicious accounts in online social networks
Introduction
Online social networks (OSNs) have become increasingly important in recent years and are now essential components of daily communication. For instance, the Sina Weibo platform, which is mainly used for disseminating content in Chinese, recorded 198 million users in March 2015 (averaging 89 million users per day), 38% higher than in March 2014 (see [18]).
The speed and breadth of information propagation have encouraged attackers to spread spam messages, malware, and phishing uniform resource locators (URLs) through OSNs. Such malicious activities may cause serious economic loss and damage the reputations of the users of targeted accounts [21]. Fake or malicious messages propagated by suspicious accounts can be rapidly and repeatedly forwarded as part of the message flow. Their malicious URLs or fake links may advertise attractive but fake prizes and redirect users to phishing pages.
The deletion of suspicious accounts has attracted much interest from researchers and company engineers. Herein, we classify the current approaches into three categories. The first category includes behavioral models, which rely on statistical theory and clustering algorithms such as Markov processes and principal component analysis. The second category includes graph analysis, which exploits the relations among Internet users. Most of the recent graph algorithms are extensions of Sybil identification algorithms or are based on link-farming detection. The third and most popular category encompasses machine learning classifiers with multi-feature training models (message-based, account-based, and graph-based features).
Despite these anti-attack efforts and the deployment of new methods on OSNs, suspicious accounts remain a regular occurrence. They can easily disguise themselves as hidden accounts, thereby evading detection and complete deletion. Machine learning classifiers rely on distinguishing features; suspicious accounts that make their features resemble those of normal accounts, or that weaken or change their tactics, can therefore evade detection [27]. Behavior-based models focus on the differences between normal and suspicious accounts. However, because the behaviors of normal accounts can be altered by fresh social activities, behavior-based modeling increases the false-negative (FN) rate.
Suspicious accounts frequently extend their lifetimes by disguising themselves as normal accounts. They concentrate and propagate suspicious messages in ways that may go unnoticed among normal accounts. Indeed, we discovered the propagation of suspicious messages by detecting unconventional forwarding behavior in Sina Weibo: a message can be forwarded by users who are not listed as followers of its sender. Hidden suspicious accounts (with few or no interactions with each other) therefore try to forward certain suspicious messages in parallel. The social relations among these accounts are weak enough to evade classifiers based on social-relation features, so such accounts can remain hidden in OSNs.
Existing efforts detect single messages and single accounts, typically through their connections on social graphs. However, suspicious messages are forwarded by hidden suspicious accounts disguised as normal accounts. Although some suspicious accounts can be detected by their social relations, others can rely on forwarding behavior to extend their propagation scope. Entire sets of hidden suspicious accounts are difficult to identify and delete all at once, and the sparse connections among them appear to defeat analysis by social graph-based algorithms.
Therefore, discovering and completely deleting hidden suspicious accounts has become increasingly necessary. Although suspicious accounts effectively hide themselves and reduce their inter-connectivity, they must nonetheless spread malicious URLs or other attacks through messaging. These disseminated messages can be merged based on their forwarding behavior. Furthermore, by determining the internal connections across messages, we can potentially identify hidden suspicious accounts that are not connected through follower relations and, hence, break their future propagation through OSNs.
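The idea of merging messages by forwarding behavior can be illustrated with a minimal sketch. It assumes a hypothetical record layout in which each forward is an `(account, source_message_id)` pair and follower relations are a set of `(follower, followee)` pairs; the function names and sample data are illustrative and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def co_forwarding_groups(forwards):
    """Group accounts by the source message they forwarded.

    `forwards` is an iterable of (account, source_message_id) pairs
    (a hypothetical layout, not the paper's schema).
    """
    groups = defaultdict(set)
    for account, source_id in forwards:
        groups[source_id].add(account)
    return dict(groups)


def unconnected_pairs(group, follows):
    """Return account pairs with no follower relation in either direction.

    `follows` is a set of (follower, followee) pairs.
    """
    pairs = []
    for a, b in combinations(sorted(group), 2):
        if (a, b) not in follows and (b, a) not in follows:
            pairs.append((a, b))
    return pairs


# Illustrative data only.
forwards = [
    ("u1", "m100"), ("u2", "m100"), ("u3", "m100"),
    ("u4", "m200"), ("u1", "m200"),
]
follows = {("u1", "u2")}

groups = co_forwarding_groups(forwards)
hidden_candidates = unconnected_pairs(groups["m100"], follows)
```

Pairs that forward the same source message without any follower relation are only candidates for further inspection; normal users who forward a popular message would also appear here, so additional features are needed to separate the two cases.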
This paper introduces a forwarding message tree that identifies the internal relations among hidden suspicious accounts. We focus on the internal relations among the forwarded messages rather than on the suspicious accounts themselves. If we can identify the message trees constructed by forwarding certain suspicious messages among the flows, we can remove the hidden suspicious accounts from the OSNs simultaneously.
To build such forwarding message trees, we collected approximately 500,000 accounts and their corresponding messages from Sina Weibo. We then assigned approximately 178,440 accounts to approximately 243,746 relations and constructed trees based on the forwarding message IDs. By analyzing the trees composed of normal and suspicious messages, we identified several characteristics of suspicious forwarding message trees: (1) reduced persistent periods of propagation, (2) much shallower propagation depths and much broader widths, (3) much shorter median propagation times, (4) higher numbers of repeated forwarded messages, and (5) much smaller average tree weights.
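As a rough illustration of how trees might be assembled from forwarding message IDs and summarized, the sketch below builds a parent-to-children index and computes a few structural and timing features. The record layout `(message_id, parent_id, account, timestamp)` and all function and feature names are assumptions; the paper's formal definitions (including the average tree weight, which is not defined in this section) are not reproduced here.

```python
from collections import Counter, defaultdict
from statistics import median


def build_trees(records):
    """Index forwarded messages by parent ID.

    Each record is (message_id, parent_id, account, timestamp); parent_id is
    None for an original (root) message. This layout is an assumption.
    """
    children = defaultdict(list)
    info = {}
    roots = []
    for msg_id, parent_id, account, ts in records:
        info[msg_id] = (parent_id, account, ts)
        if parent_id is None:
            roots.append(msg_id)
        else:
            children[parent_id].append(msg_id)
    return roots, children, info


def tree_features(root, children, info):
    """Compute simple features for the tree rooted at `root` (level by level)."""
    level_sizes = []        # number of messages at each depth
    delays = []             # child timestamp minus parent timestamp
    forwards = Counter()    # forwards per account (root message excluded)
    timestamps = []
    frontier = [root]
    while frontier:
        level_sizes.append(len(frontier))
        next_frontier = []
        for msg_id in frontier:
            parent_id, account, ts = info[msg_id]
            timestamps.append(ts)
            if parent_id is not None:
                delays.append(ts - info[parent_id][2])
                forwards[account] += 1
            next_frontier.extend(children.get(msg_id, []))
        frontier = next_frontier
    return {
        "depth": len(level_sizes) - 1,
        "max_width": max(level_sizes),
        "persistence": max(timestamps) - min(timestamps),
        "median_delay": median(delays) if delays else 0,
        "repeated_forwards": sum(n - 1 for n in forwards.values() if n > 1),
    }


# Illustrative data only: account "b" forwards the root message twice.
records = [
    ("m1", None, "a", 0),
    ("m2", "m1", "b", 10),
    ("m3", "m1", "c", 20),
    ("m4", "m1", "b", 30),
    ("m5", "m2", "d", 50),
]
roots, children, info = build_trees(records)
features = tree_features(roots[0], children, info)
```

In this sketch, a shallow but wide tree with short delays and repeated forwards by the same account would match the pattern the text associates with suspicious propagation; the thresholds for "shallow" or "short" would have to come from the data.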
Finally, to test and verify the effectiveness and applicability of these five characteristics, we combined several existing features with features of the proposed forwarding message tree and constructed classifiers based on various machine learning algorithms. The features of our forwarding message trees improved the accuracy rate and false-positive (FP) rate of classification (achieving 95.3% and 0.5%, respectively, on average). In particular, when a classifier based on features not derived from the forwarding tree was applied to a FN dataset, most of the hidden suspicious accounts were misclassified. Moreover, most of the proposed classification features ranked at the top of a gain ranking.
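For reference, the accuracy, FP, and FN rates mentioned above can be computed from predicted and true labels as in the following sketch. It uses the standard confusion-matrix definitions and assumes that label 1 denotes a suspicious account; the paper's own metric definitions appear in its evaluation section and may differ in detail.

```python
def classification_rates(y_true, y_pred):
    """Return accuracy, FP rate, and FN rate; label 1 means suspicious."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of normal accounts wrongly flagged as suspicious.
        "fp_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of suspicious accounts that were missed.
        "fn_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Illustrative labels only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
rates = classification_rates(y_true, y_pred)
```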
In summary, this paper makes the following contributions.
- It describes a method that can discover hidden suspicious accounts that are sparsely connected on a social graph, but which forward certain suspicious messages together. These messages are not usually detected by graph-based methods because the accounts sending such messages disguise themselves as normal accounts for future potential propagation.
- It proposes a forwarding message tree that collects messages forwarded from the same source. The relations among these messages clearly reflect the internal relations among their sending accounts, especially the hidden suspicious accounts.
- It presents analyses and summaries of the six primary features of our forwarding message tree constructed from both normal and suspicious accounts. A classifier trained with these features improved the accuracy rate and lowered the FP rate, which were determined to be 95.3% and 0.5%, respectively. A gain ranking placed most of our forwarding message tree features in the top ranks.
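The gain ranking mentioned in the contributions can be illustrated with a small sketch, under the assumption that it refers to an information-gain ranking of features. Each numeric feature is split at a single hypothetical threshold; the feature names, thresholds, and data below are invented for illustration and do not come from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Information gain of splitting `values` at `threshold` (<= vs >)."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    remainder = sum(
        len(part) / len(labels) * entropy(part)
        for part in (left, right)
        if part
    )
    return entropy(labels) - remainder


def rank_features(table, labels, thresholds):
    """Return (feature, gain) pairs sorted from highest to lowest gain."""
    gains = {
        name: information_gain(column, labels, thresholds[name])
        for name, column in table.items()
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Illustrative data only: label 1 means suspicious.
labels = [1, 1, 0, 0]
table = {
    "depth": [1, 1, 4, 5],
    "follower_count": [10, 200, 15, 180],
}
thresholds = {"depth": 2, "follower_count": 100}
ranking = rank_features(table, labels, thresholds)
```

In this toy data, the depth feature separates the two classes perfectly and ranks first, while the follower count carries no information about the label.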
The remainder of our paper is organized as follows. Related work is reviewed in Section 2. Section 3 introduces the established dataset, and Section 4 proposes the forwarding tree and differentiates between normal (i.e., human-owned) and suspicious accounts. In Section 5, we identify several existing features for the evaluation. The effectiveness of the forwarding tree is then evaluated in Section 6. The limitations of our approach and ideas for future work are presented in Section 7, and the paper concludes with Section 8.
Section snippets
Related work
To protect normal users from attacks by suspicious accounts, researchers have proposed several techniques, which are grouped into the three categories listed below.
Dataset
In this section, we first present some background information relevant to the datasets. We then discuss the labeling process and overview the samples used in the evaluation.
Forwarding message tree
To detect hidden suspicious accounts, we designed our forwarding message tree. This section first explains the motivation of the paper and the significance of the forwarding message tree. The forwarding message tree is formally defined in Section 4.2. In Section 4.3, we discuss the mining of the inner collection relations among the messages and their sending accounts and identify effective features that characterize suspicious accounts. The analysis revealed six effective features; namely, the
Detection of hidden suspicious accounts
The given definition and identified effective features of the forwarding message tree do not clarify the effectiveness of suspicious account detection. To prove the efficiency of these features, we trained a classifier using different machine learning algorithms. Feature selection in machine learning is based on different social characteristics. This section first identifies the effective features previously utilized by researchers. Each of these features is derived from the information of
Evaluation
The effectiveness of the proposed features derived from the forwarding tree was evaluated by a machine learning method. The classifier was trained on the combined features described in current related research. This section first presents the basic metrics and an evaluation of the classifier performance. Next, the performances of the classifiers trained by the features from the forwarding tree are compared with classifiers trained by existing features. Finally, we discuss the FN and FP rates in
Limitation and future work
This section discusses the limitations of our current work and suggests ideas for future work. First, data collection for OSN studies has been rendered increasingly difficult due to personal privacy issues. As mentioned above, seeking cooperation with OSN companies may resolve this problem for researchers. Second, the present study verified the effectiveness of forwarding tree features by machine learning methods. Employing the approaches used in previous studies would have yielded fairer
Conclusion
Suspicious accounts in OSNs have become much more sophisticated in recent years and can effectively hide among normal accounts. In this study, we first analyzed the interconnections between hidden suspicious accounts by deriving the forwarding relationships in constructed message trees. The forwarding message tree concept was developed to improve the feature analysis and simplify the identification of hidden suspicious accounts, which tend to be sparsely connected. To verify the effectiveness
Acknowledgment
This work was supported by the National Natural Science Foundation of China under Grant No. 61472162 and 61170265.
References (29)
- et al. Phishari: automatic realtime phishing detection on twitter. eCrime Researchers Summit (eCrime), 2012 (2012)
- et al. Copycatch: stopping group attacks by spotting lockstep behavior in social networks. Proceedings of the 22nd international conference on World Wide Web (2013)
- et al. Íntegro: leveraging victim prediction for robust fake account detection in osns. Proc. of NDSS (2015)
- et al. Detecting spam urls in social media via behavioral analysis. Advances in Information Retrieval (2015)
- et al. Aiding the detection of fake accounts in large scale social online services. Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation (2012)
- et al. Uncovering large groups of active malicious accounts in online social networks. Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (2014)
- et al. Sybilinfer: detecting sybil nodes using social networks. NDSS (2009)
- et al. Compa: detecting compromised accounts on social networks. NDSS (2013)
- et al. Binspect: Holistic analysis and detection of malicious web pages. Security and Privacy in Communication Networks (2013)
- et al. Einspect: evolution-guided analysis and detection of malicious web pages. Computer Software and Applications Conference (COMPSAC), 2013 IEEE 37th Annual (2013)
- Leveraging careful microblog users for spammer detection. Proceedings of the 24th International Conference on World Wide Web Companion
- Towards online spam filtering in social networks. NDSS
- Understanding and combating link farming in the twitter social network. Proceedings of the 21st international conference on World Wide Web