Elsevier

Information Sciences

Volumes 394–395, July 2017, Pages 123-140

Discovering hidden suspicious accounts in online social networks

https://doi.org/10.1016/j.ins.2017.02.030

Highlights

  • It describes a method that can discover hidden suspicious accounts that are sparsely connected on a social graph, but which forward certain suspicious messages together.

  • It proposes a forwarding message tree to collect messages forwarded from the same source.

  • A new detection approach focuses on accounts in bulk instead of single accounts or messages.

  • Experiments on real Sina Weibo data show that our approach detects hidden suspicious accounts with high accuracy and a low false-positive rate.

Abstract

Hidden suspicious accounts are sparsely connected in social graphs; however, certain suspicious messages are usually forwarded in bulk to extend their overall propagation scope. Existing anti-attack methods detect only single messages or single accounts. Because most algorithms rely on the connections among accounts in social graphs, they may repeatedly detect the same account. Furthermore, such methods cannot completely identify and eliminate hidden suspicious accounts. Therefore, messages forwarded by hidden suspicious accounts should be merged, and the accounts should be eliminated simultaneously rather than individually. This paper introduces the forwarding message tree, which groups accounts based on the relations among the messages they forward. Our approach clearly exposes the inner relations among hidden suspicious accounts and makes it convenient to delete those accounts together. First, we present the forwarding message tree and identify six effective features: the forwarding layer relation, propagation depth, propagation breadth, repeated forwarding behavior, propagation speed, and average tree weight. Next, to illustrate the effectiveness of these features, we incorporate them into machine learning algorithms. On a real dataset collected from an online social network, the detection accuracy and false-positive rate are 95.32% and 0.5%, respectively. Most of the proposed features rank at the top of a gain ranking. We conclude that the forwarding message tree can effectively detect and delete hidden suspicious accounts.

Introduction

Online social networks (OSNs) have become increasingly important in recent years and are now essential components of daily communication. For instance, Sina Weibo, which is mainly used for disseminating content in Chinese, recorded 198 million monthly active users in March 2015, 38% more than in March 2014, with an average of 89 million daily active users (see [18]).

The speed and breadth of information propagation in OSNs have encouraged attackers to spread spam messages and malware through them and to distribute phishing uniform resource locators (URLs). Such malicious activities may cause serious economic losses and damage the reputations of the owners of targeted accounts [21]. Messages from suspicious accounts that propagate fake or malicious content can be forwarded rapidly and repeatedly as part of the message flow. The malicious URLs or fake links in these messages may advertise fake prizes and redirect users to phishing pages.

The detection and deletion of suspicious accounts have attracted much interest from researchers and company engineers. Herein, we classify the current approaches into three categories. The first category includes behavioral models, which rely on statistical theory and techniques such as Markov processes, principal component analysis, and clustering algorithms. The second category includes graph analysis, which exploits the relations among users. Most recent graph algorithms are extensions of Sybil identification algorithms or are based on link-farming detection. The third and most popular category encompasses machine learning classifiers trained on multiple types of features (message-based, account-based, and graph-based).

Despite these anti-attack efforts and the deployment of new methods on OSNs, suspicious accounts remain common. Suspicious accounts can easily disguise themselves as normal accounts and thus become hidden accounts that evade detection and complete deletion. Machine learning classifiers rely on features that distinguish suspicious accounts from normal ones. If suspicious accounts disguise their features to resemble those of normal accounts, or weaken or change the tactics that these features capture, they can evade detection [27]. Behavior-based models focus on the differences between normal and suspicious accounts. However, because the behaviors of normal accounts can change with new social activities, behavior-based modeling increases the false-negative (FN) rate.

Suspicious accounts frequently extend their lifetimes by disguising themselves as normal accounts. They concentrate on propagating suspicious messages in ways that may go unnoticed among normal accounts. Indeed, we observed suspicious messages propagating through unconventional forwarding behavior on Sina Weibo: messages can be forwarded by users who are not followers of the posting account. Hidden suspicious accounts (with few or no interactions with one another) exploit this to forward certain suspicious messages in parallel. The social relations among these accounts are weak enough to evade classifiers based on social-relation features, so such accounts can remain hidden in OSNs.

Existing efforts detect single messages and single accounts, typically through their connections in social graphs. Suspicious messages are forwarded by hidden suspicious accounts disguised as normal accounts. Although some suspicious accounts can be detected through their social relations, others can continue forwarding to extend the messages' propagation scope. It is difficult to identify an entire set of hidden suspicious accounts and delete all of them at once. The sparse connections among hidden suspicious accounts appear to render social graph-based algorithms ineffective against them.
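To make "sparsely connected" concrete, the Python sketch below measures the density of follow links among a set of accounts, for example the accounts that forwarded the same source message. The function name `follower_density` and its input format (a set of `(follower, followee)` pairs) are hypothetical illustrations rather than part of the paper. A density near zero describes a group that offers graph-based detectors looking for dense clusters very little to work with.

```python
def follower_density(accounts, follows):
    """Fraction of possible directed follow edges that exist among `accounts`.

    accounts: iterable of account IDs (e.g., co-forwarders of one source message)
    follows:  set of (follower, followee) pairs (hypothetical input format)
    """
    group = set(accounts)
    n = len(group)
    if n < 2:
        return 0.0
    # Count only edges whose endpoints both lie inside the group (no self-loops).
    inside = sum(1 for a, b in follows if a in group and b in group and a != b)
    return inside / (n * (n - 1))


# Four accounts that forwarded the same message, with one follow link among them.
co_forwarders = ["bot_a", "bot_b", "bot_c", "bot_d"]
follows = {("bot_a", "bot_b"), ("alice", "bot_c"), ("bot_c", "alice")}
print(round(follower_density(co_forwarders, follows), 3))  # 0.083
```

Only one of the twelve possible directed links exists among the four co-forwarders, even though they acted together on the same message.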

Therefore, discovering and completely deleting hidden suspicious accounts has become increasingly necessary. Although suspicious accounts effectively hide themselves and reduce their inter-connectivity, they must nonetheless spread malicious URLs or other attacks through messaging. These disseminated messages can be merged based on their forwarding behavior. Furthermore, by determining the internal connections across messages, we can potentially identify hidden suspicious accounts that are not connected through follower relations and, hence, break their future propagation through OSNs.

This paper introduces a forwarding message tree that identifies the internal relations among hidden suspicious accounts. We focus on the internal relations among the forwarded messages rather than on the accounts themselves. If we can identify the message trees formed when certain suspicious messages are forwarded within the message flow, we can remove all the hidden suspicious accounts in those trees from the OSN simultaneously.
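The grouping idea can be sketched in Python as follows, assuming each collected message record carries its own ID, the ID of the message it forwards (`None` for an original post), the sending account, and a timestamp. The `Message` type and both functions are hypothetical names for illustration; the paper's formal construction is given in Section 4.2 and may differ. Messages are grouped by the root message they ultimately forward, so all accounts in one tree can be handled together.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    msg_id: str               # unique message ID
    parent_id: Optional[str]  # ID of the forwarded message; None for an original post
    account: str              # ID of the sending account
    timestamp: float          # posting time in seconds


def build_forwarding_trees(messages):
    """Group messages by the ID of the source (root) message they ultimately forward.

    Assumes forwarding links contain no cycles. A message whose parent was not
    collected is treated as the root of its own tree.
    """
    by_id = {m.msg_id: m for m in messages}
    trees = defaultdict(list)
    for m in messages:
        current = m
        # Walk up the forwarding chain until reaching a root.
        while current.parent_id is not None and current.parent_id in by_id:
            current = by_id[current.parent_id]
        trees[current.msg_id].append(m)
    return trees


def accounts_in_tree(tree):
    """Distinct accounts that posted or forwarded any message in one tree."""
    return {m.account for m in tree}


msgs = [
    Message("m1", None, "alice", 0.0),   # source post
    Message("m2", "m1", "bot_a", 60.0),
    Message("m3", "m1", "bot_b", 65.0),
    Message("m4", "m2", "bot_c", 70.0),  # forward of a forward
    Message("m5", None, "carol", 10.0),  # unrelated post
]
trees = build_forwarding_trees(msgs)
print(sorted(accounts_in_tree(trees["m1"])))  # ['alice', 'bot_a', 'bot_b', 'bot_c']
```

Keying trees by the root message ID means that `bot_c`, which never touched `m1` directly, still lands in the same group as the other forwarders of that source.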

To build such forwarding message trees, we collected approximately 500,000 accounts and their corresponding messages from Sina Weibo. We then assigned approximately 178,440 of these accounts to approximately 243,746 relations and constructed trees based on the forwarding message IDs. By analyzing the trees composed of normal and suspicious messages, we identified several characteristics of suspicious forwarding message trees: (1) shorter persistent periods of propagation, (2) much shallower propagation depths and much broader propagation breadths, (3) much shorter median propagation times, (4) higher numbers of repeatedly forwarded messages, and (5) much smaller average tree weights.
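Several of these characteristics can be computed directly from a single tree. The Python sketch below assumes a tree is given as `(msg_id, parent_id, account, timestamp)` tuples with exactly one source message, and it uses illustrative definitions: depth is the largest number of forwarding hops from the source, breadth is the largest number of messages at any single hop level, repeated forwarding counts the accounts that forward more than once within the tree, and propagation times are delays measured from the source post. These definitions are assumptions; the paper's own definitions appear in Section 4.3 and may differ. The average tree weight is omitted because its definition is not given in this excerpt.

```python
from collections import Counter
from statistics import median


def tree_features(records):
    """Illustrative features for ONE forwarding tree.

    records: list of (msg_id, parent_id, account, timestamp) tuples in which
    exactly one record (the source message) has parent_id None and every other
    parent_id refers to a record in the same list.
    """
    parent = {mid: pid for mid, pid, _, _ in records}
    root_time = next(ts for _, pid, _, ts in records if pid is None)

    def level(mid):
        # Number of forwarding hops between this message and the source (root = 0).
        hops = 0
        while parent[mid] is not None:
            mid = parent[mid]
            hops += 1
        return hops

    per_level = Counter(level(mid) for mid, _, _, _ in records)
    forwards = [(acct, ts) for _, pid, acct, ts in records if pid is not None]
    per_account = Counter(acct for acct, _ in forwards)
    delays = [ts - root_time for _, ts in forwards]

    return {
        "depth": max(per_level),                 # deepest hop level
        "breadth": max(per_level.values()),      # most messages at one level
        "repeated_forwarders": sum(1 for c in per_account.values() if c > 1),
        "median_delay": median(delays) if delays else 0.0,
        "duration": max(delays) if delays else 0.0,  # how long the tree kept growing
    }


tree = [
    ("m1", None, "src", 0.0),
    ("m2", "m1", "bot_a", 30.0),
    ("m3", "m1", "bot_b", 40.0),
    ("m4", "m1", "bot_a", 50.0),   # bot_a forwards the same source twice
    ("m5", "m2", "bot_c", 100.0),
]
print(tree_features(tree))
# {'depth': 2, 'breadth': 3, 'repeated_forwarders': 1, 'median_delay': 45.0, 'duration': 100.0}
```

A shallow, wide tree with short delays and repeat forwarders is the pattern the paragraph above associates with suspicious trees; the toy tree here is invented and is not drawn from the paper's dataset.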

Finally, to test and verify the effectiveness and applicability of these five characteristics, we combined several existing features with features of the proposed forwarding message tree and constructed classifiers based on various machine learning algorithms. The features of our forwarding message trees improved the accuracy and lowered the false-positive (FP) rate of classification (to 95.3% and 0.5%, respectively, on average). In particular, when a classifier based on features not derived from the forwarding tree was applied to an FN dataset, most of the hidden suspicious accounts were misclassified. Moreover, most of the proposed classification features ranked at the top of a gain ranking.
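For reference, the two reported metrics follow the usual confusion-matrix definitions when suspicious accounts are treated as the positive class: accuracy is (TP + TN) / N, and the FP rate is FP / (FP + TN), the fraction of normal accounts wrongly flagged. A minimal Python sketch with toy labels (not the paper's data) follows; the function name is hypothetical.

```python
def accuracy_and_fp_rate(y_true, y_pred, positive=1):
    """Accuracy and false-positive rate, with `positive` (suspicious) as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1          # normal account flagged as suspicious
        elif t == positive:
            fn += 1          # suspicious account missed
        else:
            tn += 1
    accuracy = (tp + tn) / len(y_true)
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fp_rate


# Toy labels: 1 = suspicious, 0 = normal (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
acc, fpr = accuracy_and_fp_rate(y_true, y_pred)
print(acc, round(fpr, 3))  # 0.8 0.143
```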

In summary, this paper makes the following contributions.

  • It describes a method that can discover hidden suspicious accounts that are sparsely connected in a social graph but forward certain suspicious messages together. Graph-based methods usually fail to detect these messages because the accounts sending them disguise themselves as normal accounts to preserve their ability to propagate messages in the future.

  • It proposes a forwarding message tree that collects messages forwarded from the same source. The relations among these messages clearly reflect the internal relations among their sending accounts, especially the hidden suspicious accounts.

  • It presents analyses and summaries of the six primary features of our forwarding message tree, constructed from both normal and suspicious accounts. A classifier trained with these features improved the accuracy to 95.3% and lowered the FP rate to 0.5%. A gain ranking placed most of our forwarding message tree features in the top ranks.
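The excerpt does not state which gain measure underlies the ranking. Assuming it is information gain computed over discretized feature values, a ranking could be produced as in the Python sketch below. The feature names and values are invented toy data, and continuous features such as propagation speed would first need to be binned.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG = H(class) - H(class | feature) for one discretized feature."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(features, labels):
    """Return (feature name, gain) pairs, highest gain first."""
    scores = {name: information_gain(vals, labels) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: 1 = suspicious, 0 = normal; feature values are already binned.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "follower_bucket":   [0, 1, 0, 1, 0, 1, 0, 1],
    "shallow_tree":      [1, 1, 1, 1, 0, 0, 0, 0],
    "repeated_forwards": [1, 1, 1, 0, 0, 0, 0, 1],
}
for name, gain in rank_features(features, labels):
    print(name, round(gain, 3))
# shallow_tree 1.0
# repeated_forwards 0.189
# follower_bucket 0.0
```

A feature that splits the classes perfectly reaches the full class entropy (1 bit here), while one that is independent of the label scores zero.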

The remainder of this paper is organized as follows. Related work is reviewed in Section 2. Section 3 introduces the dataset we established, and Section 4 proposes the forwarding tree and differentiates between normal (i.e., human-owned) and suspicious accounts. In Section 5, we identify several existing features for the evaluation. The effectiveness of the forwarding tree is then evaluated in Section 6. The limitations of our approach and ideas for future work are presented in Section 7, and the paper concludes with Section 8.

Section snippets

Related work

To protect normal users from attacks by suspicious accounts, researchers have proposed several techniques, which are grouped into the three categories listed below.

Dataset

In this section, we first present some background information relevant to the datasets. We then discuss the labeling process and overview the samples used in the evaluation.

Forwarding message tree

To detect hidden suspicious accounts, we designed the forwarding message tree. This section first explains the motivation for the paper and the significance of the forwarding message tree. The forwarding message tree is formally defined in Section 4.2. In Section 4.3, we discuss mining the inner relations among the messages and their sending accounts and identify effective features that characterize suspicious accounts. The analysis revealed six effective features; namely, the

Detection of hidden suspicious accounts

The definition of the forwarding message tree and the effective features identified from it do not by themselves demonstrate how effectively suspicious accounts can be detected. To demonstrate the effectiveness of these features, we trained classifiers using different machine learning algorithms. Feature selection in machine learning is based on different social characteristics. This section first identifies the effective features previously utilized by researchers. Each of these features is derived from the information of

Evaluation

The effectiveness of the proposed features derived from the forwarding tree was evaluated by a machine learning method. The classifier was trained on the combined features described in current related research. This section first presents the basic metrics and an evaluation of the classifier performance. Next, the performances of the classifiers trained by the features from the forwarding tree are compared with classifiers trained by existing features. Finally, we discuss the FN and FP rates in

Limitation and future work

This section discusses the limitations of our current work and suggests ideas for future work. First, data collection for OSN studies has been rendered increasingly difficult due to personal privacy issues. As mentioned above, seeking cooperation with OSN companies may resolve this problem for researchers. Second, the present study verified the effectiveness of forwarding tree features by machine learning methods. Employing the approaches used in previous studies would have yielded fairer

Conclusion

Suspicious accounts in OSNs have become much more sophisticated in recent years and can effectively hide among normal accounts. In this study, we first analyzed the interconnections between hidden suspicious accounts by deriving the forwarding relationships in constructed message trees. The forwarding message tree concept was developed to improve the feature analysis and simplify the identification of hidden suspicious accounts, which tend to be sparsely connected. To verify the effectiveness

Acknowledgment

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61472162 and 61170265.

References (29)

  • A. Aggarwal et al. PhishAri: automatic realtime phishing detection on Twitter. eCrime Researchers Summit (eCrime), 2012.
  • A. Beutel et al. CopyCatch: stopping group attacks by spotting lockstep behavior in social networks. Proceedings of the 22nd International Conference on World Wide Web, 2013.
  • Y. Boshmaf et al. Íntegro: leveraging victim prediction for robust fake account detection in OSNs. Proc. of NDSS, 2015.
  • C. Cao et al. Detecting spam URLs in social media via behavioral analysis. Advances in Information Retrieval, 2015.
  • Q. Cao et al. Aiding the detection of fake accounts in large scale social online services. Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, 2012.
  • Q. Cao et al. Uncovering large groups of active malicious accounts in online social networks. Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, 2014.
  • G. Danezis et al. SybilInfer: detecting sybil nodes using social networks. NDSS, 2009.
  • M. Egele et al. COMPA: detecting compromised accounts on social networks. NDSS, 2013.
  • B. Eshete et al. BINSPECT: holistic analysis and detection of malicious web pages. Security and Privacy in Communication Networks, 2013.
  • B. Eshete et al. EINSPECT: evolution-guided analysis and detection of malicious web pages. Computer Software and Applications Conference (COMPSAC), 2013 IEEE 37th Annual, 2013.
  • H. Fu et al. Leveraging careful microblog users for spammer detection. Proceedings of the 24th International Conference on World Wide Web Companion, 2015.
  • H. Gao et al. Towards online spam filtering in social networks. NDSS, 2012.
  • S. Ghosh et al. Understanding and combating link farming in the Twitter social network. Proceedings of the 21st International Conference on World Wide Web, 2012.
  • Google, Google Safe Browsing API (2015)....