An enhanced graph-based semi-supervised learning algorithm to detect fake users on Twitter

The Journal of Supercomputing

Abstract

Over the past decade, social networking services (SNS) have proliferated on the web. The nature of such sites makes identity deception easy: they provide a fast means of creating and managing identities, and then of connecting with and deceiving others. Fake users are accounts created specifically for purposes such as stalking or abusing another user, slander, or marketing. Existing deception-detection approaches depend on behavioral, non-behavioral and user-generated content (UGC) information gathered from users. Although these methods achieve high detection accuracy, they cannot be implemented in databases with massive volumes of data. To address this issue, this paper proposes an enhanced graph-based semi-supervised learning algorithm (EGSLA) to detect fake users in a large volume of Twitter data. The proposed method comprises four modules: data collection, feature extraction, classification and decision making. Data collected from Twitter using Scrapy is used for the evaluation. The performance of the proposed algorithm is compared against existing game theory, k-nearest neighbor (KNN), support vector machine (SVM) and decision tree techniques. The results show that the proposed EGSLA algorithm achieves 90.3% accuracy in spotting fake users.
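
The abstract does not describe how EGSLA assigns labels, so the following is only a minimal sketch of generic graph-based semi-supervised learning (iterative label propagation), not the authors' algorithm. The toy follower graph, the seed labels, the function name `propagate_labels` and the 0.5 decision threshold are all illustrative assumptions.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, iterations=50):
    """Generic label propagation on an undirected graph (illustrative only).

    edges: iterable of (u, v) account pairs, e.g. follower links (assumed input).
    seeds: dict mapping a few known accounts to 1.0 (fake) or 0.0 (genuine).
    Returns a dict mapping every account in the graph to a score in [0, 1].
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Unlabeled accounts start at the neutral score 0.5.
    scores = {node: seeds.get(node, 0.5) for node in neighbors}

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                # Labeled accounts stay clamped to their known label.
                updated[node] = seeds[node]
            else:
                # Unlabeled accounts take the mean score of their neighbors.
                updated[node] = sum(scores[n] for n in nbrs) / len(nbrs)
        scores = updated
    return scores


# Hypothetical toy graph: two triangles joined by the single edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
seeds = {"a": 1.0, "f": 0.0}  # one known fake account, one known genuine account

scores = propagate_labels(edges, seeds)
flagged = sorted(node for node, score in scores.items() if score > 0.5)
```

Only two of the six accounts carry labels here; the remaining scores come entirely from graph structure, which is the property that makes semi-supervised methods attractive when labeled data is scarce relative to the volume of accounts.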



Author information

Correspondence to M. BalaAnand.

About this article

Cite this article

BalaAnand, M., Karthikeyan, N., Karthik, S. et al. An enhanced graph-based semi-supervised learning algorithm to detect fake users on Twitter. J Supercomput 75, 6085–6105 (2019). https://doi.org/10.1007/s11227-019-02948-w

  • DOI: https://doi.org/10.1007/s11227-019-02948-w
