Abstract
Over the past decade, social networking services (SNS) have proliferated on the web. The nature of such sites makes identity deception easy: they provide a fast means of creating and managing identities, and then of connecting with and deceiving others. Fake users are accounts created specifically for purposes such as stalking or abusing another user, slander, or marketing. Current methods for detecting deception rely on behavioral, non-behavioral and user-generated content (UGC) information gathered from users. Although these methods achieve high detection accuracy, they cannot be applied to databases with massive volumes of data. To address this issue, this paper proposes an enhanced graph-based semi-supervised learning algorithm (EGSLA) to detect fake users in a large volume of Twitter data. The proposed method comprises four modules: data collection, feature extraction, classification and decision making. Data collected from Twitter using Scrapy is used for the evaluation. The performance of the proposed algorithm is compared against existing game theory, k-nearest neighbor (KNN), support vector machine (SVM) and decision tree techniques. The results show that the proposed EGSLA algorithm achieves 90.3% accuracy in spotting fake users.
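To make the idea of graph-based semi-supervised learning concrete, the following minimal sketch shows generic label propagation on a small account graph. It is not the EGSLA algorithm, whose details are not given in this abstract. The graph, the account names, the seed labels, the neutral prior of 0.5 and the decision threshold are all hypothetical values chosen for illustration.

```python
# Generic label propagation on an undirected graph (illustrative only,
# not the paper's EGSLA). A few accounts carry known labels ("seeds");
# every other account repeatedly takes the mean score of its neighbours.

def propagate_labels(adjacency, seeds, iterations=50):
    """Return a fake-likelihood score in [0, 1] for every node.

    adjacency: dict mapping node -> list of neighbouring nodes
    seeds:     dict mapping labelled node -> 1.0 (fake) or 0.0 (genuine)
    """
    # Unlabelled nodes start at a neutral 0.5 (an assumed prior).
    scores = {node: seeds.get(node, 0.5) for node in adjacency}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            if node in seeds or not neighbours:
                # Seed labels are clamped; isolated nodes keep their score.
                updated[node] = scores[node]
            else:
                updated[node] = sum(scores[n] for n in neighbours) / len(neighbours)
        scores = updated
    return scores


def classify(scores, threshold=0.5):
    """Decision step: flag a node as fake when its score exceeds the threshold."""
    return {node: score > threshold for node, score in scores.items()}


if __name__ == "__main__":
    # Two triangles of accounts joined by one edge (c-d); names are made up.
    adjacency = {
        "a": ["b", "c"],
        "b": ["a", "c"],
        "c": ["a", "b", "d"],
        "d": ["c", "e", "f"],
        "e": ["d", "f"],
        "f": ["d", "e"],
    }
    seeds = {"a": 1.0, "d": 0.0}  # one known fake, one known genuine account
    scores = propagate_labels(adjacency, seeds)
    for node, is_fake in sorted(classify(scores).items()):
        print(node, round(scores[node], 3), "fake" if is_fake else "genuine")
```

Only two of the six accounts are labelled here, and the remaining labels follow from the graph structure. That is the property that makes semi-supervised approaches attractive when labelled data is scarce relative to the volume of accounts.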
Cite this article
BalaAnand, M., Karthikeyan, N., Karthik, S. et al. An enhanced graph-based semi-supervised learning algorithm to detect fake users on Twitter. J Supercomput 75, 6085–6105 (2019). https://doi.org/10.1007/s11227-019-02948-w
DOI: https://doi.org/10.1007/s11227-019-02948-w