research-article

Malicious Account Identification in Social Network Platforms

Published: 17 November 2023

Abstract

Today, people of all ages are increasingly using Web platforms for social interaction. Consequently, many activities, such as advertising and political communication, are moving to social networks, yielding vast volumes of data disseminated over the network. However, this raises several concerns regarding the truthfulness of such data and of the accounts generating them. Malicious users often manipulate data for profit. For example, they often create fake accounts and fake followers to increase their popularity and attract more sponsors and followers, potentially producing negative effects that impact society as a whole. To deal with these issues, it is necessary to improve the capability to properly identify fake accounts and followers. In this article, we propose a new feature engineering strategy that exploits automatically extracted data correlations, which characterize meaningful patterns of malicious accounts, to augment social network account datasets with additional features, with the aim of enhancing the capability of existing machine learning strategies to discriminate fake accounts. Experimental results produced with several machine learning models on account datasets from both the Twitter and the Instagram platforms highlight the effectiveness of the proposed approach for the automatic discrimination of fake accounts. The choice of Twitter is mainly due to its strict privacy laws, and to the fact that it is the only social network platform that makes the data of its accounts publicly available.
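The abstract does not spell out how the data correlations are extracted or turned into features, so the following is only a rough illustration of the general idea of correlation-based feature augmentation, not the authors' method. It assumes a deliberately simple kind of "correlation pattern": a pair of discretized attribute values that, in the labeled training data, co-occurs mostly on fake accounts. Each account then gains one extra feature counting how many such patterns it matches. The attribute names (`followers`, `following`, `posts`), the bin edges, the support and confidence thresholds, the toy data, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical bin edges per attribute: value <= edges[0] -> bin 0, and so on.
BINS = {
    "followers": [50, 1000],
    "following": [100, 1000],
    "posts": [5, 100],
}


def bin_value(value, edges):
    """Return the index of the first edge that `value` does not exceed."""
    for i, edge in enumerate(edges):
        if value <= edge:
            return i
    return len(edges)


def discretize(record, bins):
    """Replace each binned attribute with its bin index."""
    return {attr: bin_value(record[attr], edges) for attr, edges in bins.items()}


def pair_keys(binned):
    """Enumerate (attr_a, bin_a, attr_b, bin_b) keys for every attribute pair."""
    return [(a, binned[a], b, binned[b]) for a, b in combinations(sorted(binned), 2)]


def mine_fake_patterns(records, labels, bins, min_support=2, min_conf=0.8):
    """Keep pair patterns seen at least `min_support` times whose share of
    fake-labeled accounts is at least `min_conf` (hypothetical thresholds)."""
    total, fake = Counter(), Counter()
    for record, is_fake in zip(records, labels):
        for key in pair_keys(discretize(record, bins)):
            total[key] += 1
            if is_fake:
                fake[key] += 1
    return {key for key, n in total.items()
            if n >= min_support and fake[key] / n >= min_conf}


def augment(record, bins, patterns):
    """Return a copy of `record` with one extra feature: the number of
    mined fake-account patterns it matches."""
    hits = sum(key in patterns for key in pair_keys(discretize(record, bins)))
    return {**record, "fake_pattern_hits": hits}


# Toy training data, invented for illustration (True = fake account).
train = [
    {"followers": 10, "following": 2000, "posts": 0},
    {"followers": 20, "following": 1500, "posts": 2},
    {"followers": 800, "following": 300, "posts": 250},
    {"followers": 5000, "following": 400, "posts": 900},
]
labels = [True, True, False, False]

patterns = mine_fake_patterns(train, labels, BINS)
# prints {'followers': 15, 'following': 3000, 'posts': 1, 'fake_pattern_hits': 3}
print(augment({"followers": 15, "following": 3000, "posts": 1}, BINS, patterns))
```

The augmented records, original attributes plus `fake_pattern_hits`, would then be passed unchanged to whichever classifier is being evaluated, which is the sense in which such a strategy can enhance existing machine learning models rather than replace them. Patterns should be mined on the training split only, since mining them on the evaluation data would leak labels into the features.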

        Published in

          ACM Transactions on Internet Technology, Volume 23, Issue 4
          November 2023
          249 pages
          ISSN: 1533-5399
          EISSN: 1557-6051
          DOI: 10.1145/3633308
          Editor: Ling Liu

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 17 November 2023
          • Online AM: 20 September 2023
          • Accepted: 14 September 2023
          • Revised: 2 May 2023
          • Received: 14 February 2023
