research-article

Malicious Account Identification in Social Network Platforms

Authors:
Loredana Caruccio, Department of Computer Science, University of Salerno, Italy (ORCID: 0000-0002-2418-1606)
Gaetano Cimino, Department of Computer Science, University of Salerno, Italy (ORCID: 0000-0001-8061-7104)
Stefano Cirillo, Department of Computer Science, University of Salerno, Italy (ORCID: 0000-0003-0201-2753)
Domenico Desiato, Department of Computer Science, University of Bari Aldo Moro, Italy (ORCID: 0000-0002-6327-459X)
Giuseppe Polese, Department of Computer Science, University of Salerno, Italy (ORCID: 0000-0002-8496-2658)
Genoveffa Tortora, Department of Computer Science, University of Salerno, Italy (ORCID: 0000-0003-4765-8371)

ACM Transactions on Internet Technology, Volume 23, Issue 4, Article No. 57, pp. 1–25. https://doi.org/10.1145/3625097

Published: 17 November 2023

Abstract

Today, people of all ages increasingly use Web platforms for social interaction. Consequently, many activities, such as advertising and political communication, are moving onto social networks, yielding vast volumes of data disseminated over the network. However, this raises several concerns regarding the truthfulness of such data and of the accounts generating them. Malicious users often manipulate data for profit. For example, they create fake accounts and fake followers to increase their popularity and attract more sponsors and followers, with potentially negative implications for society as a whole. To deal with these issues, it is necessary to improve the capability to properly identify fake accounts and followers. In this article, we propose a new feature engineering strategy that exploits automatically extracted data correlations characterizing meaningful patterns of malicious accounts to augment the social network account dataset with additional features, aiming to enhance the capability of existing machine learning strategies to discriminate fake accounts. Experimental results produced through several machine learning models on account datasets from both the Twitter and Instagram platforms highlight the effectiveness of the proposed approach for the automatic discrimination of fake accounts. The choice of Twitter is mainly due to its strict privacy laws and because it is the only social network platform that makes data about its accounts publicly available.
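As a rough illustration of correlation-based feature augmentation, the following Python sketch adds one extra feature per pattern to each account record. It is not the authors' implementation: the `Pattern` class, the `agreement` and `augment` functions, the attribute names `followers` and `friends`, the tolerance values, and the sample data are all hypothetical, and the single hand-written pattern stands in for correlations that the article extracts automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Hypothetical correlation: accounts close on `lhs` tend to be close on `rhs`."""
    lhs: str
    rhs: str
    lhs_tol: float  # assumed similarity tolerance on the lhs attribute
    rhs_tol: float  # assumed similarity tolerance on the rhs attribute


def agreement(account, reference, pattern):
    """Among reference accounts similar to `account` on lhs, return the
    fraction that are also similar on rhs (0.0 if none is similar on lhs)."""
    matched = [
        r for r in reference
        if abs(r[pattern.lhs] - account[pattern.lhs]) <= pattern.lhs_tol
    ]
    if not matched:
        return 0.0
    agreeing = sum(
        1 for r in matched
        if abs(r[pattern.rhs] - account[pattern.rhs]) <= pattern.rhs_tol
    )
    return agreeing / len(matched)


def augment(accounts, known_fakes, patterns):
    """Return copies of `accounts`, each with one added feature per pattern."""
    augmented_rows = []
    for acc in accounts:
        row = dict(acc)  # keep the original features untouched
        for p in patterns:
            row[f"agree_{p.lhs}_{p.rhs}"] = agreement(acc, known_fakes, p)
        augmented_rows.append(row)
    return augmented_rows


# Made-up sample data, for illustration only.
known_fakes = [
    {"followers": 10, "friends": 2000},
    {"followers": 12, "friends": 2100},
    {"followers": 8, "friends": 1900},
]
accounts = [
    {"followers": 11, "friends": 2050},
    {"followers": 11, "friends": 150},
    {"followers": 5000, "friends": 300},
]
patterns = [Pattern(lhs="followers", rhs="friends", lhs_tol=5, rhs_tol=200)]

augmented = augment(accounts, known_fakes, patterns)
```

In this sketch, the added `agree_followers_friends` column could then be passed, together with the original attributes, to any standard classifier. How the patterns are actually discovered and turned into features is described in the full article, not here.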

Index Terms

Malicious Account Identification in Social Network Platforms
1. Security and privacy
  1. Human and societal aspects of security and privacy
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database constraints theory
    2. Machine learning theory

Published in
ACM Transactions on Internet Technology Volume 23, Issue 4
November 2023
249 pages
ISSN:1533-5399
EISSN:1557-6051
DOI:10.1145/3633308
Editor:
Ling Liu
Georgia Institute of Technology, USA
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 November 2023
- Online AM: 20 September 2023
- Accepted: 14 September 2023
- Revised: 2 May 2023
- Received: 14 February 2023
Author Tags
Data management
fake accounts
data profiling
social networks
Qualifiers
- research-article