Abstract
People of all ages increasingly use Web platforms for social interaction. Consequently, many activities, such as advertising and political communication, now take place over social networks, generating vast volumes of data that spread across the network. This raises concerns about the truthfulness of such data and of the accounts that generate them. Malicious users often manipulate data for profit: for example, they create fake accounts and fake followers to inflate their popularity and attract more sponsors and followers, with potentially negative consequences for society as a whole. Addressing these issues requires improving the ability to identify fake accounts and followers. In this article, we propose a new feature engineering strategy that exploits automatically extracted data correlations, which characterize meaningful patterns of malicious accounts, to augment a social network account dataset with additional features, with the aim of enhancing the capability of existing machine learning strategies to discriminate fake accounts. Experimental results obtained with several machine learning models on account datasets from both the Twitter and Instagram platforms highlight the effectiveness of the proposed approach for the automatic discrimination of fake accounts. Twitter was chosen mainly because of its strict privacy laws and because it is the only social network platform that makes its account data publicly available.
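The idea of augmenting an account dataset with correlation-derived features can be illustrated with a minimal sketch. This is not the procedure proposed in the article: the attribute names, the toy records, and the helper `add_dependency_feature` are all hypothetical, and the dependency is assumed to be given rather than automatically discovered. The sketch only shows one simple way a dependency between attributes could be turned into an extra column, by marking whether each record agrees with the majority value observed for its attribute group.

```python
from collections import Counter, defaultdict


def add_dependency_feature(rows, lhs, rhs, name):
    """Return copies of `rows` with an extra 0/1 feature called `name`.

    The feature is 1 when the row's `rhs` value equals the most common
    `rhs` value among all rows that share the same `lhs` values, i.e. when
    the row agrees with the approximate dependency lhs -> rhs.
    (Hypothetical helper, for illustration only.)
    """
    # Count rhs values within each group of identical lhs values.
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        groups[key][row[rhs]] += 1

    augmented = []
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        majority_value, _ = groups[key].most_common(1)[0]
        new_row = dict(row)  # copy, so the input rows stay unchanged
        new_row[name] = int(row[rhs] == majority_value)
        augmented.append(new_row)
    return augmented


# Toy records with made-up attributes (not taken from any real dataset).
accounts = [
    {"has_bio": False, "default_avatar": True, "many_followers": False},
    {"has_bio": False, "default_avatar": True, "many_followers": False},
    {"has_bio": False, "default_avatar": True, "many_followers": True},
    {"has_bio": True, "default_avatar": False, "many_followers": True},
]

augmented = add_dependency_feature(
    accounts,
    lhs=["has_bio", "default_avatar"],
    rhs="many_followers",
    name="fits_dependency",
)
```

The augmented records could then be passed to any existing classifier alongside the original attributes. Whether such a column helps depends on the dependencies that are actually mined from the data.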
Index Terms
- Malicious Account Identification in Social Network Platforms