Abstract
Social Network sites have become incredibly important in the present day. This popularity attracts the attacker to easily approach a large population and to have access to massive information for performing intrusion activities in Online Social Networks (OSN) including spamming. Spammers not only spread unsolicited messages but also perform malicious activities that harm the user’s financial or personal life and tarnish the reputation of social network platforms. Efficient spam detection requires the selection of relevant features to portray spammer behavior. Most of the existing feature selection techniques use any one of the evaluation measures such as, distance, dependence, consistency, information, and classifier error rate. The feature selection techniques select features from different perspectives based on the evaluation measures. Each evaluation measure produces different subset, and the detection rate differs accordingly. The majority of the existing works focus on the individual feature ranking, and discard the lowest weight feature. Lowest weight feature may produce more accurate prediction if, it is combined with other features. So, there is a need for the feature selection technique that considers the characteristics of all the evaluation measures to produce the appropriate subset, which increases the spam detection rate and assigns a weight for the combination of features. In regard to this, the paper proposes a new multi evaluation measure combined with feature subset selection based on the genetic algorithm, GAMEFEST. The performance of the proposed work has been evaluated using Twitter, Apontador, and YouTube datasets. Experimental results prove that our proposed GAMEFEST with Minimum Surplus Crossover (MSC) improves the efficiency of the learning process and increases the spam detection rate.
Similar content being viewed by others
References
Wang F, Qi S, Gao G, Zhao S, Wang X (2016) Logo information recognition in large-scale social media data. Multimedia Systems 22(1):63–73
Zhao S, Yao H, Gao Y, Ji R, Xie W, Jiang X, Chua TS (2016) Predicting personalized emotion perceptions of social images. In: Proceedings of the 24th ACM international conference on multimedia, pp 1385–1394
Zhao S, Yao H, Gao Y, Ji R, Ding G (2016) Continuous probability distribution prediction of image emotions via multitask shared sparse regression. IEEE Trans Multimedia 19(3):632–645
Zhao S, Gao Y, Ding G, Chua TS (2017) Real-time multimedia social event detection in microblog. IEEE Trans Cybern 48(11):3218–3231
Ghosh S, Viswanath B, Kooti F, Sharma NK, Korlam G, Benevenuto F, Ganguly N, Gummadi KP (2012) Understanding and combating link farming in the twitter social network. In: Proceedings of the 21st international conference on World Wide Web. ACM, Lyon, pp 61–70
Ahmed F, Abulaish M (2013) A generic statistical approach for spam detection in online social networks. Comput Commun 36(10–11):1120–1129
Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40(1):16–28
Wang Y, Wang J, Liao H, Chen H (2017) An efficient semi-supervised representatives feature selection algorithm based on information theory. Pattern Recogn 61:511–523
Fleuret F (2004) Fast binary feature selection with conditional mutual information. J Mach Learn Res 5:1531–1555
Zhang Y, Zhang Z (2012) Feature subset selection with cumulate conditional mutual information minimization. Expert Syst Appl 39(5):6078–6088
Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238
Song Q, Ni J, Wang G (2013) A fast clustering-based feature subset selection algorithm for high-dimensional data. IEEE Trans Knowl Data Eng 25(1):1–14
Hancer E, Xue B, Zhang M (2018) Differential evolution for filter feature selection based on information theory and feature ranking. Knowl-Based Syst 140:103–119
Yu L, Liu H (2003) Feature selection for high-dimensional data: a fast correlation-based filter solution. In: Proceedings of the 20th international conference on machine learning (ICML-03), Atlanta, pp 856–863
Kononenko I (1994) Estimating attributes: analysis and extensions of RELIEF. In: European conference on machine learning. Springer, Berlin/Heidelberg, pp 171–182
Padungweang P, Lursinsap C, Sunat K (2009) Univariate filter technique for unsupervised feature selection using a new laplacian score based local nearest neighbors. In: Information processing, APCIP 2009, vol 2, pp 196–200
Zhang Y, Li S, Wang T, Zhang Z (2013) Divergence-based feature selection for separate classes. Neurocomputing 101:32–42
Liu M, Zhang D (2016) Feature selection with effective distance. Neurocomputing 215:100–109
Covões TF, Hruschka ER (2011) Towards improving cluster-based feature selection with a simplified silhouette filter. Inf Sci 181(18):3766–3782
Hall MA (2000) Correlation-based feature selection of discrete and numeric class machine learning. Ph.D. dissertation, The University of Waikato, Hamilton, New Zealand.
Mitra P, Murthy CA, Pal SK (2002) Unsupervised feature selection using feature similarity. IEEE Trans Pattern Anal Mach Intell 24(3):301–312
Chormunge S, Jena S (2018) Correlation based feature selection with clustering for high dimensional data. J Electr Syst Inf Technol 5(3):542–549
Wang Y, Feng L (2018) Hybrid feature selection using component co-occurrence based feature relevance measurement. Expert Syst Appl 102:83–99
Dash M, Liu H (2003) Consistency-based search in feature selection. Artif Intell 151(1–2):155–176
Dash M, Liu H, Motoda H (2000) Consistency based feature selection. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, Berlin/Heidelberg, pp 98–109
Thaseen IS, Kumar CA (2016) An integrated intrusion detection model using consistency based feature selection and LPBoost. In: Green engineering and technologies (IC-GET), pp 1–6
Witten IH, Frank E, Hall MA, Pal CJ (2016) Data mining: practical machine learning tools and techniques. Morgan Kaufmann
Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46(1–3):389–422
Xue B, Zhang M, Browne WN (2013) Particle swarm optimization for feature selection in classification: a multi-objective approach. IEEE Trans Cybern 43(6):1656–1671
Chuang LY, Chang HW, Tu CJ, Yang CH (2008) Improved binary PSO for feature selection using gene expression data. Comput Biol Chem 32(1):29–38
Shang XG, Jiang WS (1997) A note on fuzzy information measures. Pattern Recogn Lett 18(5):425–432
Bermejo P, Gámez JA, Puerta JM (2014) Speeding up incremental wrapper feature subset selection with Naive Bayes classifier. Knowl-Based Syst 55:140–147
Yang J, Honavar V (1998) Feature subset selection using a genetic algorithm. In: Feature extraction, construction and selection. Springer, Boston, pp 117–136
Maulik U, Bandyopadhyay S, Mukhopadhyay A (2011) Multiobjective genetic algorithms for clustering: applications in data mining and bioinformatics. Springer, Berlin/Heidelberg, pp 25–50
Rahnamayan S, Tizhoosh HR, Salama MM (2007) A novel population initialization method for accelerating evolutionary algorithms. Comput Math Appl 53(10):1605–1614
Ahn CW, Ramakrishna RS (2002) A genetic algorithm for shortest path routing problem and the sizing of populations. IEEE Trans Evol Comput 6(6):566–579
Breiman L, Spector P (1992) Submodel selection and evaluation in regression. The X-random case. Int Stat Rev 60:291–319
Cadenas JM, Garrido MC, MartíNez R (2013) Feature subset selection filter–wrapper based on low quality data. Expert Syst Appl 40(16):6241–6252
Jaganathan P, Kuppuchamy R (2013) A threshold fuzzy entropy based feature selection for medical database classification. Comput Biol Med 43(12):2222–2229
Schwämmle V, Jensen ON (2010) A simple and fast method to determine the parameters for fuzzy c–means cluster analysis. Bioinformatics 26(22):2841–2848
Wu KL (2012) Analysis of parameter selections for fuzzy c-means. Pattern Recogn 45(1):407–415
Nogueira S, Brown G (2016) Measuring the stability of feature selection. In: Joint European conference on machine learning and knowledge discovery in databases, pp 442–457
Benevenuto F, Magno G, Rodrigues T, Almeida V (2010) Detecting spammers on twitter. In: Collaboration, electronic messaging, anti-abuse and spam conference (CEAS), p 12
Benevenuto F, Rodrigues T, Veloso A, Almeida J, Gonçalves M, Almeida V (2012) Practical detection of spammers and content promoters in online video sharing systems. IEEE Trans Syst Man Cybern B 42(3):688–701
Costa H, Merschmann LH, Barth F, Benevenuto F (2014) Pollution, bad-mouthing, and local marketing: the underground of location-based social networks. Inf Sci 279:123–137
Costa H, Benevenuto F, Merschmann LH (2013) Detecting tip spam in location-based social networks. In: Proceedings of the 28th annual ACM symposium on applied computing, pp 724–729
Selvakumar B, Muneeswaran K (2019) Firefly algorithm based feature selection for network intrusion detection. Comput Secur 81:148–155
Arora S, Priyanka A (2019) Binary butterfly optimization approaches for feature selection. Expert Syst Appl 116:147–160
Raileanu LE, Stoffel K (2004) Theoretical comparison between the gini index and information gain criteria. Ann Math Artif Intell 41(1):77–93
Zheng X, Zeng Z, Chen Z, Yu Y, Rong C (2015) Detecting spammers on social networks. Neurocomputing 159:27–34
Lee C, Lee GG (2006) Information gain and divergence-based feature selection for machine learning-based text categorization. Inf Process Manag 42(1):155–165
Dai J, Xu Q (2013) Attribute selection based on information gain ratio in fuzzy rough set theory with application to tumor classification. Appl Soft Comput 13(1):211–221
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Elakkiya, E., Selvakumar, S. GAMEFEST: Genetic Algorithmic Multi Evaluation measure based FEature Selection Technique for social network spam detection. Multimed Tools Appl 79, 7193–7225 (2020). https://doi.org/10.1007/s11042-019-08334-1
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-019-08334-1