GAMEFEST: Genetic Algorithmic Multi Evaluation measure based FEature Selection Technique for social network spam detection

Elakkiya, E.; Selvakumar, S.

doi:10.1007/s11042-019-08334-1

Published: 20 December 2019

Volume 79, pages 7193–7225, (2020)

Multimedia Tools and Applications

Abstract

Social network sites have become very important in the present day. Their popularity lets attackers reach a large population easily and gives them access to massive amounts of information for carrying out intrusion activities in Online Social Networks (OSNs), including spamming. Spammers not only spread unsolicited messages but also perform malicious activities that harm users' financial or personal lives and tarnish the reputation of social network platforms. Efficient spam detection requires selecting relevant features that portray spammer behavior. Most existing feature selection techniques use only one evaluation measure, such as distance, dependence, consistency, information, or classifier error rate. Because each measure views the features from a different perspective, each produces a different feature subset, and the detection rate differs accordingly. Most existing works also focus on ranking individual features and discard the lowest-weighted ones, even though a low-weighted feature may yield more accurate predictions when it is combined with other features. There is therefore a need for a feature selection technique that considers the characteristics of all the evaluation measures, assigns a weight to combinations of features rather than to single features, and produces a subset that increases the spam detection rate. To this end, the paper proposes GAMEFEST, a new feature subset selection technique that combines multiple evaluation measures and is based on a genetic algorithm. The performance of the proposed work has been evaluated using Twitter, Apontador, and YouTube datasets. Experimental results show that the proposed GAMEFEST with Minimum Surplus Crossover (MSC) improves the efficiency of the learning process and increases the spam detection rate.
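The general idea in the abstract, scoring whole feature subsets with more than one evaluation measure inside a genetic algorithm, can be illustrated with a minimal sketch. This is not the authors' GAMEFEST implementation: the abstract does not define the Minimum Surplus Crossover operator or how the measures are combined, so the sketch assumes ordinary single-point crossover and a plain average of two stand-in measures (a centroid-distance score and a nearest-centroid classifier accuracy). All function names, parameters, and the toy data are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fitness(mask, X, y):
    """Score a feature subset (bit mask) by averaging two measures.

    Assumes binary labels (1 = spam, 0 = legitimate) with both classes present.
    """
    if not any(mask):
        return 0.0  # an empty subset carries no information
    rows = [[v for v, bit in zip(row, mask) if bit] for row in X]
    pos = [r for r, label in zip(rows, y) if label == 1]
    neg = [r for r, label in zip(rows, y) if label == 0]
    c_pos = [sum(col) / len(pos) for col in zip(*pos)]
    c_neg = [sum(col) / len(neg) for col in zip(*neg)]

    # Measure 1 (distance): class-centroid separation per selected feature,
    # squashed into [0, 1).
    d = sq_dist(c_pos, c_neg) / sum(mask)
    distance_score = d / (1.0 + d)

    # Measure 2 (classifier accuracy): nearest-centroid accuracy on the data.
    correct = sum(
        (sq_dist(r, c_pos) < sq_dist(r, c_neg)) == (label == 1)
        for r, label in zip(rows, y)
    )
    accuracy = correct / len(rows)

    # Assumed combination rule: unweighted mean of the two measures.
    return (distance_score + accuracy) / 2.0


def select_features(X, y, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Genetic search over bit masks; needs at least two features."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, X, y), reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0][:]]  # elitism: keep the best mask unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover (not MSC)
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=lambda m: fitness(m, X, y))


# Hypothetical toy data: column 0 separates the classes, the rest are noise.
X_toy = [
    [0.9, 0.2, 0.5, 0.1], [0.8, 0.7, 0.4, 0.9], [1.0, 0.4, 0.6, 0.3], [0.7, 0.9, 0.5, 0.6],
    [0.1, 0.3, 0.5, 0.2], [0.2, 0.8, 0.6, 0.8], [0.0, 0.5, 0.4, 0.4], [0.3, 0.6, 0.5, 0.7],
]
y_toy = [1, 1, 1, 1, 0, 0, 0, 0]

best_mask = select_features(X_toy, y_toy)
print(best_mask, round(fitness(best_mask, X_toy, y_toy), 3))
```

Because the fitness is computed on a whole mask rather than on individual features, a feature that scores poorly on its own can still survive if it improves the subset it belongs to, which is the motivation the abstract gives for subset-level weighting.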

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, Tamil Nadu, 620 015, India
E. Elakkiya & S. Selvakumar
Indian Institute of Information Technology, Una, HP, India
S. Selvakumar

Corresponding author

Correspondence to E. Elakkiya.

About this article

Cite this article

Elakkiya, E., Selvakumar, S. GAMEFEST: Genetic Algorithmic Multi Evaluation measure based FEature Selection Technique for social network spam detection. Multimed Tools Appl 79, 7193–7225 (2020). https://doi.org/10.1007/s11042-019-08334-1

Received: 26 January 2019
Revised: 29 July 2019
Accepted: 01 October 2019
Published: 20 December 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s11042-019-08334-1
