GAMEFEST: Genetic Algorithmic Multi Evaluation measure based FEature Selection Technique for social network spam detection

Multimedia Tools and Applications

Abstract

Social network sites have become extremely important today. Their popularity lets attackers reach a large population and access massive amounts of information, which they exploit to carry out intrusion activities in Online Social Networks (OSNs), including spamming. Spammers not only spread unsolicited messages but also perform malicious activities that harm users' financial or personal lives and tarnish the reputation of social network platforms. Efficient spam detection requires selecting relevant features that portray spammer behavior. Most existing feature selection techniques rely on a single evaluation measure, such as distance, dependence, consistency, information, or classifier error rate. Because each measure evaluates features from a different perspective, each produces a different subset, and the detection rate differs accordingly. Moreover, most existing works rank features individually and discard the lowest-weighted ones, even though a low-weighted feature may yield more accurate predictions when combined with other features. There is therefore a need for a feature selection technique that considers the characteristics of all the evaluation measures to produce an appropriate subset, increases the spam detection rate, and assigns a weight to each combination of features. To this end, this paper proposes GAMEFEST, a genetic-algorithm-based feature subset selection technique that combines multiple evaluation measures. The performance of the proposed technique has been evaluated on Twitter, Apontador, and YouTube datasets. Experimental results show that GAMEFEST with Minimum Surplus Crossover (MSC) improves the efficiency of the learning process and increases the spam detection rate.
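The abstract makes two technical claims: a feature with a low individual score can become valuable in combination with others, and a genetic algorithm can search feature subsets while drawing on several evaluation measures at once. The Python sketch below illustrates both on a toy dataset. It is not the authors' implementation. The dataset, the equal-weight fitness that averages an information measure (normalised information gain) with a consistency measure (one minus the inconsistency rate), the size penalty, and all function names (`info_gain`, `consistency`, `fitness`, `ga_select`) are assumptions made for illustration. Uniform crossover stands in for the paper's Minimum Surplus Crossover, which the abstract does not describe.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(X, y, subset):
    """Information measure: H(Y) - H(Y | joint value of the features in subset)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[tuple(row[i] for i in subset)].append(label)
    cond = sum(len(g) / len(y) * entropy(g) for g in groups.values())
    return entropy(y) - cond


def consistency(X, y, subset):
    """Consistency measure: 1 - inconsistency rate, where rows sharing a feature
    pattern but not the pattern's majority class count as inconsistent."""
    groups = defaultdict(Counter)
    for row, label in zip(X, y):
        groups[tuple(row[i] for i in subset)][label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return 1.0 - inconsistent / len(y)


def fitness(X, y, mask, size_penalty=0.05):
    """HYPOTHETICAL multi-measure fitness: equal-weight mean of the normalised
    information measure and the consistency measure, minus a small size penalty."""
    subset = [i for i, bit in enumerate(mask) if bit]
    h_y = entropy(y)
    info = info_gain(X, y, subset) / h_y if h_y > 0 else 0.0
    return (info + consistency(X, y, subset)) / 2 - size_penalty * len(subset) / len(mask)


def ga_select(X, y, pop_size=30, generations=40, p_mut=0.1, seed=0):
    """Binary-mask GA with elitism, tournament selection, uniform crossover
    (a stand-in, NOT the paper's Minimum Surplus Crossover) and bit-flip mutation."""
    rng = random.Random(seed)
    d = len(X[0])
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:  # memoise: each distinct subset is scored once
            cache[key] = fitness(X, y, mask)
        return cache[key]

    def tournament(pop, k=3):
        return max(rng.sample(pop, k), key=score)

    pop = [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        next_pop = pop[:2]  # elitism: the two best masks survive unchanged
        while len(next_pop) < pop_size:
            a, b = tournament(pop), tournament(pop)
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(a, b)]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    return [i for i, bit in enumerate(best) if bit]


# Toy data (assumed, not from the paper): y = x0 XOR x1, x2 is noise,
# and x3 agrees with y on 6 of the 8 rows.
X = [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [0, 1, 1, 1],
     [1, 0, 0, 1], [1, 0, 1, 1], [1, 1, 0, 0], [1, 1, 1, 0]]
y = [0, 0, 1, 1, 1, 1, 0, 0]

if __name__ == "__main__":
    for j in range(4):
        print(f"x{j}: IG = {info_gain(X, y, [j]):.3f}")
    print("GA-selected subset:", ga_select(X, y))
```

Run as a script, the loop reports zero information gain for x0, x1, and x2 and about 0.189 for x3, so a ranking of individual features would keep x3 and discard x0 and x1. The GA scores whole subsets instead, and here it should return `[0, 1]`, the smallest subset that is both fully informative and fully consistent. Scoring combinations rather than single features is what lets a low-ranked feature survive.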





Author information

Correspondence to E. Elakkiya.


About this article


Cite this article

Elakkiya, E., Selvakumar, S. GAMEFEST: Genetic Algorithmic Multi Evaluation measure based FEature Selection Technique for social network spam detection. Multimed Tools Appl 79, 7193–7225 (2020). https://doi.org/10.1007/s11042-019-08334-1

  • DOI: https://doi.org/10.1007/s11042-019-08334-1
