
Hybrid evolutionary algorithms for classification data mining

  • Advances in Intelligent Data Processing and Analysis
Neural Computing and Applications

Abstract

In this paper, we propose novel methods for finding the most relevant feature subset, using fuzzy rough set-based attribute subset selection with biologically inspired search algorithms, such as ant colony optimization and particle swarm optimization, together with the principles of an evolutionary process. We then propose a hybrid fuzzy rough K-nearest neighbor (K-NN) classifier (FRNN) to classify the patterns in the reduced datasets obtained from the fuzzy rough bio-inspired search. To explore other possible hybrid evolutionary processes, we also conduct experiments with (i) the same feature selection algorithm combined with support vector machine (SVM) and random forest (RF) classifiers; and (ii) instance-based selection using the synthetic minority over-sampling technique (SMOTE) combined with fuzzy rough K-NN, SVM and RF classifiers. The proposed hybrid is then validated on real-life datasets obtained from the University of California, Irvine (UCI) machine learning repository. Simulation results demonstrate that the proposed hybrid produces good classification accuracy. Finally, parametric and nonparametric statistical tests of significance are carried out to assess the consistency of the classifiers.
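To make the feature selection idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a fuzzy-rough dependency degree built from a per-feature similarity `1 - |a(x) - a(y)| / range(a)` combined with the minimum t-norm, crisp class labels, and a plain binary particle swarm search. The function names, the inertia and acceleration constants (0.7, 1.5, 1.5), and the 0.9/0.1 fitness weighting are all illustrative assumptions that do not come from the paper.

```python
import numpy as np


def fuzzy_rough_dependency(X, y, subset):
    """Fuzzy-rough dependency degree of the labels y on the features in `subset`."""
    subset = np.asarray(subset, dtype=int)
    if subset.size == 0:
        return 0.0
    Xs = X[:, subset].astype(float)
    span = Xs.max(axis=0) - Xs.min(axis=0)
    span[span == 0] = 1.0  # constant features: avoid division by zero
    # Per-feature similarity, combined over the subset with the minimum t-norm.
    diff = np.abs(Xs[:, None, :] - Xs[None, :, :]) / span
    sim = (1.0 - diff).min(axis=2)
    # With crisp classes, membership of x in the fuzzy positive region is the
    # infimum of 1 - sim(x, z) over all objects z from a different class.
    other_class = y[:, None] != y[None, :]
    pos = np.where(other_class, 1.0 - sim, 1.0).min(axis=1)
    return float(pos.mean())


def binary_pso_select(X, y, n_particles=10, n_iter=20, seed=0):
    """Search for a feature subset with a simple binary PSO (illustrative settings)."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]

    def fitness(mask):
        idx = np.flatnonzero(mask)
        # Assumed trade-off: favour high dependency, lightly favour small subsets.
        return 0.9 * fuzzy_rough_dependency(X, y, idx) + 0.1 * (1 - idx.size / n_feat)

    pos = (rng.random((n_particles, n_feat)) < 0.5).astype(float)
    vel = rng.uniform(-1.0, 1.0, (n_particles, n_feat))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    best = pbest_fit.argmax()
    gbest, gbest_fit = pbest[best].copy(), pbest_fit[best]

    for _ in range(n_iter):
        r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        vel = np.clip(vel, -4.0, 4.0)
        # Sigmoid of the velocity gives the probability that a bit is set to 1.
        pos = (rng.random(vel.shape) < 1.0 / (1.0 + np.exp(-vel))).astype(float)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        best = pbest_fit.argmax()
        if pbest_fit[best] > gbest_fit:
            gbest, gbest_fit = pbest[best].copy(), pbest_fit[best]

    return np.flatnonzero(gbest)
```

The indices returned by `binary_pso_select` could then be used to reduce the dataset before it is passed to a classifier such as K-NN, SVM or RF. An ant colony search would replace only the search loop and could reuse the same dependency measure as its fitness.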



Author information


Corresponding author

Correspondence to Mrutyunjaya Panda.


About this article


Cite this article

Panda, M., Abraham, A. Hybrid evolutionary algorithms for classification data mining. Neural Comput & Applic 26, 507–523 (2015). https://doi.org/10.1007/s00521-014-1673-2


  • DOI: https://doi.org/10.1007/s00521-014-1673-2
