Hybrid PSO feature selection-based association classification approach for breast cancer detection

  • Original Article
Neural Computing and Applications

Abstract

Breast cancer is one of the leading causes of death among women worldwide. Many methods have been proposed for automatic breast cancer diagnosis. One popular technique, Association Classification (AC), builds classifiers from association rules. However, most AC algorithms generate a very large number of rules. In addition, irrelevant and redundant features can distort the measures used to evaluate rules and thereby severely reduce the accuracy of rule mining. Feature selection identifies a subset of features that represents the problem nearly as well as the original feature set. It is a critical preprocessing step for data mining because it tends to increase both the prediction speed and the accuracy of the classification model. In this research, an ensemble filter feature selection method and a wrapper feature selection algorithm are proposed in conjunction with the AC approach for breast cancer classification. The proposed approach employs optimal discriminative feature subsets for breast cancer prediction. Specifically, it first uses a new bootstrapping search strategy that selects the feature subset according to an evaluation criterion based on the overall weighted average of relative frequency. We employ a Weighted Average of Relative Frequency (WARF)-based filter method to compute discriminative features from the ensemble results. The adopted filter algorithms use a prioritization ranking technique to select a subset of informative features for subsequent AC-based disease classification. A wrapper feature selection method, namely a hybrid Particle Swarm Optimization (PSO)-WARF filter-based wrapper method, is also proposed. Two classification models, i.e., WARF-Predictive Classification Based on Associations (PCBA) and hybrid PSO-WARF-PCBA, are then constructed from the above filter and wrapper-based feature selection methods for breast cancer prediction. Both models are evaluated using UCI breast cancer datasets. The empirical results indicate that our models achieve impressive performance and consistently outperform a variety of well-known benchmark AC algorithms for breast cancer diagnosis.
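
The abstract does not state the WARF formula, so the following Python sketch is only one hypothetical reading of a weighted average of relative frequency: each filter in an ensemble nominates features, each filter carries a weight, and a feature's score is the weighted share of filters that nominated it. The function names, feature names, weights, and the 0.5 threshold are all illustrative assumptions and are not taken from the authors' implementation.

```python
from typing import Dict, List, Sequence


def warf_scores(selections: Sequence[Sequence[str]],
                weights: Sequence[float]) -> Dict[str, float]:
    """Weighted share of ensemble members that selected each feature.

    Hypothetical reading of WARF; the exact formula is not given
    in the abstract.
    """
    if len(selections) != len(weights):
        raise ValueError("one weight is required per filter")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    scores: Dict[str, float] = {}
    for chosen, weight in zip(selections, weights):
        for feature in set(chosen):  # count each feature once per filter
            scores[feature] = scores.get(feature, 0.0) + weight / total
    return scores


def rank_features(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep features scoring at least `threshold`, best first (ties by name)."""
    kept = [f for f, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda f: (-scores[f], f))


# Illustrative (made-up) picks from three filters on a toy feature set.
filter_picks = [
    ["clump_thickness", "cell_size", "cell_shape"],
    ["cell_size", "cell_shape", "mitoses"],
    ["cell_size", "bare_nuclei"],
]
filter_weights = [1.0, 1.0, 2.0]  # assumed weights, one per filter

scores = warf_scores(filter_picks, filter_weights)
selected = rank_features(scores, threshold=0.5)
print(selected)
```

In a pipeline of the kind the abstract describes, a subset such as `selected` would then be handed to the rule-mining classifier; that step is not shown here.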

Data availability

The datasets were taken from the UCI public repository at https://archive.ics.uci.edu/ml/datasets.php.

Acknowledgements

This work is supported by the Deanship of Scientific Research and Graduate Studies at the University of Petra, Amman, Jordan.

Author information

Corresponding author

Correspondence to Bilal Sowan.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sowan, B., Eshtay, M., Dahal, K. et al. Hybrid PSO feature selection-based association classification approach for breast cancer detection. Neural Comput & Applic 35, 5291–5317 (2023). https://doi.org/10.1007/s00521-022-07950-7

  • DOI: https://doi.org/10.1007/s00521-022-07950-7
