Abstract
Breast cancer is one of the leading causes of death among women worldwide, and many methods have been proposed for its automatic diagnosis. One popular technique combines classification with association rule mining and is known as Association Classification (AC). However, most AC algorithms generate a considerable number of rules. In addition, irrelevant and redundant features may distort the measures used to evaluate rules and can therefore severely reduce the accuracy of rule mining. Feature selection identifies a subset of features that represents the problem almost as well as the original feature set. It is a critical preprocessing step for data mining because it tends to increase both the prediction speed and the accuracy of the classification model. In this research, an ensemble filter feature selection method and a wrapper feature selection algorithm are proposed in conjunction with the AC approach for breast cancer classification. The proposed approach employs discriminative feature subsets for breast cancer prediction. Specifically, it first uses a new bootstrapping search strategy that selects the feature subset according to an evaluation criterion function based on the overall weighted average of relative frequency. We employ a Weighted Average of Relative Frequency (WARF)-based filter method to compute discriminative features from the ensemble results. The adopted filter algorithms use a prioritization ranking technique to select a subset of informative features for subsequent AC-based disease classification. A second feature selection method, a hybrid Particle Swarm Optimization (PSO)-WARF filter-based wrapper, is also proposed.
Two classification models, WARF-Predictive Classification Based on Associations (PCBA) and hybrid PSO-WARF-PCBA, are then constructed from the filter and wrapper feature selection methods, respectively, for breast cancer prediction. Both models are evaluated on UCI breast cancer datasets. The empirical results indicate that they consistently outperform a variety of well-known benchmark AC algorithms for breast cancer diagnosis.
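The abstract names the WARF criterion but does not state its formula. The following Python sketch is therefore only one plausible reading, not the authors' implementation. It assumes that each filter is scored by how often a feature lands in its top-k over bootstrap resamples, and that WARF is the weighted mean of those relative frequencies across filters. The two filter scorers (`abs_correlation`, `mean_gap`), the equal weights, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def abs_correlation(X, y):
    """Hypothetical filter 1: absolute Pearson correlation of each feature with the class."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.abs(Xc.T @ yc) / denom

def mean_gap(X, y):
    """Hypothetical filter 2: gap between class means, scaled by the feature's std."""
    pos, neg = X[y == 1], X[y == 0]
    return np.abs(pos.mean(axis=0) - neg.mean(axis=0)) / (X.std(axis=0) + 1e-12)

def bootstrap_top_k_frequency(X, y, score_fn, k, n_boot=30, seed=None):
    """Relative frequency with which each feature is in the top-k of score_fn
    across n_boot bootstrap resamples of the rows (assumed search strategy)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    counts = np.zeros(d)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # sample rows with replacement
        scores = score_fn(X[idx], y[idx])
        counts[np.argsort(scores)[::-1][:k]] += 1  # credit the k best features
    return counts / n_boot

def warf_scores(freqs, weights):
    """Weighted average of per-filter relative frequencies (assumed WARF form)."""
    freqs = np.asarray(freqs, dtype=float)         # shape (n_filters, n_features)
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * freqs).sum(axis=0) / w.sum()

# Synthetic stand-in for a breast cancer table: only features 0 and 1 carry signal.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 6))
X[:, 0] += 2.0 * y
X[:, 1] -= 1.5 * y

freqs = [bootstrap_top_k_frequency(X, y, f, k=2, seed=1)
         for f in (abs_correlation, mean_gap)]
scores = warf_scores(freqs, weights=[0.5, 0.5])
selected = np.argsort(scores)[::-1][:2]            # indices of the two top-ranked features
```

In a pipeline like the one the abstract describes, the columns in `selected` would then be passed to the associative classifier. A wrapper variant would instead search over feature subsets (for example with PSO) and score each subset by classifier accuracy.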











Data availability
The datasets were taken from the UCI public repository at https://archive.ics.uci.edu/ml/datasets.php.
Acknowledgements
This work is supported by the Deanship of Scientific Research and Graduate Studies at the University of Petra, Amman, Jordan.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sowan, B., Eshtay, M., Dahal, K. et al. Hybrid PSO feature selection-based association classification approach for breast cancer detection. Neural Comput & Applic 35, 5291–5317 (2023). https://doi.org/10.1007/s00521-022-07950-7
DOI: https://doi.org/10.1007/s00521-022-07950-7