An ensemble algorithm integrating consensus-clustering with feature weighting based ranking and probabilistic fuzzy logic-multilayer perceptron classifier for diagnosis and staging of breast cancer using heterogeneous datasets

Applied Intelligence

Abstract

Breast cancer is a major health threat that predominantly affects women. Staging the cancer supports early detection and prognosis, which in turn guides the choice of effective and accurate treatment. Simplified models are therefore required that integrate heterogeneous data to derive knowledge about patients for further treatment, and machine learning based diagnostic techniques are a primary means of meeting this need. Motivated by these facts, this work develops a novel diagnostic model for staging breast cancer that combines ensemble clustering, feature weighting based ranking of clusters, and ensemble classification into benign or malignant classes. The proposed work consists of five phases: data pre-processing, feature selection, ensemble clustering, ensemble classification, and staging of cancer. It first employs Multiple Imputation by Chained Equations (MICE) to impute missing values, followed by a proposed feature selection technique employing Association Rules, Classification and Regression Tree, and Fuzzy Logic. Subsequently, a coupled clustering and classification algorithm based on consensus is developed to cluster features from different datasets using Self-Organizing Map and Decision Tree. A hierarchical clustering based ranking of these clusters, using Multilinear Regression and a Modified Fuzzy Analytical Hierarchical Process, is proposed to prioritize features. Next, a staged classifier integrating Probabilistic Fuzzy Logic and Multilayer Perceptron is developed, followed by feature extraction based staging of cancer. Finally, the proposed work is validated on four datasets with various performance metrics using different combinations of train-test splits. Moreover, k-fold cross-validation is applied to reduce bias. The detailed analysis of the results indicates that the proposed approach outperforms other state-of-the-art methods in the literature.
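To make the consensus idea mentioned in the abstract more concrete, the following is a minimal sketch of one common form of consensus clustering, the co-association (evidence accumulation) approach. It is not the authors' algorithm: the abstract does not give its details, and the function names `co_association` and `consensus_clusters`, the 0.5 agreement threshold, and the toy partitions are all assumptions chosen for illustration. The base partitions could come from any clusterers (for example a Self-Organizing Map run with different settings); here they are simply hard-coded label lists.

```python
def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    base partitions that place samples i and j in the same cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    total = len(partitions)
    return [[c / total for c in row] for row in counts]


def consensus_clusters(matrix, threshold=0.5):
    """Group samples that are linked by agreement above `threshold`
    (hypothetical cut-off), using connected components of the graph."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three toy base partitions of five samples (illustrative data only).
partitions = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
matrix = co_association(partitions)
print(consensus_clusters(matrix))
```

Cluster label values differ between base partitions, which is why the sketch compares pairs of samples rather than the labels themselves: only whether two samples land together matters.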

Acknowledgements

The authors are grateful to the Indian Institute of Technology (Indian School of Mines), Dhanbad, for providing the resources needed to complete this research.

Author information

Corresponding author

Correspondence to Ananya Das.

Ethics declarations

Conflict of Interests

The authors have reported no conflicts of interest.

Additional information

Ethical approval

This article does not involve any studies with human participants or animals.

Informed Consent

This manuscript does not require a statement of informed consent.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chatterjee, S., Das, A. An ensemble algorithm integrating consensus-clustering with feature weighting based ranking and probabilistic fuzzy logic-multilayer perceptron classifier for diagnosis and staging of breast cancer using heterogeneous datasets. Appl Intell 53, 13882–13923 (2023). https://doi.org/10.1007/s10489-022-04157-0

  • DOI: https://doi.org/10.1007/s10489-022-04157-0
