Abstract
Breast cancer is a major health threat that predominantly affects women. Staging of cancer supports early detection and prognosis, which in turn guides the choice of efficient and accurate treatment. Simplified models are therefore required to integrate heterogeneous data and derive knowledge about patients for further treatment, and machine learning based diagnostic techniques are a pressing need. Prompted by these facts, this work develops a novel diagnostic model for staging of breast cancer that combines ensemble clustering, feature weighting based ranking of clusters, and ensemble classification into benign or malignant classes. The proposed work comprises five phases: data pre-processing, feature selection, ensemble clustering, ensemble classification, and staging of cancer. It first employs Multiple Imputation by Chained Equations to impute missing values, followed by a proposed feature selection technique that employs Association Rules, Classification and Regression Tree, and Fuzzy Logic. Subsequently, a consensus-based coupled clustering and classification algorithm is developed to cluster features from different datasets using Self-Organizing Map and Decision Tree. A hierarchical clustering based ranking of these clusters using Multilinear Regression and Modified Fuzzy Analytical Hierarchical Process is proposed to prioritize features. Next, a staged classifier integrating Probabilistic Fuzzy Logic and Multilayer Perceptron is developed, followed by feature extraction based staging of cancer. Finally, the proposed work is validated on four datasets with various performance metrics and different combinations of train-test splits. Moreover, k-fold cross-validation is used to reduce bias. A detailed analysis of the results indicates that the proposed model outperforms other state-of-the-art methods in the literature.
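The k-fold cross-validation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' implementation, so everything below is an assumption made for illustration: the function names (`k_fold_indices`, `fit_centroids`, `predict`, `cross_validate`), the toy data, and the nearest-centroid classifier, which merely stands in for the Probabilistic Fuzzy Logic and Multilayer Perceptron classifier of the paper.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle range(n_samples) and return k (train, test) index splits."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def fit_centroids(X, y):
    """Stand-in classifier (assumption): store one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def cross_validate(X, y, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average accuracy."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return scores, mean(scores)


# Hypothetical toy data: two well-separated classes, not the paper's datasets.
X = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [1.2, 1.1], [0.8, 0.9],
     [5.0, 5.1], [5.2, 4.9], [4.8, 5.0], [5.1, 5.3], [4.9, 4.7]]
y = ["benign"] * 5 + ["malignant"] * 5

fold_scores, mean_accuracy = cross_validate(X, y, k=5)
print(fold_scores, mean_accuracy)
```

Because every sample is held out exactly once, the averaged score depends less on any single train-test split, which is the sense in which cross-validation reduces bias in the reported metrics.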
Acknowledgements
The authors are grateful to the Indian Institute of Technology (Indian School of Mines), Dhanbad, for providing the resources needed to complete this research.
Ethics declarations
Conflict of Interests
The authors have reported no conflicts of interest.
Additional information
Ethical approval
This article does not contain any studies with human participants or animals.
Informed Consent
This manuscript does not require a statement of informed consent.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chatterjee, S., Das, A. An ensemble algorithm integrating consensus-clustering with feature weighting based ranking and probabilistic fuzzy logic-multilayer perceptron classifier for diagnosis and staging of breast cancer using heterogeneous datasets. Appl Intell 53, 13882–13923 (2023). https://doi.org/10.1007/s10489-022-04157-0
DOI: https://doi.org/10.1007/s10489-022-04157-0