Abstract
Breast cancer, found predominantly in women, is a life-threatening disease owing to its invasive and proliferative nature. Early detection contributes significantly to reduced mortality and is therefore a keen focus of ongoing research. However, developing a technique to assess the severity of a patient's condition at an early stage is challenging. Manual diagnostic techniques are time-consuming and can result in inaccurate diagnosis of breast cancer. Prompted by these facts, an automated framework generated from a quantum-optimized rule base is developed to cluster the data according to the degree of criticality of the cancer patients, classify each case as benign or malignant using the probability of malignancy of the clusters, and assign grades of cancer. First, after data pre-processing, significant features are selected using an integrated feature selection approach. An efficient weighting algorithm is then proposed that combines the knowledge of physicians with the benefits of regression analysis, providing a novel approach to the detection of breast cancer. A novel ensemble clustering and classification algorithm employing a voting-based Weighted Interval Type-II Fuzzy Inference System and a Staged Pegasos Quantum Support Vector Classifier is then developed, based on the prioritization of clusters depicting the critical state of breast cancer. A grading approach based on a fuzzy linguistic multi-criteria decision-making system is also proposed. Finally, the research is validated on the Wisconsin Breast Cancer dataset. A detailed implementation of the proposed integrated model is carried out to establish its superiority over existing models in the literature.
Data availability
Enquiries about data availability should be directed to the authors.
Acknowledgements
The authors are grateful to the Indian Institute of Technology (Indian School of Mines), Dhanbad, for providing the facilities necessary to complete this research.
Funding
The authors have not received any funding for conducting this study.
Ethics declarations
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
A statement of informed consent is not applicable to this manuscript.
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Chatterjee, S., Das, A. An ensemble algorithm using quantum evolutionary optimization of weighted type-II fuzzy system and staged Pegasos Quantum Support Vector Classifier with multi-criteria decision making system for diagnosis and grading of breast cancer. Soft Comput 27, 7147–7178 (2023). https://doi.org/10.1007/s00500-023-07939-x
DOI: https://doi.org/10.1007/s00500-023-07939-x