Abstract
Breast cancer (BC) is one of the most common cancers among women. Detecting and accurately diagnosing breast cancer at an early stage increases women's chances of survival. To this end, various single classification techniques have been investigated for diagnosing BC. Nevertheless, none of them has proved accurate in all circumstances. Recently, a promising approach known as ensemble classification has been widely used to help physicians diagnose BC accurately. An ensemble classifier combines a set of single classifiers by means of an aggregation layer. The literature generally shows that ensemble techniques outperform single ones when the ensemble members are both accurate (i.e., have a low error rate) and diverse (i.e., the single classifiers make uncorrelated errors on new instances). Hence, selecting ensemble members is a crucial task, since a poor selection can lead to the opposite outcome: single techniques outperforming their ensemble. This paper evaluates and compares ensemble member selection based on both accuracy and diversity with selection based on accuracy alone; a comparison with ensembles built without member selection was also performed. Ensemble performance was assessed in terms of accuracy and F1-score, and the Q statistic was used to measure classifier diversity. The experiments were carried out on three well-known BC datasets available from online repositories, using seven single classifiers. The Scott-Knott test and the Borda count voting system were used to assess the significance of performance differences and to rank the ensembles by performance. The findings of this study suggest that: (1) considering both accuracy and diversity when selecting ensemble members often led to better performance, and (2) in general, selecting ensemble members using accuracy and/or diversity led to better ensemble performance than constructing ensembles without member selection.
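As a rough illustration of the two measures named in the abstract (not the paper's actual implementation), the pairwise Q statistic over per-instance correctness vectors and a Borda count aggregation of rankings can be sketched in Python as follows; the function names and the boolean input format are our own assumptions:

```python
def q_statistic(correct_a, correct_b):
    """Yule's Q statistic for a pair of classifiers.

    correct_a, correct_b: per-instance correctness (True = classified
    correctly) for classifiers A and B on the same test set.
    Q is near 1 for classifiers that err together (low diversity),
    near 0 for independent errors, negative for complementary errors.
    """
    n11 = sum(a and b for a, b in zip(correct_a, correct_b))          # both correct
    n00 = sum(not a and not b for a, b in zip(correct_a, correct_b))  # both wrong
    n10 = sum(a and not b for a, b in zip(correct_a, correct_b))      # only A correct
    n01 = sum(not a and b for a, b in zip(correct_a, correct_b))      # only B correct
    den = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / den if den else 0.0


def borda_count(ballots):
    """Rank items by the Borda count: on each ballot of n items, the
    item ranked first gets n-1 points, the next n-2, and so on;
    points are summed across ballots and items are returned best-first."""
    scores = {}
    for ballot in ballots:
        n = len(ballot)
        for position, item in enumerate(ballot):
            scores[item] = scores.get(item, 0) + (n - 1 - position)
    return sorted(scores, key=scores.get, reverse=True)
```

For example, two classifiers with identical correctness vectors yield Q = 1 (no diversity), while ballots `[["A","B","C"], ["A","C","B"], ["B","A","C"]]` rank ensemble A first under the Borda count.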
© 2021 Springer Nature Switzerland AG
Cite this paper
El Ouassif, B., Idri, A., Hosni, M. (2021). Investigating Accuracy and Diversity in Heterogeneous Ensembles for Breast Cancer Classification. In: Gervasi, O., et al. (eds.) Computational Science and Its Applications – ICCSA 2021. Lecture Notes in Computer Science, vol. 12950. Springer, Cham. https://doi.org/10.1007/978-3-030-86960-1_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86959-5
Online ISBN: 978-3-030-86960-1