Investigating Accuracy and Diversity in Heterogeneous Ensembles for Breast Cancer Classification

  • Conference paper
Computational Science and Its Applications – ICCSA 2021 (ICCSA 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12950)

Abstract

Breast cancer (BC) is one of the most common forms of cancer among women. Detecting and accurately diagnosing breast cancer at an early stage increases women's chances of survival. To this end, various single classification techniques have been investigated for diagnosing BC; nevertheless, none of them proved accurate in all circumstances. Recently, a promising approach called ensemble classification has been widely used to help physicians diagnose BC accurately. An ensemble classifier combines a set of single classifiers by means of an aggregation layer. The literature generally shows that ensemble techniques outperform single ones when the ensemble members are accurate (i.e., have a low error rate) and diverse (i.e., the single classifiers make uncorrelated errors on new instances). Selecting ensemble members is therefore a crucial task, since a poor selection can lead to the opposite outcome: single techniques outperforming their ensemble. This paper evaluates and compares selecting ensemble members based on both accuracy and diversity against selecting them based on accuracy alone; a comparison with ensembles built without member selection was also performed. Ensemble performance was assessed in terms of accuracy and F1-score, and the Q statistic was used to measure classifier diversity. The experiments were carried out on three well-known BC datasets available from online repositories, using seven single classifiers. The Scott-Knott test and the Borda Count voting system were used to assess the significance of the performance differences and to rank the ensembles according to their performance. The findings of this study suggest that: (1) considering both accuracy and diversity when selecting ensemble members often led to better performance, and (2) in general, selecting ensemble members using accuracy and/or diversity led to better ensemble performance than constructing ensembles without member selection.
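
The abstract references three computational building blocks: aggregating heterogeneous single classifiers, the Q statistic for pairwise diversity, and Borda Count ranking. The Python sketches below are generic illustrations of these standard techniques, not the authors' implementation; the classifier choices and example data are hypothetical.

A minimal heterogeneous ensemble with majority ("hard") voting as the aggregation layer, evaluated here on scikit-learn's bundled Wisconsin breast-cancer data (the paper itself used seven single classifiers and three BC datasets):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Three dissimilar base learners combined by a majority-vote aggregation layer
ensemble = VotingClassifier(
    estimators=[
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("svm", make_pipeline(StandardScaler(), SVC())),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",
)
print(cross_val_score(ensemble, X, y, cv=10, scoring="accuracy").mean())
```

The Q statistic (Yule's Q) scores a pair of classifiers from their joint correctness counts on a test set: values near +1 mean they err on the same instances (low diversity), values near -1 mean they err on disjoint instances (high diversity):

```python
import numpy as np

def q_statistic(correct_a, correct_b):
    """Yule's Q between two classifiers' per-instance correctness vectors."""
    a = np.asarray(correct_a, dtype=bool)
    b = np.asarray(correct_b, dtype=bool)
    n11 = np.sum(a & b)    # both correct
    n00 = np.sum(~a & ~b)  # both wrong
    n10 = np.sum(a & ~b)   # only the first classifier correct
    n01 = np.sum(~a & b)   # only the second classifier correct
    # Q = (N11*N00 - N01*N10) / (N11*N00 + N01*N10); undefined when the denominator is 0
    return (n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10)

print(q_statistic([1, 1, 1, 0, 1, 0, 1, 1], [1, 0, 1, 1, 1, 0, 0, 1]))  # ~0.33
```

Borda Count turns per-dataset rankings into an overall ranking: with m candidates, the i-th ranked candidate on each dataset earns m - 1 - i points, and the totals decide the final order:

```python
from collections import defaultdict

def borda_count(rankings):
    """rankings: one best-first ordering of candidate names per dataset."""
    scores = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for i, name in enumerate(ranking):
            scores[name] += m - 1 - i  # Borda points for rank i among m candidates
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical rankings of three member-selection strategies on three BC datasets
print(borda_count([
    ["acc+div", "acc", "none"],
    ["acc+div", "none", "acc"],
    ["acc", "acc+div", "none"],
]))  # [('acc+div', 5), ('acc', 3), ('none', 1)]
```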



Author information

Corresponding author

Correspondence to Ali Idri.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

El Ouassif, B., Idri, A., Hosni, M. (2021). Investigating Accuracy and Diversity in Heterogeneous Ensembles for Breast Cancer Classification. In: Gervasi, O., et al. (eds.) Computational Science and Its Applications – ICCSA 2021. Lecture Notes in Computer Science, vol 12950. Springer, Cham. https://doi.org/10.1007/978-3-030-86960-1_19

  • DOI: https://doi.org/10.1007/978-3-030-86960-1_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86959-5

  • Online ISBN: 978-3-030-86960-1

  • eBook Packages: Computer Science, Computer Science (R0)
