Assessing the impact of data augmentation and a combination of CNNs on leukemia classification
Introduction
Bone marrow occupies the bone cavity, where blood cells are produced. It contains the cells that give rise to red blood cells (erythrocytes), platelets, and white blood cells (leukocytes). Leukocytes actively participate in the human immune system and help defend the body against invaders. Progenitor cells in the marrow, also known as stem cells or precursor cells, produce an average of 100 million leukocytes per day. These leukocytes help the body combat and eliminate foreign microorganisms and chemical structures, either by capturing them (phagocytosis) or by producing antibodies. One of the diseases that affect the functioning of the bone marrow is leukemia [45].
Leukemia is a malignant disease of the white blood cells, usually of unknown origin. Its main characteristic is the accumulation of diseased cells in the bone marrow, which replace normal blood cells. In leukemia, a blood cell that has not yet reached maturity undergoes a genetic mutation that turns it into a cancer cell. This abnormal cell does not function properly; it multiplies faster and has a shorter lifespan than normal cells. Hence, the abnormal cancer cells replace healthy blood cells in the bone marrow.
The American Cancer Society (ACS) estimated that there would be 60,650 new cases of leukemia in 2022, with approximately 24,000 deaths; in particular, there would be 35,810 male cases and 24,840 female cases, leading to 14,020 male deaths and 9,980 female deaths.
The types of leukemia can be classified according to how quickly the disease progresses. Hence, the condition can be of the chronic type, which usually worsens slowly, or of the acute type, which usually worsens quickly. The types of leukemia can also be classified based on the kind of white blood cells they affect: lymphoid or myeloid cells. Thus, the main types of leukemia are acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), chronic myeloid leukemia (CML), and chronic lymphocytic leukemia (CLL). Acute leukemia affects mainly children, and chronic leukemia tends to affect adults and the elderly [45].
Each type of leukemia has an appropriate treatment; therefore, diagnosis at an early stage of the disease is needed so that the proper treatment can be provided successfully. On the other hand, the main treatments for more advanced disease phases aim to destroy the leukemic cells so that the bone marrow resumes producing normal cells. Fig. 1 shows examples of the blood slide images used in the experiments of the current study, namely ALL, AML, chronic leukemia, and healthy blood slides (HBS).
Deep learning models have been increasingly used in computer-aided medical diagnosis (CAD) systems. In particular, convolutional neural networks (CNNs) can learn hierarchical representations, from more general features in the first convolutional layers, to more semantic features in the last few layers. Currently, CNNs are one of the most effective techniques used in medical imaging-based diagnosis [22]. Researchers have been seeking to increase the generalizability of CNNs, particularly based on techniques of data augmentation and the combination of CNNs in ensemble and multilevel configurations.
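As a rough illustration of what label-preserving data augmentation means for blood slide images, the following minimal sketch generates flipped and rotated copies of an image. The specific transformations, the function names, and the nested-list image representation are assumptions made for this example; this excerpt does not state which augmentation techniques the study used.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of pixel intensities.
Image = List[List[int]]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rotate_90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img: Image) -> List[Image]:
    """Return the original image plus three transformed copies.

    The class label of a blood slide does not depend on its orientation,
    so every copy keeps the label of the original image.
    """
    return [img, flip_horizontal(img), flip_vertical(img), rotate_90(img)]
```

Each call to augment turns one training image into four, which is the sense in which augmentation enlarges a small medical dataset without collecting new slides.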
In this study, techniques that are widely used in CNN-based CAD systems were evaluated, namely data augmentation and ensemble and multilevel configurations. To this end, seven CNNs were studied under different data augmentation techniques and ensemble and multilevel configurations. The analysis covered five leukemia classification scenarios and used 3,536 images from 18 heterogeneous datasets. Three of these scenarios are binary classification problems: leukemia vs. HBS, ALL vs. HBS, and AML vs. HBS. The other two scenarios are multiclass classification problems: ALL vs. AML vs. HBS and ALL vs. AML vs. HBS vs. other types.
The main contributions of this article are the following: the identification of the datasets that are widely used for leukemia classification, the introduction of five scenarios for the classification of different types of leukemia, the evaluation of the performance achieved by various CNN-based models on leukemia classification, the assessment of the impact of multiple data augmentation techniques on the classification performance, and the assessment and comparison of the improvements achieved by multilevel and ensemble model configurations.
This article is organized as follows. Section 2 presents related work. Section 3 describes the materials and methods, including the datasets, the data augmentation techniques, the evaluated network architectures, the ensemble and multilevel configurations, and the adopted evaluation metrics. Section 4 presents the experiments and the achieved results, and Section 5 discusses them in comparison with previous works found in the literature. Finally, the conclusions and possibilities for future work are presented in Section 6.
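The five scenarios can be viewed as different relabelings of the same image pool. The sketch below is one way to express that; the class names, scenario keys, and the relabel helper are hypothetical. It also assumes that the "leukemia" class of the first scenario merges ALL, AML, and other types, and that images of classes not named in a scenario are left out, neither of which is confirmed in this excerpt.

```python
from typing import Dict, List, Tuple

# Hypothetical original class names: "HBS", "ALL", "AML", "OTHER".
# Each scenario maps an original class to its label in that scenario;
# classes missing from a mapping are excluded (an assumption).
SCENARIOS: Dict[str, Dict[str, str]] = {
    "leukemia_vs_hbs": {
        "HBS": "HBS", "ALL": "leukemia", "AML": "leukemia", "OTHER": "leukemia",
    },
    "all_vs_hbs": {"HBS": "HBS", "ALL": "ALL"},
    "aml_vs_hbs": {"HBS": "HBS", "AML": "AML"},
    "all_vs_aml_vs_hbs": {"HBS": "HBS", "ALL": "ALL", "AML": "AML"},
    "all_vs_aml_vs_hbs_vs_other": {
        "HBS": "HBS", "ALL": "ALL", "AML": "AML", "OTHER": "other",
    },
}


def relabel(samples: List[Tuple[str, str]], scenario: str) -> List[Tuple[str, str]]:
    """Keep the samples used by a scenario and rewrite their labels.

    `samples` holds (image_path, original_class) pairs.
    """
    mapping = SCENARIOS[scenario]
    return [(path, mapping[label]) for path, label in samples if label in mapping]
```

For example, relabel(samples, "all_vs_hbs") would keep only the ALL and HBS images, while relabel(samples, "leukemia_vs_hbs") would keep every image and collapse the three disease classes into one.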
Related work
This section presents studies that have been developed for leukemia detection. Based on the applied methodology, we identified traditional methods [40], [29], [14], [32], [26] and methods based on deep learning [43], [48], [27], [1]. Traditional methods comprise several steps, such as image pre-processing, segmentation, feature extraction, and classification. On the other hand, procedures based on deep learning usually apply CNNs. This kind of procedure aims to design and build a
Materials and methods
This study aimed to evaluate the influence of using data augmentation and combinations of CNNs on the detection of leukemia types in blood slide images. The identification of leukemia types in images is a challenging issue. Here, five leukemia classification problems were addressed, namely three binary classification and two multiclass classification problems: 1) leukemia vs. HBS, 2) ALL vs. HBS, 3) AML vs. HBS, 4) ALL vs. AML vs. HBS, and 5) ALL vs. AML vs. HBS vs. other types. Public image
Experiments and results
The dataset used to study the five scenarios under evaluation is composed of the following images: 1,434 images of healthy slides, 881 images of ALL, 978 images of AML, and 243 images of “other types” of leukemia. K-fold cross-validation with k equal to 5 was applied in the experiments, which were performed on a PC with a 3.6 GHz Intel® Xeon™ processor, 24 GB of RAM, and an NVIDIA TITAN XP 12 GB graphics card.
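To make the evaluation protocol concrete, the following sketch splits the 3,536 images into five folds so that each image is used for testing exactly once. The shuffling, the fixed seed, the class keys, and the helper name k_fold_indices are assumptions for illustration; the excerpt does not say whether the folds were shuffled or stratified by class.

```python
import random
from typing import Dict, Iterator, List, Tuple

# Class counts reported in the text; the key names are hypothetical.
CLASS_COUNTS: Dict[str, int] = {"HBS": 1434, "ALL": 881, "AML": 978, "OTHER": 243}


def k_fold_indices(
    n_samples: int, k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: shuffled, unstratified split
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


n_images = sum(CLASS_COUNTS.values())
fold_sizes = [len(test) for _, test in k_fold_indices(n_images)]
```

With k equal to 5, each model is trained on roughly 80% of the images and tested on the remaining 20%, and the reported metric is typically the average over the five folds.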
The influence of the use of data augmentation on the
Discussion
Table 11 compares related state-of-the-art methods in terms of the classification problem addressed, the number of datasets used, the number of images used, and the accuracy achieved.
The obtained results suggest that, even with general-purpose CNNs, results that are competitive with state-of-the-art methods can be achieved by choosing suitable data augmentation techniques and an appropriate combination of CNNs.
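One common way to combine several CNNs is soft voting, in which the class probabilities predicted by each network are averaged and the class with the highest mean is selected. The sketch below illustrates only that idea; the combination rule actually used in the study is not given in this excerpt, and the function names and example probabilities are hypothetical.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors produced by several models."""
    if not prob_sets:
        raise ValueError("at least one model output is required")
    n_classes = len(prob_sets[0])
    if any(len(p) != n_classes for p in prob_sets):
        raise ValueError("all models must output the same number of classes")
    n_models = len(prob_sets)
    return [sum(p[c] for p in prob_sets) / n_models for c in range(n_classes)]


def predict(prob_sets: Sequence[Sequence[float]], class_names: Sequence[str]) -> str:
    """Return the class whose averaged probability is highest."""
    averaged = soft_vote(prob_sets)
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return class_names[best]


# Hypothetical outputs of three CNNs for one image (classes: ALL, AML, HBS).
outputs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.7, 0.2]]
label = predict(outputs, ["ALL", "AML", "HBS"])
```

In this example the first network alone would choose ALL, but the averaged probabilities favor AML, which shows how an ensemble can override the error of a single model.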
To make a more reliable comparison, Table 11 is organized according to
Conclusion
In this study, techniques that can be integrated into computer-aided diagnostic systems in order to detect different types of leukemia, namely ALL, AML, and other types, in addition to healthy slides, were evaluated. Several experiments were performed. First, tests were performed according to five scenarios and the effectiveness of using techniques of data augmentation was analyzed. Then, a comparison among techniques of data augmentation for the ALL vs. AML vs. HBS vs. other types
CRediT authorship contribution statement
Maila L. Claro: Methodology, Software, Investigation, Writing – original draft. Rodrigo M.S. de Veras: Supervision, Writing – original draft. Andre M. Santana: Supervision, Writing – review & editing. Luis Henrique S. Vogado: Software, Writing – review & editing. Geraldo Braz Junior: Writing – review & editing. Fatima N.S. de Medeiros: Writing – review & editing. Joao Manuel R.S. Tavares: Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This study was partially funded by the “Coordenação de Aperfeiçoamento de Pessoal de Nivel Superior” (CAPES) - Finance Code 001, “Fundação de Amparo a Pesquisa do Piaui” (FAPEPI), and “Conselho Nacional de Desenvolvimento Cientifico e Tecnologico” (CNPQ), in Brazil. The authors gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used in this study.
References (50)
- et al. Automated classification of acute leukemia on a heterogeneous dataset using machine learning and deep learning techniques. Biomed. Signal Process. Control (2022)
- et al. A survey on image segmentation of blood and bone marrow smear images with emphasis to automated detection of leukemia. Biocybern. Biomed. Eng. (2020)
- et al. An efficient deep convolutional neural network based detection and classification of acute lymphoblastic leukemia. Expert Syst. Appl. (2021)
- et al. Sdct-auxnetθ: Dct augmented stain deconvolutional cnn with auxiliary classifier for cancer diagnosis. Med. Image Anal. (2020)
- et al. Classification of acute leukemia using medical-knowledge-based morphology and cd marker. Biomed. Signal Process. Control (2018)
- HS, B., Virmani, J., Devgun, J.S. Computer assisted classification framework for prediction of acute lymphoblastic and acute myeloblastic leukemia. Biocybern. Biomed. Eng. (2017)
- et al. Automatic recognition of five types of white blood cells in peripheral blood. Comput. Med. Imaging Graph. (2011)
- et al. Leukemia diagnosis in blood slides using transfer learning in cnns and SVM for classification. Eng. Appl. Artif. Intell. (2018)
- et al. Fast and robust segmentation of white blood cell images by self-supervised learning. Micron (2018)
- et al. Identification of leukemia subtypes from microscopic images using convolutional neural network. Diagnostics (2019)
- Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proc. Nat. Acad. Sci.
- Iomt-based automated detection and classification of leukemia using deep learning. J. Healthcare Eng.
- Pathologie-websites im world wide web. Der Pathologe
- Xception: Deep learning with depthwise separable convolutions
- Convolution neural network models for acute leukemia diagnosis
- A composite classifier system design: Concepts and methodology. Proc. IEEE
- Detection and classification of immature leukocytes for diagnosis of acute myeloid leukemia using random forest algorithm. Bioengineering
- Classification of acute myelogenous leukemia in blood microscopic images using supervised classifier
- Learning a no-reference quality assessment model of enhanced images with big data. IEEE Trans. Neural Networks Learn. Syst.
- Classification of normal vs malignant cells in b-all white blood cancer microscopic images
- Deep residual learning for image recognition
- Densely connected convolutional networks
- Intelligent medical iot-enabled automated microscopic image diagnosis of acute blood cancers. Sensors
- Phase classification of chronic myeloid leukemia using convolution neural networks