
1 Introduction

Automated classification of blood cells has made significant progress in recent years: novel methods and applications provide clinical experts with tools that yield quantifiable results in a short time [1, 2]. Part of this is the analysis of white blood cells (leukocytes) found in peripheral blood. Since these are relatively sparse and show great contrast to the background and to red blood cells (erythrocytes), their segmentation is straightforward. Moreover, as only mature cells are present in healthy peripheral blood, their classification is less challenging than in bone marrow, where more cell types in various degrees of maturation are present. This makes distinguishing between these cell types a challenging task even for humans.

As the distribution of cell-classes reflects the hematopoietic process of bone marrow and is severely affected by diseases such as leukemia, a tool for reliable, automated classification and counting would be a great support for medical experts. Such an application could provide impartial and reproducible results as a basis for diagnostic decisions. Even today, the tedious task of counting cells is still performed manually by medical experts.

Previous research on the classification of hematopoietic cells [1] suggested a pipeline for the classification of white blood cells in blood smear images. The authors made use of morphological features, which they extract from a segmentation of cell cytoplasm and nucleus. Due to the higher density of cells, this approach is not feasible with bone marrow images, as morphological operations for segmentation require a better separation between individual cells. Other specialized features based on automatically segmented regions described by shape, color and texture yielded a good separation of hematopoietic cells [3].

In a more recent work, convolutional neural networks have been proposed for the classification of hematopoietic cells [2]. The authors used images obtained from blood smears, in which the white cell lineage can be conveniently detected and segmented. Based on texture features, three classes of cells were separated with a Support Vector Machine, followed by a classification of three other cell types using features from a neural network and a Random Forest Classifier. This method achieves good results in blood smear images.

As opposed to research on the analysis of white blood cells in blood smear images, research on the classification of hematopoietic cells in bone marrow is rare and often only possible under certain constraints or simplifications: for example, manually segmented images of cropped cells from different maturity stages of the granulopoiesis are given as prior knowledge [4]. Based on the segmentation of nucleus and cytoplasm, it is possible to craft specialized features that make classification possible.

To the best of our knowledge, classification has not been researched in detail for a wider range of hematopoietic cells in bone marrow images without prior knowledge such as manual segmentations. In this paper, we compare and evaluate classical machine learning pipelines with feature extraction and classification for their suitability to perform this challenging task. Classification is performed on images each containing a centered hematopoietic cell – without providing any further details such as a segmentation of cytoplasm and nucleus or the removal of neighboring cells. Next to classical machine learning, we also consider modern classification methods based on convolutional neural networks.

2 Dataset

Bone marrow samples are acquired and processed by the Department of Hematology, Oncology, Hemostaseology and Stem Cell Transplantation of the University Hospital, RWTH Aachen University. By applying Pappenheim staining, cell nuclei and different types of cytoplasm can be more easily distinguished. The samples were scanned at the Institute of Pathology of the University Hospital, immersed in oil with \(63{\times }\) magnification. Medical experts of the Department of Hematology annotated representative excerpts and assigned a label to every cell and cell-like artifact.

For the following evaluations, we used patches of size \(200 \times 200\) pixels containing bone marrow hematopoietic cells. While most cells fit completely into a patch of this size, some cell types – e.g., megakaryocytes – are slightly cropped. Neighboring cells are often present as well.

Fig. 1. Three examples of each cell class as well as their sample size in the dataset.

Our dataset comprises \(2478\) patches with leukocytes, separated into the classes denoted in Fig. 1. Additionally, we have \(3331\) cell-like artifacts and non-leukocytes.

3 Methods

This paper provides a thorough analysis of various algorithms for the classification of hematopoietic cells in microscope images of stained bone marrow; these algorithms are briefly presented in the following paragraphs.

3.1 Classical Machine Learning Algorithms

Classical approaches comprise a feature extraction and a classification stage. The feature extractor provides an abstract representation of the image that the classifier uses to predict its class.

Feature Extraction. In a preliminary evaluation on a smaller subset, we compared several broadly applicable and openly available feature extractors, namely Dense SIFT, HOG, LBP and Daisy. Since we perform classification purely on image data without prior knowledge of segmentations of nucleus and cytoplasm, shape and contour features were not applicable. To have a manageable number of tests, we focused on Histogram of Oriented Gradients (HOG) [5] and Local Binary Pattern (LBP) [6] as representative, well-performing features for the final evaluation presented in this work. The Histogram of Oriented Gradients feature is computed for every color channel with 50 pixels per cell and 4 cells per block. We also applied the same descriptor to an image transformed into polar coordinates (SHOG). Local Binary Pattern is computed on every color channel as well, using the uniform variant, which is rotation and grayscale invariant.
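The following is a minimal sketch of how such per-channel HOG and uniform LBP descriptors could be computed with scikit-image for a \(200 \times 200\) RGB patch. The helper name and the LBP neighborhood parameters are illustrative assumptions, and the polar-coordinate transform used for SHOG is omitted.

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern

def extract_hog_lbp(patch, pixels_per_cell=50, cells_per_block=4,
                    lbp_points=8, lbp_radius=1):
    """Per-channel HOG and uniform LBP descriptors for a 200x200 RGB patch."""
    hog_parts, lbp_parts = [], []
    for c in range(patch.shape[2]):
        channel = patch[:, :, c]
        # Histogram of Oriented Gradients on this color channel
        hog_parts.append(hog(channel,
                             pixels_per_cell=(pixels_per_cell, pixels_per_cell),
                             cells_per_block=(cells_per_block, cells_per_block)))
        # Rotation- and grayscale-invariant uniform LBP, summarized as a histogram
        codes = local_binary_pattern(channel, lbp_points, lbp_radius, method="uniform")
        hist, _ = np.histogram(codes, bins=np.arange(lbp_points + 3), density=True)
        lbp_parts.append(hist)
    return np.concatenate(hog_parts), np.concatenate(lbp_parts)
```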

In an attempt to combine classical and deep machine learning approaches, it is also possible to use the output of a convolutional neural network as a feature vector [7]. To this end, we extracted features from the output of the convolutional part of a ResNet18 and a ResNet152 (cf. Sect. 3.2) pretrained on ImageNet. Every feature descriptor was normalized to zero-mean and unit-variance prior to training.
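A sketch of how such convolutional features could be extracted with Torchvision is given below; the backbone is cut before the final fully connected layer and its pooled output is flattened into a feature vector. Variable names and the exact cut point are assumptions, and standardization to zero mean and unit variance would be applied afterwards.

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained ResNet18; keep everything up to and including global average pooling
resnet = models.resnet18(pretrained=True)
backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop the final fc layer
backbone.eval()

@torch.no_grad()
def cnn_features(batch):
    """batch: float tensor of shape (N, 3, H, W), ImageNet-normalized."""
    # (N, 512) for ResNet18, (N, 2048) for ResNet152
    return backbone(batch).flatten(1)
```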

Classification. Based on feature descriptors, a classifier is able to separate two or more classes. Each of the features presented in Sect. 3.1 is fitted using a Linear Support Vector Machine (SVM) [8], a Radial Basis Function Support Vector Machine (RBF) [9], a Random Forest Classifier (RFC) [10], and the AdaBoost Classifier (ABC) [11]. In addition to annotated training data, those algorithms require the user to define several hyper-parameters such as a kernel and a penalty parameter for SVMs or the number of decision trees in a Random Forest.
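A sketch of fitting these four classifiers with scikit-learn is shown below. The feature matrix `X_train` and label vector `y_train` are placeholders, and the hyper-parameter values shown are illustrative defaults rather than the optimized settings.

```python
from sklearn.svm import LinearSVC, SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

classifiers = {
    "SVM": LinearSVC(C=1.0, class_weight="balanced"),
    "RBF": SVC(kernel="rbf", C=1.0, gamma="scale", class_weight="balanced"),
    "RFC": RandomForestClassifier(n_estimators=200, class_weight="balanced"),
    "ABC": AdaBoostClassifier(n_estimators=100, learning_rate=1.0),
}

# X_train: feature matrix (n_samples, n_features), y_train: cell-class labels
fitted = {name: make_pipeline(StandardScaler(), clf).fit(X_train, y_train)
          for name, clf in classifiers.items()}
```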

3.2 Deep Neural Networks

Neural networks with a large number of parameters have become a powerful tool for classification in all kinds of medical image processing and analysis tasks [7]. A deep neural network in its simplest form has several convolutional layers (the output of which can be used as a feature vector as mentioned above) followed by a few fully connected layers (serving as the classification part of the network). A popular model for image classification tasks is the ResNet, a deep residual network [12]. It is common practice to initialize the weights of these networks based on a pre-training on a larger image database such as ImageNet. This not only drastically reduces the amount of training data necessary for successful network training, but also leads to quicker convergence.

For our evaluation, we chose two variants of ResNet: ResNet18 and ResNet152. As with the training of classical classifiers, a set of hyper-parameters needs to be fine-tuned to achieve high precision. These include, for example, learning rate and data batch size.
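A minimal sketch of how the two pretrained variants could be adapted to the cell classes is given below, assuming the standard Torchvision models with the final fully connected layer replaced; the helper name and argument layout are illustrative.

```python
import torch.nn as nn
from torchvision import models

def build_model(arch, num_classes):
    """Load an ImageNet-pretrained ResNet and replace its classification head."""
    if arch == "resnet18":
        model = models.resnet18(pretrained=True)
    else:
        model = models.resnet152(pretrained=True)
    # New fully connected layer sized to the number of hematopoietic cell classes
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```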

We used an Adam optimizer for training these networks, as it showed superior performance compared to Stochastic Gradient Descent (SGD) in preliminary tests. Furthermore, we turn one crossvalidation fold of the training set into a validation set to use the validation error as an early stopping criterion.
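The training loop could then look roughly as follows, with Adam as the optimizer and the validation loss on the held-out fold driving early stopping. Learning rate, patience, and the loader names are assumptions, not the exact settings used.

```python
import copy
import torch

def train(model, train_loader, val_loader, max_epochs=100, patience=10, lr=1e-4):
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_loss, best_state, stale = float("inf"), None, 0
    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        # Early stopping: monitor the loss on the held-out validation fold
        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)
        if val_loss < best_loss:
            best_loss, best_state, stale = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
            if stale >= patience:
                break
    model.load_state_dict(best_state)
    return model
```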

4 Experimental Setup

Each classifier is evaluated in several experiments: binary classification between white blood cells and other cells (experiment named leukocytes), classification of the grade of maturity of neutrophilic cells (ignoring other cell types, experiment named neutrophilic cells), and classification between all mentioned cell types (ignoring non-leukocytes, experiment named all).

For each test, we performed a three-fold crossvalidation to obtain the overall results. In every fold, we performed hyper-parameter optimization with six-fold crossvalidation as an inner loop. We also performed very basic data augmentation by using random rotation and a slight translation of up to 10 pixels.
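The augmentation can be expressed with Torchvision transforms as in the sketch below, assuming arbitrary rotation angles and a translation of at most 10 pixels in a \(200 \times 200\) patch (i.e., a fraction of 0.05); the exact sampling ranges are assumptions.

```python
from torchvision import transforms

# Random rotation plus a translation of at most 10 px in a 200 x 200 patch (10/200 = 0.05)
augment = transforms.Compose([
    transforms.RandomRotation(degrees=180),
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),
    transforms.ToTensor(),
])
```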

Hyper-parameter Optimization. Every classifier was optimized in terms of its hyper-parameters by using random search over an appropriate range of parameter values. While we were able to test \(150\) hyper-parameter sets with the classical models, time constraints only allowed the evaluation of \(10\) sets with the deep learning approaches.

We optimized the linear SVM with respect to the penalty term and the class weights, and the radial SVM additionally with respect to the gamma term. For Random Forest we considered the number of estimators, class weights, and maximum depth, and for AdaBoost the number of estimators and learning rate. All of these hyper-parameter values were sampled randomly rather than optimized algorithmically. We selected these particular sets of parameters for optimization due to their significant impact on training process and classifier quality.
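One possible way to set up this random search with scikit-learn's RandomizedSearchCV is sketched below, using the six-fold inner cross-validation and \(150\) sampled parameter sets; the concrete distributions and ranges are illustrative assumptions, and `m_scorer` refers to the custom m-score sketched in the next paragraph.

```python
from scipy.stats import loguniform, randint
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

search_spaces = {
    "RBF": (SVC(kernel="rbf"), {"C": loguniform(1e-2, 1e3),
                                "gamma": loguniform(1e-4, 1e0),
                                "class_weight": [None, "balanced"]}),
    "RFC": (RandomForestClassifier(), {"n_estimators": randint(50, 500),
                                       "max_depth": randint(3, 30),
                                       "class_weight": [None, "balanced"]}),
}

searches = {name: RandomizedSearchCV(est, dist, n_iter=150, cv=6,
                                     scoring=m_scorer)  # custom m-score, see below
            for name, (est, dist) in search_spaces.items()}
```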

m-score. A significant class imbalance renders the commonly used accuracy score unsuitable, as smaller classes can be completely misclassified while the overall accuracy remains high. To mitigate this, we used the sum of the logarithmic per-class F1-scores, called m-score in the remainder of this paper, as a measure for the selection of the best performing hyper-parameter set: \(m = \sum _{n=1}^{N} \log {(\text {f1}_n)}\) with \(N\) classes and \(\text {f1}_n\) the F1-score of class \(n\). To prevent negative infinity if at least one class has an F1-score of zero, we bounded the individual F1-scores from below at \(0.01\).
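A minimal sketch of the m-score as a scikit-learn scorer follows directly from this definition; the function names are illustrative.

```python
import numpy as np
from sklearn.metrics import f1_score, make_scorer

def m_score(y_true, y_pred):
    """Sum of log per-class F1-scores, with each F1 bounded below at 0.01."""
    per_class_f1 = f1_score(y_true, y_pred, average=None)
    return float(np.sum(np.log(np.maximum(per_class_f1, 0.01))))

m_scorer = make_scorer(m_score)  # higher (less negative) is better
```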

Implementation. The project is implemented in Python, using Scikit-Learn for the classical pipeline and Pytorch for the deep neural networks. Both pretrained models were obtained from the Torchvision package. Each model was trained on a single GeForce GTX 1080 Ti GPU.

5 Results

The results are shown in Fig. 2: at first glance it is obvious that all features, classifiers and networks achieve the best results in the binary classification task of distinguishing between leukocytes and cell-like artifacts. Particularly the commonly used image features HOG and LBP reach acceptable scores. Using the outputs of the ResNets' convolutional parts as features did not perform equally well. Training a deep neural network directly, however, yielded excellent results.

Fig. 2. (a)–(c): results of all three experiments. F1-scores are marked with a square (bottom horizontal axis), the m-score is marked with an ‘x’ (top horizontal axis). Grouped by feature descriptor, from bottom to top: the classifiers SVM (blue), RBF (red), ABC (green), RFC (orange). At the top: ResNet18 (lower) and ResNet152 (upper). (d): confusion matrix for ResNet18 in the neutrophilic cells experiment. (Color figure online)

Likewise, the neural networks performed significantly better than the classical methods in both the neutrophilic cells and the all experiment. Not only did they achieve higher scores, their individual class scores were also closer together. Both experiments are demanding for general-purpose descriptors and standard classification algorithms: in almost every single test, at least one class – usually one of the underrepresented classes – is completely misclassified. The neural networks, however, still provide valuable results.

The confusion matrix corresponding to the classification of neutrophilic cells with a ResNet18 shows that most errors are off by just one class when the classes are ordered along the ongoing maturation process.

6 Discussion

The lower performance of ResNet features combined with classical classifiers might be explained by the fact that it was pre-trained on ImageNet – a database of images very different from microscope images.

Considering that the feature descriptor has a larger impact than the choice of classification algorithm, one advantage of deep learning becomes visible: not only is the classifier adapted to the specific problem, but also the part of the neural network that corresponds to feature extraction. This is in line with previous work, where researchers hand-crafted more descriptive features – a task that is inherently covered by neural networks. This is also supported by the confusion matrix, which indicates that the residual network learned significant, descriptive features for this special use-case – even though no prior information about the order of cell classes was available during training.

It can also be inferred from the results that neural networks are better suited for the classification of multiple classes: they still achieved superior results in these cases compared to classical machine learning methods. Even the shallower ResNet18 yielded sufficient results without requiring further layers.

We are aware that better performance could be achieved with manual fine-tuning and hand-crafted features, as several other researchers have successfully shown, at least for peripheral blood images. However, the same can be said for deep learning approaches: custom loss functions, smarter data augmentation, dedicated network architectures, additional training time, and optimization of hyper-parameters can surely further increase performance.

Furthermore, this study did not evaluate the amount of human error in the ground truth annotations. Further research on inter- and intra-rater reliability is needed.

7 Conclusion and Outlook

In this paper, we provide a comparison of several common approaches for the classification of hematopoietic cells. We considered a wide range of white cells in different stages of maturity in the bone marrow, which is a very challenging scenario. We showed that while most classical machine learning algorithms perform poorly, deep neural networks yield promising results.

Particularly the fact that they perform better in multi-class classification is important for the automated analysis of bone marrow smear images, which show an increased number of cell types. It stands to reason that more specialized architectures will be able to yield results accurate enough for medical analysis based on images of bone marrow instead of peripheral blood smears. This paper provides a baseline evaluation for future improvements to such neural networks.