Elsevier

Pattern Recognition Letters

Volume 82, Part 1, 15 October 2016, Pages 28-35
Ensembles of dense and dense sampling descriptors for the HEp-2 cells classification problem

https://doi.org/10.1016/j.patrec.2016.01.026

Abstract

The classification of Human Epithelial (HEp-2) cell images, acquired through Indirect Immunofluorescence (IIF) microscopy, is an effective method to identify staining patterns in patient sera. It can therefore be used for diagnostic purposes, to reveal autoimmune diseases. However, the automated classification of IIF HEp-2 cell patterns is a challenging task because of the large intra-class and small inter-class variability. Consequently, recent HEp-2 cell classification contests have greatly spurred the development of new IIF image classification systems.

Here we propose an approach for the automatic classification of IIF HEp-2 cell images that fuses several texture descriptors through an ensemble of support vector machines combined by the sum rule. Its effectiveness is evaluated on the HEp-2 cell dataset used for the “Performance Evaluation of Indirect Immunofluorescence Image Analysis Systems” contest, hosted by the International Conference on Pattern Recognition in 2014: the accuracy on the testing set is 79.85%.

The same dataset was used to test an ensemble of ternary-encoded local phase quantization descriptors, built by perturbation approaches: the accuracy on the training set is 84.16%. Finally, this ensemble was validated on 14 additional datasets, obtaining the best performance on 11 datasets.

Our MATLAB code is available at https://www.dei.unipd.it/node/2357.

Introduction

Indirect Immunofluorescence (IIF) is used to detect specific antibodies in the patient serum for the diagnosis of autoimmune diseases (ADs). These are caused by abnormal activity of the immune system, which attacks the body's own tissues [1]. Although ADs are considered relatively rare (e.g. compared to cardiac diseases), they show high mortality and morbidity, and their etiology is still far from being fully understood. Over the years special emphasis has been given to genetic and environmental factors [1], and important epidemiologic studies have been published about the prevalence of the most common ADs. In 1997, Jacobson and Gange [2] estimated a 3.2% prevalence in the US for a subset of 24 ADs. In 2007, Eaton et al. [3] estimated a prevalence of 5.4% for a subset of 31 ADs, based on the Danish National Patient Register. This study was updated by Cooper et al. in 2009 [4], who reported a higher prevalence ranging from 7.6% to 9.4%. Moreover, ADs show interesting demographic patterns. Cooper and Stroehla [5] underline that, in the US, the incidence of ADs is higher in women than in men: for some ADs, such as systemic lupus erythematosus, Sjögren's syndrome or scleroderma, 85% of patients are women. For other ADs, this proportion drops to 60–75%. Racial factors also influence the prevalence of ADs: in the US, blacks show a higher risk of systemic lupus erythematosus and scleroderma, while whites show a higher risk of multiple sclerosis compared to blacks and Asians.

The primary test for the evaluation of ADs is the Antinuclear Antibody (ANA) test, which has been reported to be particularly effective in the diagnosis of many ADs [6]. The gold standard procedure for ANA testing is the IIF assay [7], which uses two antibodies: a naked primary antibody, which binds to the target antigen, and a fluorescent secondary antibody, which binds to the primary one. A cellular substrate, e.g. a monolayer of HEp-2 cells, is incubated with the patient serum, allowing the ANAs to bind to the nuclei of the cells. The HEp-2 substrate expresses many antigens to which the ANAs can bind and, once the primary-secondary complex binds to the ANAs, several staining patterns can be produced. As reported by Lane and Gravel [8], staining patterns are specific for one or a few diseases, e.g. homogeneous for systemic lupus erythematosus, speckled for Sjögren's syndrome and mixed connective tissue disease, or nucleolar for scleroderma. The fluorescent samples are examined with a fluorescence microscope to assess (i) the fluorescence intensity with respect to positive and negative controls and (ii) the staining patterns in the positive samples. The second task is the most challenging, even for expert physicians, and may lead to high intra- and inter-operator variability. Bizzaro et al. [9] reported an inter-lab consensus of 92.6% for fluorescence intensity, but only 76% for fluorescence pattern classification. A computer-aided diagnosis (CAD) system can mitigate these limitations and speed up sample screening.

The feasibility of such systems and the interest this topic has aroused in the scientific community are demonstrated by the high number of publications in the field and by the contests on HEp-2 cell IIF image classification hosted in 2012 and 2014 by the International Conference on Pattern Recognition (ICPR) and, in 2013, by the International Conference on Image Processing (ICIP). Foggia et al. [10] summarized the state of the art in HEp-2 cell IIF image classification. In general, classical feature sets were used, such as morphological measurements (e.g. number and circularity of connected regions, size of the cells, properties related to the holes inside the cells, etc.) and texture descriptors (e.g. Haralick features from the grey level co-occurrence matrix and variations of the local binary pattern (LBP) descriptor). Approaches specific to this dataset were also developed. Stoklasa et al. [11] proposed a granularity-based descriptor, which uses as features the distribution of grains in the image, computed through a series of morphological openings. The same paper provides a specific implementation of the surface descriptor, which computes the statistical properties of the image considered as a topographic surface made of valleys and hills. In Wiliem et al. [12], the cell pyramid matching framework was tailored to the HEp-2 problem. It is a region-based approach that pools local histograms of visual words into three histograms associated with (i) the whole cell region, (ii) the inner region and (iii) the outer region, thus also exploiting spatial information. Moreover, Foggia et al. [10] report some strategies for augmenting the training set, such as image rotation [13] or spontaneous activity patterns [14].

For the ICPR 2014 contest, two different tasks were assigned: Task 1 for cell level classification and Task 2 for specimen level classification. It is worth noting that many methods, e.g. [15], [16], [17], increased the number of training samples by patch extraction, flipping and rotation. The best approach for Task 1 was reported by Manivannan et al. [15], who used an ensemble of support vector machines (SVMs) trained with multi-resolution local patterns, scale-invariant feature transform, random projections and intensity histograms, combined with rotations of the original images, dense patch extraction and a bag-of-words-based feature encoding. In 2014, Gao et al. [16] exploited convolutional neural networks together with image rotation to increase the number of training samples. Codrescu [17] also augmented the available dataset by image rotation and classified the images through an extended version of the finite impulse response multilayer perceptron. The same two tasks were proposed for this special issue. In this paper we focus on Task 1 only, which consists in the classification of six staining patterns (homogeneous, speckled, nucleolar, centromere, Golgi and nuclear membrane) in pre-segmented single cell images (dataset details in Section 4.1). During the ICPR 2014 contest [18] we used four descriptors based on the local binary pattern [19], namely pyramid LBP [20], local configuration pattern [21], rotation invariant co-occurrence among adjacent LBP [13] and extended LBP [22], together with the Strandmark morphological features [23] and an ensemble of SVMs as classifier [24], [25] (accuracy on the testing set: 78.27%).
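As a minimal illustration of the rotation-and-flipping augmentation mentioned above, the following Python sketch generates the right-angle rotations of an image and of its mirror image. It is not the procedure of any of the cited methods: the function names are hypothetical, and restricting the rotations to multiples of 90 degrees is an assumption made here to avoid interpolation.

```python
from typing import List

Image = List[List[int]]  # grayscale image stored as rows of pixel intensities


def rotate90(img: Image) -> Image:
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img: Image) -> Image:
    """Mirror an image left to right."""
    return [row[::-1] for row in img]


def generate_poses(img: Image) -> List[Image]:
    """Return the four right-angle rotations of img and of its mirror image."""
    poses = []
    for base in (img, flip_horizontal(img)):
        current = base
        for _ in range(4):
            poses.append(current)
            current = rotate90(current)
    return poses


# Toy 2x2 "cell image" with four distinct intensities (illustrative data only)
cell = [[1, 2],
        [3, 4]]
poses = generate_poses(cell)
print(len(poses))  # 8 poses, all different for an image without symmetry
```

Each pose keeps the class label of the original image, so the training set grows eightfold without any new acquisition.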

Here we propose an improved version of our ICPR 2014 approach, based on the following ideas:

  • Since IIF images present large intra-class and small inter-class variability, different dense and dense sampling descriptors can be useful to represent images. We fuse several texture descriptors having different characteristics.

  • The Bag of Features (BoF) technique has recently emerged as one of the most powerful methods for image representation. In this work, we propose to design ensembles of codebooks for BoF using different strategies for codebook building. Each different texton vocabulary is used to train an SVM classifier. The final ensemble is the fusion by sum rule of all the trained SVMs.

  • Ensembles built through perturbation of the parameters of a texture descriptor improve the classification performance of the single stand-alone descriptor. In this work we test and validate different methods for building ensembles of different descriptors. The most interesting result is the performance, validated on 15 datasets, of an ensemble of local phase quantization descriptors based on a ternary coding.
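The sum-rule fusion named in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' MATLAB implementation: the function names and the score values are hypothetical, and the scores of the different classifiers are assumed to be already normalized to a comparable range.

```python
from typing import List, Sequence


def sum_rule(score_sets: Sequence[Sequence[float]]) -> List[float]:
    """Add, class by class, the scores produced by every classifier."""
    n_classes = len(score_sets[0])
    if any(len(scores) != n_classes for scores in score_sets):
        raise ValueError("all classifiers must score the same classes")
    return [sum(scores[c] for scores in score_sets) for c in range(n_classes)]


def predict(score_sets: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest fused score."""
    fused = sum_rule(score_sets)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores of three classifiers for one image over three classes
scores = [
    [0.6, 0.3, 0.1],  # classifier 1 alone would choose class 0
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
]
fused = sum_rule(scores)
label = predict(scores)
print(label)  # 1: the fused decision differs from classifier 1 alone
```

The example shows why fusion can help: a single classifier that prefers the wrong class is outvoted when the remaining classifiers give consistent evidence for another class.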

Texture descriptors

In this section we review the “dense” (Sections 2.1.1–2.1.8) and “dense sampling” (Sections 2.1.9 and 2.1.10) descriptors used in this work. “Dense” means feature extraction on the whole image (or on a whole region), in contrast with “dense sampling”, which denotes extraction only at specific points.
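The distinction between the two families can be illustrated with a toy Python sketch. It does not use any of the descriptors of this work: a coarse intensity histogram stands in for a real texture descriptor, the sampling points are assumed to lie on a regular grid, and all names and parameter values are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image with intensities in 0..255


def histogram(pixels: List[int], bins: int = 4, max_value: int = 256) -> List[int]:
    """Toy descriptor: count the pixels falling in equal-width intensity bins."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // max_value, bins - 1)] += 1
    return counts


def dense_descriptor(img: Image) -> List[int]:
    """'Dense': a single descriptor computed over every pixel of the image."""
    return histogram([p for row in img for p in row])


def sampled_descriptors(
    img: Image, patch: int = 2, step: int = 2
) -> List[Tuple[Tuple[int, int], List[int]]]:
    """'Dense sampling': one descriptor per patch anchored at each grid point."""
    out = []
    for r in range(0, len(img) - patch + 1, step):
        for c in range(0, len(img[0]) - patch + 1, step):
            pixels = [img[r + i][c + j] for i in range(patch) for j in range(patch)]
            out.append(((r, c), histogram(pixels)))
    return out


img = [
    [0, 10, 200, 250],
    [5, 20, 210, 255],
    [100, 110, 70, 80],
    [120, 90, 130, 140],
]
whole = dense_descriptor(img)       # one vector for the image
local = sampled_descriptors(img)    # one vector per sampled point
print(whole)
print(len(local))
```

The first function yields one feature vector per image, whereas the second yields a set of local vectors that would then need an encoding step, such as Bag of Features, before classification.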

The proposed approach

Fig. 1 shows a scheme of the proposed approach. Each input image is first processed by two image enhancement methods to improve its quality. Afterwards, several poses are generated to increase the size of the training set. For BoF, an additional sub-windowing step is required. Feature extraction is performed on the original enhanced image and on all the artificial poses, yielding several feature vectors. Each vector is classified by a different SVM and

Dataset description

The HEp-2 dataset used for Task 1 of the ICPR 2014 contest was collected at the SNP lab (http://mivia.unisa.it/iif2014/) [40], [41]. From 419 patient positive sera, 68,429 single cell images were acquired. 13,596 images are publicly available as training set together with their segmentation masks, and the remaining 54,833 images are used as testing set. The dataset includes six classes: (i) homogeneous (Homo); (ii) speckled (Spe); (iii) nucleolar (Nu); (iv) centromere (Cen); (v) Golgi (Go) and

Conclusions

In this paper we deal with the problem of automatic classification of IIF HEp-2 cell images by fusion of several descriptors and perturbation approaches. Our main contributions are:

  • For handling large intra-class and small inter-class variability, an ensemble of texture descriptors is used to represent images.

  • An ensemble of ternary-encoded LPQ proved to be one of the best approaches on the HEp-2 datasets and on 14 additional datasets.

As future work, we want to improve our system in order to deal

References (65)

  • X. Qian et al., PLBP: An effective local binary patterns texture descriptor with pyramid representation, Pattern Recognit. (2011)
  • L. Liu et al., Extended local binary patterns for texture classification, Image Vis. Comput. (2012)
  • R. Das et al., Evaluation of ensemble methods for diagnosing of valvular heart disease, Expert Syst. Appl. (2010)
  • R. Das et al., Diagnosis of valvular heart disease through neural networks ensembles, Comput. Methods Programs Biomed. (2009)
  • M. Paci et al., An ensemble of classifiers based on different texture descriptors for texture classification, J. King Saud Univ. Sci. (2013)
  • F.L.C. dos Santos et al., Computer vision for virus image classification, Biosyst. Eng. (2015)
  • P. Hobson et al., Benchmarking human epithelial type 2 interphase cells classification methods on a very large dataset, Artif. Intell. Med. (2015)
  • G.V. Ponomarev et al., ANA HEp-2 cells image classification using number, size, shape and localization of targeted cell regions, Pattern Recognit. (2014)
  • F. Yuan, Video-based smoke detection with histogram sequence of LBP and LBPV pyramids, Fire Saf. J. (2011)
  • A. Cruz-Roa et al., Visual pattern mining in histology image collections using bag of features, Artif. Intell. Med. (2011)
  • G. Braz Junior et al., Classification of breast tissues using Moran's index and Geary's coefficient as texture signatures and SVM, Comput. Biol. Med. (2009)
  • L. Nanni et al., Protein classification using texture descriptors extracted from the protein backbone image, J. Theor. Biol. (2010)
  • C.G. Parks et al., Expert panel workshop consensus statement on the role of the environment in the development of autoimmune disease, Int. J. Mol. Sci. (2014)
  • D. Solomon et al., Evidence-based guidelines for the use of immunologic tests: Antinuclear antibody testing, Arthritis Rheumatol. (2002)
  • P. Meroni et al., ANA screening: An old test with new recommendations, Ann. Rheum. Dis. (2010)
  • S.K. Lane et al., Clinical utility of common serum rheumatologic tests, Am. Fam. Physician (2002)
  • S. Manivannan et al., HEp-2 cell classification using multi-resolution local patterns and ensemble SVMs
  • Z. Gao et al., HEp-2 cell image classification with convolutional neural networks
  • C. Codrescu, Quadratic recurrent finite impulse response MLP for indirect immunofluorescence image recognition
  • L. Nanni et al., Morphological and texture features for HEp-2 cells classification
  • T. Ojala et al., Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Trans. Pattern Anal. Mach. Intell. (2002)
  • Y. Guo et al., Texture classification using a linear configuration model based descriptor

This paper has been recommended for acceptance by Mario Vento.