Ensembles of dense and dense sampling descriptors for the HEp-2 cells classification problem☆
Introduction
Indirect Immunofluorescence (IIF) is used to detect specific antibodies in the patient serum for the diagnosis of autoimmune diseases (ADs). These diseases are caused by abnormal activity of the immune system, which attacks the body's own tissues [1]. Although ADs are considered relatively rare (e.g. compared to cardiac diseases), they show high mortality and morbidity, and their etiology is still far from fully understood. Over the years, special emphasis has been given to genetic and environmental factors [1], and important epidemiologic studies have been published on the prevalence of the most common ADs. In 1997, Jacobson and Gange [2] estimated a 3.2% prevalence in the US, focusing on a subset of 24 ADs. In 2007, Eaton et al. [3] estimated a prevalence of 5.4% in a subset of 31 ADs, based on the Danish National Patient Register. This study was updated by Cooper et al. in 2009 [4], who reported a higher prevalence ranging from 7.6% to 9.4%. Moreover, ADs show notable demographic patterns. Cooper and Stroehla [5] underline that, in the US, the incidence of ADs is higher in women than in men: e.g. 85% of patients with ADs such as systemic lupus erythematosus, Sjögren's syndrome or scleroderma are women. For other ADs, this proportion drops to 60–75%. Racial factors also influence the prevalence of ADs: in the US, blacks show a higher risk of systemic lupus erythematosus and scleroderma, while whites show a higher risk of multiple sclerosis compared to blacks and Asians.
The primary test for the evaluation of ADs is the Antinuclear Antibody (ANA) test, which has been reported to be particularly effective in the diagnosis of many ADs [6]. The gold standard procedure for ANA testing is the IIF assay [7], which uses two antibodies: an unlabelled (naked) primary antibody, which binds to the target antigen, and a secondary fluorescent antibody that binds to the primary one. A cellular substrate, e.g. a monolayer of HEp-2 cells, is incubated with the patient serum, allowing the ANAs to bind to the nuclei of the cells. The HEp-2 substrate expresses many antigens to which the ANAs can bind and, once the primary-secondary complex binds to the ANAs, several staining patterns can be produced. As reported by Lane and Gravel [8], staining patterns are specific to one or a few diseases, e.g. homogeneous for systemic lupus erythematosus, speckled for Sjögren's syndrome and mixed connective tissue disease, or nucleolar for scleroderma. The fluorescent samples are examined with a fluorescence microscope to assess (i) the fluorescence intensity with respect to positive and negative controls and (ii) the staining pattern of the positive samples. The second task is the more challenging one, even for expert physicians, and may lead to high intra- and inter-operator variability: Bizzaro et al. [9] reported an inter-laboratory consensus of 92.6% for fluorescence intensity, but only 76% for fluorescence pattern classification. A computer-aided diagnosis (CAD) system can reduce these limitations and speed up sample screening.
The interest that this topic has aroused in the scientific community is shown by the large number of publications in the field and by the annual contest on HEp-2 cell IIF image classification, hosted in 2012 and 2014 by the International Conference on Pattern Recognition (ICPR) and, in 2013, by the International Conference on Image Processing (ICIP). Foggia et al. [10] summarized the state of the art on HEp-2 cell IIF image classification. In general, classical feature sets were used, such as morphological measurements (e.g. number and circularity of connected regions, size of the cells, properties of the holes inside the cells) and texture descriptors (e.g. Haralick features from the grey-level co-occurrence matrix and variants of the local binary pattern (LBP) descriptor). Approaches specific to this dataset were also developed. Stoklasa et al. [11] proposed a granularity-based descriptor that computes the distribution of grains in the image through a series of morphological openings. In the same paper, a specific implementation of the surface descriptor is provided, which computes statistical properties of the image regarded as a topographic surface of valleys and hills. In Wiliem et al. [12], the cell pyramid matching framework was tailored to the HEp-2 problem. It is a region-based approach that pools local histograms of visual words into three histograms associated with (i) the whole cell region, (ii) the inner region and (iii) the outer region, thereby also exploiting spatial information. Moreover, Foggia et al. [10] report some strategies for augmenting the training set, such as image rotation [13] or spontaneous activity patterns [14].

For the ICPR 2014 contest, two different tasks were assigned: Task 1 for cell-level classification and Task 2 for specimen-level classification. It is worth noting that many methods, e.g. [15], [16], [17], increased the amount of training samples by patch extraction, flipping and rotation. The best approach for Task 1 was reported by Manivannan et al. [15]: an ensemble of support vector machines (SVMs) trained with multi-resolution local patterns, scale-invariant feature transform, random projections and intensity histograms, combined with rotations of the original images, dense patch extraction and a bag-of-words-based feature encoding. Gao et al. [16] (2014) exploited convolutional neural networks together with image rotation to increase the number of training samples. Codrescu [17] also augmented the available dataset by image rotation and classified it with an extended version of the finite impulse response multilayer perceptron. The same two tasks were proposed for this special issue; in this paper we focus only on Task 1, which consists in classifying six staining patterns (homogeneous, speckled, nucleolar, centromere, Golgi and nuclear membrane) in pre-segmented single-cell images (dataset details in Section 4.1). During the ICPR 2014 contest [18] we used four descriptors based on local binary patterns [19], namely pyramid LBP [20], local configuration pattern [21], rotation invariant co-occurrence among adjacent LBP [13] and extended LBP [22], together with the Strandmark morphological features [23] and an ensemble of SVMs as classifier [24], [25] (accuracy on the testing set: 78.27%).
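Several of the methods cited above enlarge the training set with flipped and rotated copies of each cell image. The following is a minimal sketch of one common way to do this, assuming each image is a 2-D NumPy array: it generates the eight poses obtained by 90-degree rotations of the image and of its mirror image. The function names `dihedral_poses` and `augment` are hypothetical, and the sketch is not claimed to reproduce the exact augmentation schemes of [15], [16] or [17].

```python
import numpy as np


def dihedral_poses(img: np.ndarray) -> list:
    """Return the 8 poses given by 0/90/180/270-degree rotations of the
    image and of its left-right mirror (hypothetical helper)."""
    poses = []
    for base in (img, np.fliplr(img)):
        for k in range(4):
            poses.append(np.rot90(base, k))
    return poses


def augment(images, labels):
    """Expand a training set so that every generated pose keeps the
    label of the image it was derived from."""
    out_images, out_labels = [], []
    for img, label in zip(images, labels):
        for pose in dihedral_poses(img):
            out_images.append(pose)
            out_labels.append(label)
    return out_images, out_labels
```

Rotations by multiples of 90 degrees and mirroring avoid interpolation, so no artificial blur is introduced; arbitrary-angle rotations would instead require resampling the cell image.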
Here we propose an improved version of our ICPR 2014 approach, based on the following ideas:
- Since IIF images present large intra-class and small inter-class variability, different dense and dense sampling descriptors can be useful to represent them. We fuse several texture descriptors with different characteristics.
- The Bag of Features (BoF) technique has recently emerged as one of the most powerful methods for image representation. In this work, we design ensembles of codebooks for BoF using different codebook-building strategies. Each texton vocabulary is used to train a separate SVM classifier, and the final ensemble fuses all the trained SVMs by the sum rule.
- Ensembles built by perturbing the parameters of a texture descriptor improve on the classification performance of the single stand-alone descriptor. In this work we test and validate different methods for building ensembles of different descriptors. The most interesting result is the performance, validated on 15 datasets, of an ensemble of local phase quantization (LPQ) descriptors based on a ternary coding.
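Local phase quantization (LPQ) quantizes the signs of the real and imaginary parts of a short-term Fourier transform computed at four low frequencies, giving an 8-bit code per pixel and a 256-bin histogram. The sketch below illustrates how a ternary variant and a parameter-perturbation ensemble could look, assuming grey-level images scaled to [0, 1]. The threshold `tau`, the window sizes and the split of each ternary code into an "upper" and a "lower" binary code (as in local ternary patterns) are illustrative assumptions, not the settings used in this paper, and the decorrelation step of the original LPQ is omitted. All function names are hypothetical.

```python
import numpy as np
from scipy.signal import convolve2d


def lpq_responses(img: np.ndarray, win_size: int = 3) -> np.ndarray:
    """Real and imaginary STFT responses at the four low-frequency points used
    by LPQ, with a uniform window and 'valid' convolution (no decorrelation).
    Returns an array of shape (8, H', W')."""
    if win_size % 2 == 0:
        raise ValueError("win_size must be odd")
    r = (win_size - 1) // 2
    x = np.arange(-r, r + 1)
    a = 1.0 / win_size                       # lowest non-zero frequency
    w0 = np.ones(x.shape, dtype=complex)     # DC basis vector
    w1 = np.exp(-2j * np.pi * a * x)         # frequency +a
    w2 = np.conj(w1)                         # frequency -a

    def separable(col, row):
        # filter along the vertical axis with `col`, then along the horizontal axis with `row`
        tmp = convolve2d(img, col[:, None], mode="valid")
        return convolve2d(tmp, row[None, :], mode="valid")

    responses = [separable(w0, w1), separable(w1, w0),
                 separable(w1, w1), separable(w1, w2)]
    return np.stack([part for resp in responses for part in (resp.real, resp.imag)])


def ternary_lpq(img, win_size=3, tau=0.1):
    """Ternary LPQ sketch: 'upper' and 'lower' 8-bit codes, each summarized by a
    256-bin histogram; the two histograms are concatenated and normalised."""
    resp = lpq_responses(np.asarray(img, dtype=float), win_size)
    weights = (2 ** np.arange(8)).reshape(8, 1, 1)
    upper = ((resp > tau) * weights).sum(axis=0)    # bit set where response > +tau
    lower = ((resp < -tau) * weights).sum(axis=0)   # bit set where response < -tau
    hist = np.concatenate([np.bincount(upper.ravel(), minlength=256),
                           np.bincount(lower.ravel(), minlength=256)]).astype(float)
    return hist / hist.sum()


def perturbed_lpq_ensemble(img, win_sizes=(3, 5, 7), taus=(0.05, 0.1, 0.2)):
    """One descriptor per (win_size, tau) pair, i.e. an ensemble obtained by
    perturbing the descriptor parameters."""
    return {(w, t): ternary_lpq(img, w, t) for w in win_sizes for t in taus}
```

In an ensemble of this kind, each descriptor in the returned dictionary would train its own SVM, and the scores would then be fused by the sum rule.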
Texture descriptors
In this section we review the "dense" (Sections 2.1.1–2.1.8) and "dense sampling" (Sections 2.1.9 and 2.1.10) descriptors used in this work. "Dense" means feature extraction on the whole image (or on a whole region), in contrast with "dense sampling", which denotes extraction only at specific points.
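To make the distinction concrete, here is a minimal sketch, assuming grey-level images in [0, 1] stored as NumPy arrays and a plain intensity histogram as a placeholder local descriptor (the descriptors reviewed in Sections 2.1.1–2.1.10 are far richer). A dense descriptor yields one vector per image, whereas dense sampling yields one vector per sampled point. The regular sampling grid and all function names are hypothetical choices for illustration.

```python
import numpy as np


def intensity_histogram(region: np.ndarray, bins: int = 16) -> np.ndarray:
    """Placeholder local descriptor: normalised grey-level histogram of a region."""
    h, _ = np.histogram(region, bins=bins, range=(0.0, 1.0))
    return h / max(h.sum(), 1)


def dense_descriptor(img: np.ndarray, bins: int = 16) -> np.ndarray:
    """'Dense': a single descriptor computed on the whole image."""
    return intensity_histogram(img, bins)


def grid_points(shape, step: int = 8):
    """A regular grid of sampling points (one possible choice of 'specific points')."""
    H, W = shape
    return [(r, c) for r in range(step // 2, H, step) for c in range(step // 2, W, step)]


def dense_sampling_descriptors(img: np.ndarray, points, half: int = 4, bins: int = 16) -> np.ndarray:
    """'Dense sampling': one descriptor per patch centred on each sampled point
    (patches are clipped at the image border)."""
    H, W = img.shape
    feats = []
    for r, c in points:
        patch = img[max(r - half, 0):min(r + half + 1, H),
                    max(c - half, 0):min(c + half + 1, W)]
        feats.append(intensity_histogram(patch, bins))
    return np.stack(feats)
```

The per-point vectors produced by dense sampling are not directly comparable across images with different numbers of points, which is why they are usually pooled, for instance with a BoF codebook, before classification.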
The proposed approach
Fig. 1 shows a scheme of the proposed approach. Each input image is first processed by two image enhancement methods to improve its quality. Afterwards, several poses are generated to increase the size of the training set. For BoF, an additional sub-windowing step is required. Feature extraction is performed on the original enhanced image and on all the artificial poses, yielding several feature vectors. Each vector is classified by a different SVM and
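The codebook-ensemble idea (one BoF codebook per configuration, one SVM per codebook, sum-rule fusion) can be sketched with scikit-learn as follows. This is a toy illustration under assumptions, not the pipeline of Fig. 1: raw pixel patches on a regular grid stand in for the local features, k-means with different numbers of words and random seeds stands in for the different codebook-building strategies, the enhancement, pose and sub-windowing steps are omitted, and raw SVM decision scores are summed without normalization. It also assumes at least three classes (as with the six HEp-2 patterns), so that `SVC.decision_function` returns one score per class. All function names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC


def extract_patches(img, size=5, step=3):
    """Flattened square patches on a regular grid (placeholder local features)."""
    H, W = img.shape
    return np.array([img[r:r + size, c:c + size].ravel()
                     for r in range(0, H - size + 1, step)
                     for c in range(0, W - size + 1, step)])


def bof_histogram(patches, codebook):
    """Assign each patch to its nearest codeword; return a normalised word histogram."""
    words = codebook.predict(patches)
    h = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return h / h.sum()


def fit_codebook_ensemble(images, labels, configs):
    """Train one (codebook, SVM) pair per (n_words, seed) configuration."""
    all_patches = np.vstack([extract_patches(im) for im in images])
    members = []
    for n_words, seed in configs:
        cb = KMeans(n_clusters=n_words, n_init=3, random_state=seed).fit(all_patches)
        X = np.array([bof_histogram(extract_patches(im), cb) for im in images])
        svm = SVC(kernel="rbf", gamma="scale").fit(X, labels)
        members.append((cb, svm))
    return members


def predict_sum_rule(members, images):
    """Sum rule: add the per-class decision scores of all SVMs and take the argmax."""
    total = 0
    for cb, svm in members:
        X = np.array([bof_histogram(extract_patches(im), cb) for im in images])
        total = total + svm.decision_function(X)   # shape (n_images, n_classes)
    return members[0][1].classes_[np.argmax(total, axis=1)]
```

In practice the scores of different classifiers are often normalized before being summed, so that no single member dominates the fusion.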
Dataset description
The HEp-2 dataset used for Task 1 of the ICPR 2014 contest was collected at the SNP lab (http://mivia.unisa.it/iif2014/) [40], [41]. From 419 positive patient sera, 68,429 single-cell images were acquired. Of these, 13,596 images are publicly available as the training set together with their segmentation masks, and the remaining 54,833 images form the testing set. The dataset includes six classes: (i) homogeneous (Homo); (ii) speckled (Spe); (iii) nucleolar (Nu); (iv) centromere (Cen); (v) Golgi (Go) and
Conclusions
In this paper we deal with the problem of automatic classification of IIF HEp-2 cell images by fusing several descriptors and perturbation approaches. Our main contributions are:
- To handle large intra-class and small inter-class variability, an ensemble of texture descriptors is used to represent images.
- An ensemble of ternary-encoded LPQ descriptors proved to be one of the best approaches on the HEp-2 datasets and on 14 additional datasets.
As future work, we want to improve our system in order to deal
References (65)
- et al. Epidemiology and estimated population burden of selected autoimmune diseases in the United States. Clin. Immunol. Immunopathol. (1997).
- et al. Epidemiology of autoimmune diseases in Denmark. J. Autoimmun. (2007).
- et al. Recent insights in the epidemiology of autoimmune diseases: improved prevalence estimates and understanding of clustering of diseases. J. Autoimmun. (2009).
- et al. The epidemiology of autoimmune diseases. Autoimmun. Rev. (2003).
- et al. Variability between methods to determine ANA, anti-dsDNA and anti-ENA autoantibodies: A collaborative study with the biomedical industry. J. Immunol. Methods (1998).
- et al. Pattern recognition in stained HEp-2 cells: Where are we now? Pattern Recognit. (2014).
- et al. Efficient k-NN based HEp-2 cells classifier. Pattern Recognit. (2014).
- et al. Automatic classification of human epithelial type 2 cell indirect immunofluorescence images using cell pyramid matching. Pattern Recognit. (2014).
- et al. HEp-2 cell classification using rotation invariant co-occurrence among local binary patterns. Pattern Recognit. (2014).
- et al. Visual learning and classification of human epithelial type 2 cell images through spontaneous activity patterns. Pattern Recognit. (2014).
- PLBP: An effective local binary patterns texture descriptor with pyramid representation. Pattern Recognit.
- Extended local binary patterns for texture classification. Image Vis. Comput.
- Evaluation of ensemble methods for diagnosing of valvular heart disease. Expert Syst. Appl.
- Diagnosis of valvular heart disease through neural networks ensembles. Comput. Methods Programs Biomed.
- An ensemble of classifiers based on different texture descriptors for texture classification. J. King Saud Univ. Sci.
- Computer vision for virus image classification. Biosyst. Eng.
- Benchmarking human epithelial type 2 interphase cells classification methods on a very large dataset. Artif. Intell. Med.
- ANA HEp-2 cells image classification using number, size, shape and localization of targeted cell regions. Pattern Recognit.
- Video-based smoke detection with histogram sequence of LBP and LBPV pyramids. Fire Saf. J.
- Visual pattern mining in histology image collections using bag of features. Artif. Intell. Med.
- Classification of breast tissues using Moran's index and Geary's coefficient as texture signatures and SVM. Comput. Biol. Med.
- Protein classification using texture descriptors extracted from the protein backbone image. J. Theor. Biol.
- Expert panel workshop consensus statement on the role of the environment in the development of autoimmune disease. Int. J. Mol. Sci.
- Evidence-based guidelines for the use of immunologic tests: Antinuclear antibody testing. Arthritis Rheumatol.
- ANA screening: An old test with new recommendations. Ann. Rheum. Dis.
- Clinical utility of common serum rheumatologic tests. Am. Fam. Physician.
- HEp-2 cell Classification using multi-resolution local patterns and ensemble SVMs.
- HEp-2 cell image classification with convolutional neural networks.
- Quadratic recurrent finite impulse response MLP for indirect immunofluorescence image recognition.
- Morphological and Texture Features for HEp-2 Cells Classification.
- Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell.
- Texture classification using a linear configuration model based descriptor.
☆ This paper has been recommended for acceptance by Mario Vento.