
1 Introduction

Colour and texture, along with transparency and gloss, are among the most important visual features of objects, materials and scenes. As a consequence, colour and texture analysis plays a fundamental role in many computer vision applications such as surface inspection [1,2,3], medical image analysis [4,5,6,7] and object recognition [8,9,10]. It is generally believed that combining colour and texture improves accuracy (at least under steady imaging conditions [11, 12]), though it is not clear which is the best way to combine them. Indeed, this has been a subject of debate since early on, both in computer vision [11, 12] and in perception science [13].

Approaches to colour texture analysis can be roughly categorised into three groups: parallel, sequential and integrative [14], though more involved taxonomies have been proposed too [15]. In this paper we investigate the problem of representing colour texture features starting from three LBP variants as grey-scale texture descriptors. Even in the era of Deep Learning, there are good reasons why Local Binary Patterns and related variations are worth investigating: they are conceptually simple, compact, easy to implement, computationally cheap – yet very accurate. We consider two strategies to extend LBP variants to colour textures: the combination of inter- and intra-channel features, and colour orderings. We also evaluate the effect of the colour space used (RGB, HSV, YUV and YIQ) and of the spatial resolution(s) of the local neighbourhood.

2 Background: Grey-Scale LBP Variants

Local Binary Patterns variants [16,17,18,19,20,21,22] (also referred to as Histograms of Equivalent Patterns [23]) are a well-known class of grey-scale texture descriptors. They are particularly appreciated for their conceptual ease, low computational demand yet high discrimination capability. Nonetheless, extensions to colour images have received much less attention than the original, grey-scale descriptors. In this paper we investigate extensions to the colour domain of Local Binary Patterns [16], Improved Local Binary Patterns [24] and Extended Local Binary Patterns [25], though the same methods could be easily extended to other descriptors of the same class (see [22] for an up-to-date review).

The three methods are all based on comparing the grey levels of the pixels in a neighbourhood of given shape and size, but the comparison scheme differs in the three cases (see [16, 24, 25] for details). In general, any such comparison scheme can be regarded as a hand-designed function (also referred to as the kernel function [23]) which maps a local image pattern to one visual word among a set of pre-defined ones (the dictionary). In formulas, denoting by \(\mathcal {N}\) the neighbourhood, by \(\mathcal {P}\) a local image pattern (the set of grey-scale values over that neighbourhood) and by f the kernel function, we can write:

$$\begin{aligned} \mathcal {P} \xrightarrow {f} w, \quad w \in \{w_1,\dots ,w_K\}, \end{aligned}$$
(1)

where \(\{w_1,\dots ,w_K\}\) is the dictionary. Consequently, any LBP variant identifies with its kernel function and vice versa, as clearly shown in [26]. The dimension of the dictionary depends on the kernel function and on the number of pixels in the neighbourhood: standard LBP [16, 18, 19], for instance, generates a dictionary of \(2^{n-1}\) words, where n is the number of pixels in the neighbourhood. The image features are the one-dimensional, orderless distribution of the visual words over the dictionary (bag of visual words model).
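As an illustration, the bag-of-visual-words scheme of Eq. (1) can be sketched in Python for standard LBP on a 3\(\times\)3 neighbourhood. This is only a minimal sketch: the function name and the fixed clockwise ordering of the peripheral pixels are ours, not part of the original formulation.

```python
import numpy as np

def lbp_histogram(img):
    """Basic 8-neighbour LBP: each pixel's visual word is the bit pattern
    of sign(neighbour - centre); the features are the normalised
    histogram of the words (bag of visual words)."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    centre = img[1:h-1, 1:w-1]
    # the 8 peripheral pixels, in a fixed circular order
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(centre)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1+dy:h-1+dy, 1+dx:w-1+dx]
        codes += (neigh >= centre).astype(np.int32) << bit
    # orderless distribution of the visual words over the dictionary
    hist = np.bincount(codes.ravel(), minlength=256)
    return hist / hist.sum()
```

With eight peripheral pixels the neighbourhood contains n = 9 pixels in total, and the dictionary has \(2^{n-1} = 256\) words, consistent with the dimension given above.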

Rotation-invariant versions of LBP and variants are computed by grouping together the visual words that can be obtained from one another via a discrete rotation of the peripheral pixels (also usually referred to as the ‘ri’ configuration [16]). In this case the dimension of the (reduced) dictionary can be computed through standard combinatorial methods [27].
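A common way to implement the ‘ri’ grouping is to map every word to the minimum value attained over all circular shifts of its bits; counting the resulting classes is exactly the combinatorial (necklace-counting) problem mentioned above. A minimal sketch (the function name is ours):

```python
def ri_code(code, n_bits=8):
    """Map an LBP word to its rotation-invariant representative:
    the minimum value over all circular bit rotations."""
    best = code
    for _ in range(n_bits - 1):
        # rotate right by one position
        code = (code >> 1) | ((code & 1) << (n_bits - 1))
        best = min(best, code)
    return best
```

For n_bits = 8 the 256 words collapse into 36 rotation-invariant classes, the number given by the standard necklace-counting formula.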

3 Extensions to Colour Images

3.1 Intra- and Inter-channel Analysis

Intra- and/or inter-channel analysis are classic tools for extending texture descriptors to colour images [28,29,30]. Intra-channel features are computed from each colour channel separately, inter-channel features from pairs of channels. In both cases the resulting features are concatenated into a single vector. As for inter-channel features, if we consider three-channel images and indicate with i, j and k the colour channels, there are six possible combinations: ij, ik, jk, ji, ki and kj. However, to avoid redundancy and reduce the overall number of features, it is customary to retain only the first three [29, 30]. Figures 1 and 2 show how to compute intra- and inter-channel features in the RGB space.

Intra- and inter-channel analysis applies to grey-scale LBP variants by replacing the comparison between grey levels with a comparison between the intensity levels within each colour channel and/or between pairs of channels, respectively. Both intra- and inter-channel analysis therefore triple the dimension of the original descriptor (and increase it six-fold when used together). Intra- and inter-channel analysis extends LBP, ILBP and ELBP seamlessly to the colour domain. In the remainder we refer to these colour extensions respectively as Opponent Colour LBP (OCLBP) [29], Improved Opponent Colour LBP (IOCLBP) [30] and Extended Opponent Colour LBP (EOCLBP).
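A sketch of the intra-/inter-channel scheme for plain LBP may help fix ideas. Here the inter-channel words threshold the neighbours of one channel against the centre pixel of the other, which is one common convention; the exact comparison rule of [29, 30] should be checked against the original papers. Function names are ours.

```python
import numpy as np

def _codes(centre_img, neigh_img):
    """LBP-style words: threshold the 8 neighbours taken from neigh_img
    against the centre pixel taken from centre_img."""
    h, w = centre_img.shape
    centre = centre_img[1:h-1, 1:w-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(centre, dtype=np.int32)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = neigh_img[1+dy:h-1+dy, 1+dx:w-1+dx]
        codes += (neigh >= centre).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256)
    return hist / hist.sum()

def oclbp_features(rgb):
    """Concatenate intra-channel (R, G, B) and inter-channel
    (R/G, R/B, G/B) histograms: 6 x 256 features in total."""
    rgb = np.asarray(rgb, dtype=np.int32)
    chans = [rgb[..., c] for c in range(3)]
    intra = [_codes(c, c) for c in chans]
    pairs = [(0, 1), (0, 2), (1, 2)]       # only 3 of the 6 ordered pairs
    inter = [_codes(chans[i], chans[j]) for i, j in pairs]
    return np.concatenate(intra + inter)
```

Note how the six histograms make the feature vector six times longer than that of the underlying grey-scale descriptor, as stated above.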

Fig. 1. Computing intra-channel features in the RGB space: the intensity values (circles in the figure) are compared within each of the R, G and B channels separately (squares in the figure) (Color figure online)

Fig. 2. Computing inter-channel features in the RGB space: the intensity values (circles in the figure) are compared between each of the R/G, R/B and G/B pairs of colour channels (squares in the figure) (Color figure online)

3.2 Colour Orderings

Unlike grey-scale data, colour data are multivariate and therefore lack a natural total ordering. Still, higher-dimensional analogues of univariate orderings can be introduced by resorting to some sub- (i.e. less than total) ordering principles [31]. Herein we considered the following three types of sub-orderings in the colour space: lexicographic order, order based on the colour vector norm and order based on a reference colour [32,33,34,35,36,37]. The first is a marginal ordering (M-ordering); the second and third are reduced (or aggregate) orderings (R-orderings) [31]. In the remainder we use subscripts ‘lex’, ‘cvn’ and ‘rcl’ to indicate the three orderings. Once the order is defined, the grey-scale descriptors introduced in Sect. 2 extend seamlessly to the colour domain. Also note that colour orderings produce more compact descriptors than intra- and inter-channel analysis, for the number of features is, in this case, the same as that of the original grey-scale descriptor.

Lexicographic Order. The lexicographic order [32, 34] involves defining some (arbitrary) priority among the colour channels. Denoting the three channels by i, j and k, one can for instance establish that i has higher priority than j, and j higher than k. In that case, given two colours \(\mathbf {C}_1 = \{C_{1i},C_{1j},C_{1k}\}\) and \(\mathbf {C}_2 = \{C_{2i},C_{2j},C_{2k}\}\), we shall write:

$$\begin{aligned} \mathbf {C}_1 \ge \mathbf {C}_2 \iff&(C_{1i}> C_{2i}) \vee [(C_{1i} = C_{2i}) \wedge (C_{1j} > C_{2j})] \vee \\&[(C_{1i} = C_{2i}) \wedge (C_{1j} = C_{2j}) \wedge (C_{1k} \ge C_{2k})]. \nonumber \end{aligned}$$
(2)

For three-dimensional colour data there are \(3! = 6\) priority rules, and, consequently, as many lexicographic orders.
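Eq. (2) amounts to an ordinary tuple comparison once the channels are permuted by priority, as the following sketch shows. The encoding of the priority rule as an index tuple is ours; the default reproduces the \(G \succ R \succ B\) rule used later in Experiment 2.

```python
def lex_ge(c1, c2, priority=(1, 0, 2)):
    """C1 >= C2 under the lexicographic order of Eq. (2).
    `priority` lists the channel indices from highest to lowest
    priority; (1, 0, 2) encodes G > R > B for RGB triplets."""
    k1 = tuple(c1[p] for p in priority)
    k2 = tuple(c2[p] for p in priority)
    # Python compares tuples lexicographically, which is exactly Eq. (2)
    return k1 >= k2
```

The six possible priority tuples correspond to the \(3! = 6\) lexicographic orders mentioned above.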

Aggregate Order Based on the Colour Vector Norm. This is based on comparing the vector norm [33] of the two colours:

$$\begin{aligned} \mathbf {C}_1 \ge \mathbf {C}_2 \iff \Vert \mathbf {C}_1\Vert \ge \Vert \mathbf {C}_2\Vert , \end{aligned}$$
(3)

where ‘\(\Vert \cdot \Vert \)’ indicates the vector norm. In the remainder we shall assume this is the \(L_2\) norm, although other norms can be used as well.

Aggregate Order Based on a Reference Colour. In this case the comparison is based on the distance from a given (and again arbitrary) reference colour \(\mathbf {C}_\text {ref}\) [35]:

$$\begin{aligned} \mathbf {C}_1 \ge \mathbf {C}_2 \iff \Vert \mathbf {C}_1-\mathbf {C}_\text {ref}\Vert \ge \Vert \mathbf {C}_2-\mathbf {C}_\text {ref}\Vert . \end{aligned}$$
(4)

Clearly this case reduces to the order based on the colour vector norm when \(\mathbf {C}_\text {ref} = \{0,0,0\}\).
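Both R-orderings reduce each colour to a scalar before comparing, as in this sketch (function names are ours; the default reference colour is white, the best-performing reference in Experiment 2):

```python
import numpy as np

def cvn_ge(c1, c2):
    """Eq. (3): order by the L2 norm of the colour vectors."""
    return np.linalg.norm(c1) >= np.linalg.norm(c2)

def rcl_ge(c1, c2, ref=(1.0, 1.0, 1.0)):
    """Eq. (4): order by the L2 distance from a reference colour."""
    ref = np.asarray(ref, dtype=float)
    d1 = np.linalg.norm(np.asarray(c1, dtype=float) - ref)
    d2 = np.linalg.norm(np.asarray(c2, dtype=float) - ref)
    return d1 >= d2
```

Setting ref = (0, 0, 0) makes rcl_ge coincide with cvn_ge, mirroring the reduction noted above.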

Fig. 3. Pixel neighbourhoods corresponding to resolutions 1, 2 and 3, respectively

4 Experiments

Different strategies have been proposed to extend LBP (and variants) to colour textures. In order to evaluate the effectiveness of the approaches described in Sect. 3 and to explore which one works better in the case of colour images, we carried out a set of supervised image classification experiments using 15 colour texture datasets (more details on this in Sect. 5). We first ran a group of three experiments to determine the optimal settings regarding the colour space used (Experiment 1), the colour orderings (Experiment 2) and the combination of resolutions for intra- and inter-channel features (Experiment 3). To reduce the overall computational burden we only used datasets #1 to #5 for this first group of experiments. Finally, in the last experiment we selected the best settings and carried out a comprehensive evaluation using all the datasets.

We computed rotation-invariant (‘ri’) features from non-interpolated pixel neighbourhoods of radius 1px, 2px and 3px (Fig. 3) and concatenated them. In the remainder, symbol ‘&’ will indicate concatenation; therefore we shall write, for instance, ‘1&2&3’ to signal concatenation of the feature vectors computed at resolution 1, 2 and 3.

The accuracy was estimated via split sample validation with stratified sampling using a train ratio of 1/2, i.e.: half of the samples of each class (train set) were used to train the classifier and the remaining half (test set) to compute the figure of merit. This was the fraction of samples of the test set classified correctly. Classification was based on the nearest neighbour rule with \(L_1\) (‘cityblock’) distance.
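The validation protocol can be summarised in a few lines of Python. This is a sketch under the stated settings (stratified half split, nearest neighbour with \(L_1\) distance); the function names and the random seed are ours.

```python
import numpy as np

def stratified_half_split(y, seed=0):
    """Split sample indices 50/50 within each class (train ratio 1/2)."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        half = len(idx) // 2
        train.extend(idx[:half])
        test.extend(idx[half:])
    return np.array(train), np.array(test)

def classify_1nn_l1(train_X, train_y, test_X):
    """Nearest-neighbour rule with L1 ('cityblock') distance."""
    preds = []
    for x in test_X:
        d = np.abs(train_X - x).sum(axis=1)
        preds.append(train_y[int(np.argmin(d))])
    return np.array(preds)

def accuracy(y_true, y_pred):
    """Fraction of test samples classified correctly."""
    return float(np.mean(y_true == y_pred))
```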

4.1 Experiment 1: Selecting the Best Colour Space for Intra- and Inter-channel Features

This experiment aimed to determine the best colour space among RGB, HSV, YUV and YIQ (conversion formulae from RGB available in [38]). Since HSV separates colour into heterogeneous components (hue, saturation and value), we also used a normalized version of this space (HSV\(_\text {norm}\) in the remainder):

$$\begin{aligned} H_\mathrm{norm} = \frac{H-\mu _\mathrm{H}}{\sigma _\mathrm{H}}, \quad S_\mathrm{norm} = \frac{S-\mu _\mathrm{S}}{\sigma _\mathrm{S}}, \quad V_\mathrm{norm} = \frac{V-\mu _\mathrm{V}}{\sigma _\mathrm{V}}, \end{aligned}$$
(5)

where \(\mu \) and \(\sigma \) indicate the average values over the input image. Normalized versions of YUV and YIQ were also considered, but not reported in the results owing to their poor performance. We computed both intra- and inter-channel features at resolutions 1, 2 and 3, and concatenated the results (‘1&2&3’). We considered the following combinations of colour spaces respectively for the intra- and inter-channel features: RGB-RGB, HSV-HSV, HSV-RGB, HSV-HSV\(_\text {norm}\), HSV-YUV and HSV-YIQ (see Table 3, boldface figures).

4.2 Experiment 2: Colour Orderings Vs. Intra- and Inter-channel Features

The objective of this experiment was to evaluate the effectiveness of intra- and inter-channel features compared with colour orderings (Sect. 3.2) in the RGB colour space, which emerged as the best one from Experiment 1. For the lexicographic order we considered all the six possible combinations of priority among the R, G and B channels, though for the sake of simplicity we only report (see Table 4) the results of the combination that attained the best accuracy in the majority of the cases (this was \(G \succ R \succ B\)). For the order based on a reference colour we considered three possible references, the same three used in [36] since they were the best among the eight vertices of the RGB colour cube: white (1,1,1), green (0,1,0) and magenta (1,0,1), and the first gave the best results (see Table 4). As in Experiment 1, the image features were computed at resolution 1, 2 and 3 and the resulting vectors concatenated (‘1&2&3’).

4.3 Experiment 3: Selecting Optimal Resolutions for Intra- and Inter-channel Features

Since intra- and inter-channel analysis increases the number of features of grey-scale descriptors six-fold, in this experiment we investigated how to reduce the overall number of features – thereby generating reasonably compact descriptors – by selecting appropriate resolutions for the intra- and inter-channel features. Specifically, we used three concatenated resolutions (‘1&2&3’) for the intra-channel features and either one (‘1’, ‘2’ or ‘3’) or two concatenated resolutions (‘1&2’, ‘1&3’ or ‘2&3’) for the inter-channel features.
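The resulting feature-vector lengths are easy to tabulate if, for simplicity, one assumes the same dictionary size K at every resolution (in practice the ‘ri’ dictionary grows with the neighbourhood size, so this is only an approximation):

```python
def n_features(K, intra_res=3, inter_res=1):
    """Feature-vector length with 3 intra-channel and 3 inter-channel
    histograms of K bins each, computed at the given numbers of
    concatenated resolutions."""
    return 3 * intra_res * K + 3 * inter_res * K
```

For instance, with K = 36 (rotation-invariant 8-neighbour LBP) the full ‘1&2&3’/‘1&2&3’ setting yields 648 features, whereas ‘1&2&3’/‘1’ yields 432 – a one-third reduction.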

Table 1. Summary table of the generic colour texture datasets used in the experiments
Table 2. Summary table of the biomedical textures used in the experiments

4.4 Experiment 4: Overall Evaluation with Optimised Settings

In this last experiment we computed the classification accuracy over all the 15 datasets described in Sect. 5 using the settings that emerged as optimal from the previous experiments. For calibration purposes we also included five pre-trained convolutional neural network models – specifically: three residual networks (ResNet-50, ResNet-101 and ResNet-152 [39]) and two VGG ‘very deep’ models (VGG-VeryDeep-16 and VGG-VeryDeep-19 [40]). Image features in this case were the \(L_1\) normalised output of the last fully-connected layer (usually referred to as the ‘FC’ configuration [41, 42]). The results are reported in Tables 6 and 7.
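The \(L_1\) normalisation applied to the FC outputs is simply a rescaling to unit \(L_1\) norm (the function name is ours):

```python
import numpy as np

def l1_normalise(v):
    """Scale a feature vector to unit L1 norm, as applied to the
    output of the networks' last fully-connected layer."""
    v = np.asarray(v, dtype=float)
    return v / np.abs(v).sum()
```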

Table 3. Results of Experiment 1: selecting the best colour space for intra- and inter-channel features. Boldface text indicates the best combination descriptor + colour spaces; framed text the overall best accuracy by dataset. Although the KTH-TIPS and KTH-TIPS2b datasets saw improvements with HSV, all other datasets performed best by remaining in RGB space. Classifier was 1-NN (\(L_1\))

Table 3 summarises the results of Experiment 1. As can be seen, the RGB-RGB combination for intra- and inter-channel features emerged as the best option in seven datasets, followed by HSV-HSV (five datasets) and HSV-RGB (four datasets).

5 Datasets

For the experimental evaluation we considered eight datasets of generic colour textures and seven of biomedical textures, as detailed in Sects. 5.1 and 5.2. The main characteristics of each dataset are also summarised in Tables 1 and 2.

5.1 Generic Colour Textures

#1 – #2: KTH-TIPS [43, 44] and KTH-TIPS2b [44, 45]. Generic materials such as bread, cotton, cracker, linen, orange peel, sandpaper, sponge or styrofoam, acquired at nine scales, three viewpoints and three different illuminants.

#3 – #4: Outex-00013 [16, 46] and Outex-00014 [16, 46]. Generic materials such as carpet, chips, flakes, granite, paper, pasta or wallpaper. Images from Outex-00013 were acquired under invariable imaging conditions and those from Outex-00014 under three different illumination conditions.

#5, #7 – #8: PlantLeaves [47], ForestSpecies [48, 49] and NewBarkTex [50, 51]. Images from different species of plants, trees and bark acquired under controlled and steady imaging conditions.

#6: CUReT [52]. A reduced version of the Columbia-Utrecht Reflectance and Texture database maintained by the Visual Geometry Group, University of Oxford, United Kingdom [53], containing samples of generic materials.

5.2 Biomedical Textures

The following databases were acquired through digital microscopy under fixed and reproducible conditions, and are therefore intrinsically different from those presented in the preceding section.

#9: BioMediTechRPE [54, 55]. Retinal pigment epithelium (RPE) cells from different stages of maturation.

#10 – #13: BreakHis [56, 57]. Histological images from benign/malignant breast cancer tissue. Each image was taken under four magnification factors (40\(\times \), 100\(\times \), 200\(\times \) and 400\(\times \)), and we considered each factor as making up a different dataset (see Table 2).

Table 4. Results of Experiment 2: colour orderings vs. intra- and inter-channel features. Boldface text indicates, for each dataset, the best accuracy by descriptor + colour ordering; framed text the overall best accuracy by dataset. Colour priority for lexicographic order (‘lex’) was \(G \succ R \succ B\), distance for colour vector norm (‘cvn’) was \(L_2\) and reference colour for ‘rcl’ was white (1,1,1). The features were computed on the RGB space. The significant reduction in dimensionality offered by the colour orderings does not lead to any improvement
Table 5. Experiment 3: best combination of resolutions for intra- and inter-channel features. Boldface text indicates, for each dataset, the best combination descriptor + resolutions used for computing the inter-channel features; framed text the overall best accuracy by dataset. The features were computed on the RGB space. In general, the strongest performance is obtained by limiting the number of resolutions used for the inter-channel features: the additional information carried by multiple inter-channel resolutions is more than offset by the loss in classification performance caused by the increased feature dimensionality

#14 – #15: Epistroma [5, 58] and Kather [59,60,61]. Histological images from colorectal cancer tissue representing different tissue sub-types.

6 Results and Discussion

The results of Experiment 2 (Table 4) show that in most cases intra- and inter-channel features from the RGB space improved the accuracy of the original, grey-scale descriptors by a good margin. By contrast, no clear advantage emerged from using colour orderings as an alternative to grey-scale values.

Experiment 3 indicated that the best accuracy was achieved by concatenating multi-resolution intra-channel features and single-resolution inter-channel features. In fact, adding more than one inter-channel resolution degraded the performance in the majority of cases, as clearly shown in Table 5. The results were however inconclusive as to which resolution (‘1’, ‘2’ or ‘3’) should be used.

The comparison between LBP variants and pre-trained convolutional networks (Experiment 4, Tables 6 and 7) showed nearly perfectly split results, with the former achieving the best performance in seven datasets out of 15 and the latter in the other eight. Convolutional models seemed better at classifying textures with higher intra-class variability (as a consequence of texture non-stationarity and/or changes in the imaging conditions), as for instance in datasets #1, #2 and #7 (see Sect. 5.1). Conversely, homogeneous textures acquired under steady imaging conditions (most of the biomedical datasets) were still better classified by LBP variants. This finding generally agrees with those obtained in previous studies [62]. Pre-trained convolutional networks, however, achieved this result by employing at least twice as many features as LBP variants.

Table 6. Results of Experiment 4: overall evaluation with optimised settings (datasets #1 to #8). Boldface text indicates, for each dataset, the best combination descriptor + resolutions used for computing the inter-channel features; framed text the overall best accuracy by dataset
Table 7. Results of Experiment 4: overall evaluation with optimised settings (datasets #9 to #15). Boldface text indicates, for each dataset, the best combination descriptor + resolutions used for computing the inter-channel features; framed text the overall best accuracy by dataset

7 Conclusions

In this work we have investigated two strategies for extending LBP variants to the colour domain: intra- and inter-channel features on the one hand, and colour orderings on the other. Colour orderings did not prove particularly effective; by contrast, intra- and inter-channel features improved the accuracy of the original, grey-scale descriptors in virtually all the cases. The best results were obtained by combining multi-resolution intra-channel features with single-resolution inter-channel features, and this represents a novel finding. In future work we plan to expand the study to more LBP variants [22] and to different strategies for compacting the feature vectors [63].