1 Introduction

Hyperspectral imaging is an advanced technology that addresses many classification and identification problems. However, the large amount of spectral and spatial information it provides complicates classification. The field of hyperspectral image classification is very broad (spatial and/or spectral feature extraction, segmentation, band reduction, anomaly detection, large image storage, etc.) [1,2,3,4]. Many processing algorithms exist; among the best we have found are convolutional neural networks (CNNs), which are characterized by their excellent performance. Several other methods are mentioned in [5]: logistic regression, the multilayer perceptron, stacked denoising autoencoders, etc. The neural network parameters, such as the number of neurons and the number of hidden layers, can vary and can greatly influence the classification results. In this paper, we present an approach for initializing the data to be processed by a convolutional neural network. First, we adaptively select kernels with a clustering algorithm. Next, we calculate the sizes of the adaptive batches. The extracted batches are then processed normally by the CNN. In the last section, we present test results on three different hyperspectral images.

2 Proposed Approach

Convolutional neural networks have shown their effectiveness in recent years. A CNN is characterized by the repeated application of convolution and pooling layers until the fully connected layer is reached. A hyperspectral image, on the other hand, is an image cube composed of more than a dozen spectral bands, each of which provides information about the scene. We extract each band separately. Then we extract batches of information from each band, and each batch goes through the different stages of the CNN. The questions are how to choose the right number of kernels, their sizes, their positions, and the batch sizes. The proposed method is shown schematically in Fig. 1.

Fig. 1. Overall architecture of the proposed approach

2.1 Kernel Selection

To select the right kernel positions, a feature map is first created containing the positions of the relevant pixels (kernels). To select the kernels adaptively, we propose to apply the CKmeans algorithm [6, 7]. CKmeans aims to partition the ambiguous pixels \( {\text{X}} = \left\{ {{\text{x}}_{1} ,{\text{x}}_{2} , \ldots ,{\text{x}}_{\text{n}} } \right\} \) into p clusters, where \( \upmu_{\text{ij}} \) is the membership degree of \( {\text{x}}_{\text{i}} \) in the \( {\text{j}}^{\text{th}} \) cluster. The clustering result is expressed by the membership degrees in the matrix \( \upmu \).

First, initialize the parameters: n, the number of training data; p, the number of clusters; and m, the fuzzification parameter, which represents the width of the p-dimensional cluster perimeter.

To initialize p, we apply Euclidean division over the integers: for two integers n and p, with p different from zero, Euclidean division associates a quotient m and a remainder r, both integers, satisfying \( {\text{n}} = {\text{pm}}\,+\,{\text{r}} \) where \( 0 \le {\text{r }}\,<\,\left| { p } \right| \). In this case, r should be minimized so as to cover as much of the space as possible. The three parameters n, p, and m are all positive, and \( {\text{m}} \in \left[ {1.25, 2} \right] \).
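As a rough illustration of the remainder criterion, the sketch below scans a list of candidate cluster counts and keeps the one whose remainder n mod p is smallest. The function name `pick_cluster_count`, the candidate list, and the tie-breaking rule are assumptions made for illustration; the text does not say how candidates are enumerated or how ties are resolved.

```python
def pick_cluster_count(n, candidates):
    """Return (p, r) for the candidate p with the smallest remainder r = n mod p.

    Ties are broken in favour of the larger p (an assumption, not from the text).
    """
    best_p, best_r = None, None
    for p in candidates:
        if p <= 0:
            raise ValueError("p must be a positive integer")
        _, r = divmod(n, p)  # n = p * quotient + r, with 0 <= r < p
        if best_r is None or r < best_r or (r == best_r and p > best_p):
            best_p, best_r = p, r
    return best_p, best_r


# Toy example with hypothetical values: 100 data points, four candidate counts.
print(pick_cluster_count(100, [3, 4, 6, 7]))  # (4, 0), since 100 = 4 * 25 + 0
```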

The algorithm steps are as follows. First, take each band separately, one by one, and generate the data vectors \( {\text{X}}_{\text{i}} \), for i from 1 to n:

figure a
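For orientation, the sketch below shows one iteration of the standard fuzzy c-means update, which works with the same quantities (a membership matrix \( \upmu \), p clusters, and a fuzzifier m). It is an assumption that CKmeans [6, 7] follows these update rules; the function name `fcm_step`, the use of scalar pixel values from a single band, and the small `eps` guard against division by zero are illustrative choices, not taken from the text.

```python
def fcm_step(x, centers, m=2.0, eps=1e-12):
    """One standard fuzzy c-means iteration on scalar data (assumed stand-in).

    x       : list of pixel values from one band
    centers : current cluster centers (length p)
    m       : fuzzifier, m > 1
    Returns the membership matrix mu (n rows, p columns) and the new centers.
    """
    p = len(centers)
    expo = 2.0 / (m - 1.0)
    mu = []
    for xi in x:
        d = [abs(xi - c) + eps for c in centers]  # distance to each center
        # mu_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        mu.append([1.0 / sum((d[j] / d[k]) ** expo for k in range(p))
                   for j in range(p)])
    new_centers = []
    for j in range(p):
        w = [row[j] ** m for row in mu]  # fuzzified membership weights
        new_centers.append(sum(wi * xi for wi, xi in zip(w, x)) / sum(w))
    return mu, new_centers


# Toy data: two obvious groups of pixel values and two initial centers.
mu, centers = fcm_step([0.0, 0.1, 0.9, 1.0], [0.2, 0.8], m=2.0)
```

Each row of `mu` sums to 1, and repeating the step until the centers stop moving gives the final membership matrix.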

From the previous steps, we obtain a new clustering matrix, from which we generate a binary matrix indicating the kernel positions. We set a threshold parameter \( \uptheta \) in [2, 8], which represents the required number of neighboring pixels that are similar to the middle pixel (the active pixel). For each pixel X and each of its eight neighbors \( {\text{P}}_{\text{i}} \), \( {\text{i}} \in \left[ {1,8} \right] \), if \( {\text{P}}_{\text{i}} = {\text{X}} \), the neighbor count \( \aleph \) is incremented. Then, if \( \aleph \ge\uptheta \), \( {\text{X}} \) is a cluster center and is marked by 1; otherwise it is marked by 0.
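The thresholding rule can be sketched as follows, taking the clustering matrix as a grid of cluster labels. The function name `kernel_map` and the treatment of image borders (neighbors outside the image are simply not counted) are assumptions; the text does not specify border handling.

```python
def kernel_map(labels, theta):
    """Binary map: 1 where at least `theta` of the 8 neighbours share the
    pixel's cluster label, 0 otherwise."""
    rows, cols = len(labels), len(labels[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            count = 0  # plays the role of the neighbour counter
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the active pixel itself
                    rr, cc = r + dr, c + dc
                    # Neighbours outside the image are ignored (assumption).
                    if (0 <= rr < rows and 0 <= cc < cols
                            and labels[rr][cc] == labels[r][c]):
                        count += 1
            out[r][c] = 1 if count >= theta else 0
    return out


labels = [[1, 1, 1],
          [1, 1, 1],
          [1, 1, 2]]
print(kernel_map(labels, 7))  # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

In this toy grid only the center pixel has seven matching neighbors, so it is the only position marked as a kernel.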

2.2 Batch Size Calculation

Let µ be the threshold of neighbors to consider, and let 4 be the number of nearest neighboring kernels (this number depends on the size of the image). For each pixel, select the set \( {\text{V}} = \left\{ {{\text{v}}_{1},{\text{v}}_{2},{\text{v}}_{3},{\text{v}}_{4}} \right\} \) of nearest neighboring kernels by calculating the distance between the pixel and all the other kernels and choosing the smallest values.

Compute the midpoint (the median point) between the active pixel and each of its neighbors. Let \( {\text{W}} = \left\{ {{\text{w}}_{1},{\text{w}}_{2},{\text{w}}_{3},{\text{w}}_{4}} \right\} \), so that

$$ {\text{w}}_{\text{i}} \left( {{\text{x}},{\text{y}}} \right) = \left( {\frac{{{\text{x}}_{{{\text{v}}_{\text{i}} }} + {\text{x}}_{\text{k}} }}{2};\frac{{{\text{y}}_{{{\text{v}}_{\text{i}} }} + {\text{y}}_{\text{k}} }}{2}} \right) $$
(6)

where x and y denote the abscissa and the ordinate of the point, respectively, and k denotes the active pixel.
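A minimal sketch of the neighbor selection and of Eq. (6) is given below. Euclidean distance is assumed (the text only says "distance"), and the function names and sample coordinates are hypothetical.

```python
import math


def nearest_kernels(pixel, kernels, count=4):
    """Return the `count` kernels closest to `pixel` (Euclidean distance assumed)."""
    others = [k for k in kernels if k != pixel]
    return sorted(others, key=lambda k: math.dist(pixel, k))[:count]


def midpoints(pixel, neighbours):
    """Eq. (6): midpoint between the active pixel k and each neighbour v_i."""
    xk, yk = pixel
    return [((xv + xk) / 2, (yv + yk) / 2) for xv, yv in neighbours]


active = (10, 10)
kernels = [(4, 10), (10, 2), (18, 12), (12, 20), (40, 40)]
V = nearest_kernels(active, kernels)
W = midpoints(active, V)
print(V)  # [(4, 10), (10, 2), (18, 12), (12, 20)]
print(W)  # [(7.0, 10.0), (10.0, 6.0), (14.0, 11.0), (11.0, 15.0)]
```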

After selecting the new points, select the lines parallel to the x- and y-axes passing through each of these points. For that, use constant affine functions \( {\text{f }}\left( {\text{x}} \right) = {\text{ax }}\,+\,{\text{b}} \) with \( {\text{a }} = 0 \), where \( {\text{b}} \) is set from the coordinates of the points obtained for the neighboring kernels. We thus get four horizontal lines L1, L2, L3, and L4, and four vertical lines (columns) C1, C2, C3, and C4.

For the four newly selected points, identify the rows and columns nearest to the active pixel. Their intersections give the coordinates of the batch to extract (see Fig. 2). The space is divided into four parts: up, down, left, and right of the active pixel.

figure b
Fig. 2. (a) Selection of nearest neighbors; (b) Calculation of medians; (c) Coordinates batch selection

The new positions are:

\( {\text{P}}_{1} = {\text{Lu}} \cap {\text{Cl}} \), \( {\text{P}}_{2} = {\text{Lu}} \cap {\text{Cr}} \), \( {\text{P}}_{3} = {\text{Ld}}\,\cap\,{\text{Cl}} \), and \( {\text{P}}_{4} = {\text{Ld}} \cap {\text{Cr}} \), where the subscripts u, d, l, and r denote the nearest line up, down, left, and right of the active pixel.

$$ {\text{So}},\,\;{\text{P}}_{\text{i}} \left( {{\text{x}},{\text{y}}} \right) = \left( {{\text{x}}_{{{\text{L}}_{{\left\{ {{\text{u}};{\text{d}}} \right\}}} }} + {\text{x}}_{{{\text{C}}_{{\left\{ {{\text{r}};{\text{l}}} \right\}}} }} ;{\text{y}}_{{{\text{L}}_{{\left\{ {{\text{u}};{\text{d}}} \right\}}} }} + {\text{y}}_{{{\text{C}}_{{\left\{ {{\text{r}};{\text{l}}} \right\}}} }} } \right) $$
(7)
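The corner selection can be sketched as below: from the four midpoints, keep the nearest horizontal line on each side (up, down) and the nearest vertical line on each side (left, right) of the active pixel, then intersect them. The function name, the convention that "up" means a smaller y value, and raising an error when one side has no line are all assumptions; the text does not cover these cases.

```python
def batch_corners(pixel, points):
    """Return P1..P4, the intersections of the nearest lines around `pixel`.

    Each point (x, y) contributes a vertical line at x and a horizontal line at y.
    """
    xk, yk = pixel
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    left = max((x for x in xs if x < xk), default=None)
    right = min((x for x in xs if x > xk), default=None)
    up = max((y for y in ys if y < yk), default=None)  # smaller y = up (assumption)
    down = min((y for y in ys if y > yk), default=None)
    if None in (left, right, up, down):
        # Behaviour when a side has no line is not specified in the text.
        raise ValueError("no line on at least one side of the active pixel")
    return [(left, up), (right, up), (left, down), (right, down)]


corners = batch_corners((10, 10), [(7, 10), (10, 6), (14, 11), (11, 15)])
print(corners)  # [(7, 6), (11, 6), (7, 11), (11, 11)]
```

The batch is then the rectangle spanned by these four corners, so its size adapts to how densely kernels surround the active pixel.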

2.3 CNN Treatment

The convolution layer adds to each pixel of the image its local neighbors, weighted by the chosen mask. The central element of the mask is placed on the active pixel, which is then replaced by a weighted sum of its neighboring pixels and itself.

Let \( {\text{H}}_{0} \times {\text{W}}_{0} \) be the input image size (that is, the size of the batch \( {\text{B}}_{\text{i}} \)) and L the number of layers, where \( \upalpha \) is a positive integer. Characteristics map n of convolution layer l is calculated as:

$$ y_{n}^{l} = f_{l} \left( {\mathop \sum \limits_{{m \in V_{n}^{l} }} y_{m}^{l - 1} \otimes w_{m,n}^{l} + \beta_{n}^{l} } \right) $$
(8)

The symbol \( \otimes \) denotes the convolution operator, \( {\text{f}}_{\text{l}} \) is the activation function of the layer, \( \upbeta_{\text{n}}^{\text{l}} \) is the bias for characteristics map n, and \( {\text{S}}^{1} ,{\text{S}}^{3} \ldots ,{\text{S}}^{{2{\text{a}}}} \) are the max-pooling layers. \( {\text{V}}_{\text{n}}^{\text{l}} \) denotes the list of all the bands in layer \( {\text{l}} - 1 \) that are connected to characteristics map n. Finally, \( {\text{w}}_{{{\text{m}},{\text{n}}}}^{\text{l}} \) is the convolution mask from characteristics map m in layer \( {\text{S}}^{{{\text{l}} - 1}} \) to characteristics map n in layer \( {\text{C}}^{\text{l}} \).

The size of the output characteristics map \( {\text{y}}_{\text{n}}^{\text{l}} \) is

$$ \left( {H^{l - 1} - r^{l} + 1} \right) \times \left( {W^{l - 1} - c^{l} + 1} \right)\,{\text{Pixels}} $$
(9)

where \( {\text{r}}^{\text{l}} \times {\text{c}}^{\text{l}} \) pixels is the size of the convolution masks \( {\text{w}}_{{{\text{m}},{\text{n}}}}^{\text{l}} \), and \( {\text{H}}^{{{\text{l}} - 1}} \times {\text{W}}^{{{\text{l}} - 1}} \) pixels is the size of the input characteristics maps \( {\text{y}}_{\text{m}}^{{{\text{l}} - 1}} \). The following image shows an explanatory example of applying a convolution mask to an image.
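Eqs. (8) and (9) can be checked on a small example. The sketch below applies a mask without padding, sums the per-input results, adds the bias, and applies the activation. The function names are hypothetical, the mask is applied without flipping (the cross-correlation form commonly used in CNN implementations, which is an assumption about the operator intended here), and the identity activation is used only to keep the numbers easy to follow.

```python
def conv2d_valid(image, mask):
    """Slide `mask` over `image` without padding; output is (H-r+1) x (W-c+1)."""
    H, W = len(image), len(image[0])
    r, c = len(mask), len(mask[0])
    return [[sum(image[i + a][j + b] * mask[a][b]
                 for a in range(r) for b in range(c))
             for j in range(W - c + 1)]
            for i in range(H - r + 1)]


def feature_map(inputs, masks, bias, act):
    """Eq. (8): sum of per-input convolutions plus bias, passed through `act`."""
    maps = [conv2d_valid(y, w) for y, w in zip(inputs, masks)]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[act(sum(m[i][j] for m in maps) + bias) for j in range(cols)]
            for i in range(rows)]


image = [[1, 2, 3, 4, 5],
         [6, 7, 8, 9, 10],
         [11, 12, 13, 14, 15],
         [16, 17, 18, 19, 20]]          # H = 4, W = 5
mask = [[1, 1, 1]] * 3                   # r = c = 3
out = feature_map([image], [mask], bias=0.0, act=lambda v: v)
print(len(out), len(out[0]))  # 2 3, i.e. (4 - 3 + 1) x (5 - 3 + 1)
print(out[0][0])              # 63.0, the sum of the top-left 3 x 3 block
```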

The max-pooling layer divides the batch into groups of pixels, according to a given mask size, and keeps the maximum of each group. This reduces the size of the batch by at least half.

In our work, we applied masks of size \( 2 \times 2 \). The computation is then as follows:

$$ {\text{y}}_{\text{n}}^{\text{l}} \left( {{\text{i}},{\text{j}}} \right) = {\text{Max}}\left( {{\text{y}}_{\text{n}}^{{{\text{l}} - 1}} \left( {2{\text{i}} - 1,2{\text{j}} - 1} \right);{\text{y}}_{\text{n}}^{{{\text{l}} - 1}} \left( {2{\text{i}} - 1,2{\text{j}}} \right);{\text{y}}_{\text{n}}^{{{\text{l}} - 1}} \left( {2{\text{i}},2{\text{j}} - 1} \right);{\text{y}}_{\text{n}}^{{{\text{l}} - 1}} \left( {2{\text{i}},2{\text{j}}} \right)} \right) $$
(10)

Let \( {\text{H}}^{{{\text{l}} - 1}} \times {\text{W}}^{{{\text{l}} - 1}} \) be the input batch size; the size of the output feature map \( {\text{y}}_{\text{n}}^{\text{l}} \) is then \( {\text{H}}^{\text{l}} = {\text{H}}^{{{\text{l}} - 1}} /2 \) and \( {\text{W}}^{\text{l}} = {\text{W}}^{{{\text{l}} - 1}} /2 \).
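A minimal sketch of Eq. (10) follows, written with 0-based indices, so the block for output position (i, j) starts at input position (2i, 2j). Dropping a trailing odd row or column is an assumption; the text only covers even sizes.

```python
def max_pool_2x2(y):
    """2 x 2 max pooling: each output pixel is the maximum of one 2 x 2 block."""
    H, W = len(y) // 2, len(y[0]) // 2  # trailing odd row/column is dropped
    return [[max(y[2 * i][2 * j], y[2 * i][2 * j + 1],
                 y[2 * i + 1][2 * j], y[2 * i + 1][2 * j + 1])
             for j in range(W)]
            for i in range(H)]


y_prev = [[1, 3, 2, 0],
          [4, 2, 1, 5],
          [7, 8, 9, 1],
          [0, 6, 2, 3]]
print(max_pool_2x2(y_prev))  # [[4, 5], [8, 9]]
```

The 4 x 4 input becomes a 2 x 2 output, matching the halving of height and width stated above.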

In the last convolution layer, each band is connected to exactly one preceding characteristics map. This layer uses convolution masks that have the same size as its input characteristics maps.

The output layer, called L, is constructed from sigmoidal neurons. Let \( {\text{N}}^{\text{L}} \) be the number of sigmoidal output neurons. The equation to compute \( {\text{y}}_{\text{n}}^{\text{L}} \), the output of sigmoidal neuron n, is:

$$ y_{n}^{L} = f^{L} \left( {\mathop \sum \limits_{m = 1}^{{N^{L - 1} }} y_{m}^{L - 1} w_{m,n}^{L} + \beta_{n}^{L} } \right) $$
(11)

where \( \upbeta_{\text{n}}^{\text{L}} \) is the bias associated with neuron n of layer L, and \( {\text{w}}_{{{\text{m}},{\text{n}}}}^{\text{L}} \) is the weight from characteristics map m of the last convolutional layer to neuron n of the output layer. Finally, the outputs of all the sigmoidal neurons form the output of the network:

$$ y = \left[ {y_{1}^{L} ,y_{2}^{L} , \ldots ,y_{{N^{L} }}^{L} } \right] $$
(12)
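Eqs. (11) and (12) amount to a weighted sum followed by a sigmoid for each output neuron. The sketch below uses the logistic function as the sigmoidal activation, which is an assumption (the text only says "sigmoidal"), and the weights and inputs are made-up values.

```python
import math


def sigmoid(z):
    """Logistic function, assumed here as the sigmoidal activation."""
    return 1.0 / (1.0 + math.exp(-z))


def output_layer(y_prev, weights, biases):
    """Eqs. (11)-(12): weights[m][n] links input m to output neuron n."""
    return [sigmoid(sum(y_prev[m] * weights[m][n] for m in range(len(y_prev)))
                    + biases[n])
            for n in range(len(biases))]


y_prev = [1.0, 2.0]                      # outputs of the last convolution layer
weights = [[0.5, -1.0],
           [0.25, 0.5]]
biases = [-1.0, 0.0]
print(output_layer(y_prev, weights, biases))  # [0.5, 0.5]
```

Both pre-activations are exactly zero in this toy case, so each neuron outputs 0.5.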

3 Experiments and Discussion

To demonstrate the effectiveness of our proposed method, we tested our algorithm on three different datasets, SalinasA, Pavia University, and Indian Pines, and compared the results with other algorithms and configurations.

3.1 Datasets

Dataset (1): SalinasA is a small excerpt from the Salinas hyperspectral image and is characterized by its high spatial resolution of 3.7 m per pixel. Its 204 spectral bands were acquired by the AVIRIS sensor over the Salinas Valley area of California. SalinasA is \( 86 \times 83 \) pixels in size and comprises six classes.

Dataset (2): Indian Pines is composed of 220 spectral bands, reduced to 200 by removing the water absorption region. It was captured by AVIRIS over northwestern Indiana. Its size is \( 145 \times 145 \) pixels, and it comprises sixteen classes.

Dataset (3): University of Pavia is composed of 103 spectral bands. It was captured by ROSIS over Pavia, Italy, with a spatial resolution of 1.3 m. Its size is \( 610 \times 610 \) pixels, a few of which are not sampled and can be removed from the analysis, and it comprises nine classes.

3.2 Experimental Results

For the experiments, we extract the spectral bands from each hyperspectral image. For each band, we compute the grouping (clustering) matrix, and from each matrix we generate a new feature map in the form of a binary matrix that specifies the positions and the number of kernels. We extracted a separate feature from the feature map.

We ran two different tests. In the first, we applied fixed-size batches (see Table 1). In the second, we extracted batches of adaptive size, calculated according to the dimensions of the input image and the number of generated kernels.

Table 1. Batch size, convolution masks size, and max-pooling mask size chosen for each dataset.

To test the effectiveness of the proposed methods, we first compare the results with those obtained using a manually selected number of kernels. The tests are applied to the three datasets: SalinasA, Indian Pines, and Pavia University.

Before we start, note that the number of kernels must be initialized; we select it randomly, provided it does not exceed the dimensions of the image. We also let the system choose a random kernel threshold in [2, 8]. In the second tests, our algorithm calculates the batch size according to the number of kernels calculated and the size of the input band. The number of layers used in our three examples is generally between 1 and 3.

First Tests: Fixed Batch Size

We started by testing our method with batches of fixed size, applying the parameters listed in Table 1. We denote manual kernel selection by Mk and automatic kernel selection by AutoK. The results obtained are given in Table 2.

Table 2. Variation of the results according to the number of kernels used, applied to the three types of hyperspectral data (by fixed batches)

In the first experiments, we tested manually selected numbers of kernels: 35, 40, 45, 50, 55, 60, 65, 70, 75, and 80. We applied these tests to the three datasets mentioned above. The results obtained are given in Table 2. In each case, the best accuracies were obtained with 40, 50, and 75 kernels for SalinasA, Indian Pines, and Pavia University, respectively. These results are consistent with those obtained automatically by applying CKmeans, which suggests that our method succeeded in selecting the best number of kernels to use.

Second Tests: Adaptive Batch Size

In the second part, we ran the same tests with batches of adaptive size. The results obtained are given in Table 3.

Table 3. Variation of the results according to the number of kernels used, applied to the three types of hyperspectral data (by adaptive batches)

Again, the best accuracies were obtained with 40, 50, and 75 kernels for SalinasA, Indian Pines, and the University of Pavia, respectively. These results are consistent with those obtained automatically by applying CKmeans. They are also better than the accuracies obtained with batches of fixed size, which suggests that using variable-size batches improves classification accuracy.

3.3 Discussions

In Table 4, we compare the classification results obtained by three different algorithms, K-Means, PCA, and Random, with those of (1) and (2). Here, (1) is the method we tested first, which applies the clustering algorithm to calculate the number of kernels, and (2) applies the steps of (1) but uses batches of variable size (this is our proposed approach).

Table 4. Comparison of results with other methods

The results obtained on the three types of images indicate that our proposed approaches are very effective for hyperspectral image classification with deep learning.

4 Conclusion

Convolutional neural networks are very effective for classification tasks, especially for complicated or large images such as hyperspectral images. In this paper, we presented a method for initializing the data to be convolved, in two steps: (1) the adaptive selection of kernels by a clustering algorithm, and (2) the definition of adaptive-size batches around pixels. The extracted batches pass through the different convolution and pooling layers in order to obtain, at the end, a simple vector to classify. To validate our approach, we tested the algorithms on three different hyperspectral images, and the results showed the effectiveness of our proposal.