
1 Introduction

Over the past two decades, estimating spectral reflectance from object surfaces has been widely used in object analysis and visualization [1], for example in biometrics, medical diagnosis [2], color reproduction, art reproduction, and cultural heritage [3]. Reflectance reconstruction over the 400–700 nm wavelength range from the responses of a digital camera has received considerable attention recently.

Traditional devices such as hyper- and multi-spectral imaging systems built on dedicated spectral cameras (beyond trichromatic) can produce highly accurate spectral information. However, most of these systems require complex mechanical constructions and larger investments for imaging, which prevents many practical applications, such as outdoor use, compared with systems based on consumer-level RGB cameras [4].

Using consumer cameras as relatively cheap measurement devices for estimating spectral color properties has become an interesting alternative to pointwise high-precision spectral measurements with special equipment such as spectrophotometers [5]. The results obtained with consumer cameras cannot compete with the quality of the traditional devices, but they are very attractive since the equipment is relatively cheap and instant measurements are obtained for millions of measurement points. These advantages come at the price of lower quality, and it is thus of interest to improve the precision of the estimations [5].

Nguyen et al. pointed out that prior approaches are sensitive to input images captured under illuminations that were not present in the training data, and proposed a novel training-based method to reconstruct a scene's spectral reflectance from a single RGB image [6]. Their method explores a new strategy that uses training images to model the mapping between camera white-balanced RGB values and scene reflectance spectra. It improved reconstruction performance compared with previous works, especially when the test illumination is not included in the training data.

Heikkinen suggests that one way to increase reconstruction accuracy is the inclusion of a priori knowledge [1]. Gijsenij applied a non-linear transformation to the reflectance in reflectance reconstruction, and the results demonstrate that the non-linear transformation improved the reconstruction accuracy [7]. Inspired by these works, we produce physically feasible estimations by combining link functions with a training-based approach. The general training-based approach follows the method in [6], but we replace the reflectance with the transformed reflectance obtained via a link function in both the training stage and the reconstruction stage. Our main focus is the comparison of the performance of different link functions when combined with the training-based approach. We evaluate three link functions (logit, square root 1, square root 2) on the reconstruction of color patch reflectances and of a normal scene; the experimental results demonstrate that the inclusion of a link function improves the performance of training-based reconstruction in terms of spectral error and shape.

Fig. 1. The process of the training and reconstruction. (a) The training stage. (b) The reconstruction stage.

2 Method

In this paper, we do not use RGB images taken directly from a camera. Instead, we synthesize RGB images from hyperspectral images using known camera sensitivity functions. Synthesizing RGB images in this way has two advantages compared with capturing them directly with a camera. Firstly, it removes the need to create a dataset of images captured with the chosen camera for the same scenes as captured by the spectral camera. Secondly, the method can be used for any commercial camera as long as its sensitivity functions are known [6].

This section details the link functions and the training-based method shown in Fig. 1. The method can be divided into two stages: the training stage and the reconstruction stage.

2.1 Training Stage

The method considers a mapping between RGB images under a canonical illumination (obtained via white-balancing) and their reflectance. The training process is shown in Fig. 1(a) and has four steps: synthesizing the RGB images, white-balancing the RGB images, transforming the reflectance with a link function, and computing the mapping.

Firstly, the RGB images corresponding to the scenes and illuminations in the spectral images are synthesized using the image formation model

$$\begin{aligned} {I_c}(x) = \int \limits _\lambda {P(\lambda )R(\lambda ,x)} C(\lambda )d\lambda \end{aligned}$$
(1)

where \(P(\lambda )\) is the spectrum of the illumination, \(R(\lambda ,x)\) is the scene reflectance at pixel x, and \(C(\lambda )\) is the camera spectral sensitivity.
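As a concrete illustration, the following minimal sketch approximates the integral in Eq. 1 by a sum over the spectral bands. The array names and shapes are our own assumptions for illustration, not part of the original method.

```python
# Minimal sketch of Eq. (1): synthesizing an RGB image from a hyperspectral
# reflectance cube, an illumination spectrum, and camera sensitivity functions.
import numpy as np

def synthesize_rgb(reflectance, illumination, sensitivity):
    """reflectance:  (H, W, B) scene reflectance R(lambda, x) over B spectral bands
    illumination: (B,)      illumination spectrum P(lambda)
    sensitivity:  (B, 3)    camera sensitivities C(lambda) for the R, G, B channels
    Returns an (H, W, 3) RGB image; the integral over lambda is approximated by a sum."""
    radiance = reflectance * illumination[None, None, :]   # P(lambda) * R(lambda, x)
    rgb = radiance @ sensitivity                           # sum over the spectral bands
    return rgb
```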

After the RGB image \(I_c(x)\) is formed, we transform it into a white-balanced image \({\widehat{I}_c}(x)\) as follows:

$$\begin{aligned} {\widehat{I}_c}(x)= diag(\frac{1}{t})I_c(x)=diag(\frac{1}{t_r},\frac{1}{t_g},\frac{1}{t_b})I_c(x) \end{aligned}$$
(2)

where \(\mathbf {t}=[\mathbf {t}_\mathbf {r}, \mathbf {t}_\mathbf {g}, \mathbf {t}_\mathbf {b}]\) is the white-balancing vector obtained by a chosen white-balancing algorithm. For the white-balancing step, we used the shades of grey (SoG) method [8], which is widely used for its simplicity, low computational requirements, and proven efficacy over various datasets.
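A hedged sketch of this step is shown below. The Minkowski norm order p = 6 is a common choice for SoG in the literature, not a value stated here, and the normalization of the estimated vector t is likewise an assumption for illustration.

```python
# Sketch of the white-balancing step (Eq. 2) using a Shades of Grey estimate [8].
import numpy as np

def shades_of_grey_white_balance(rgb, p=6):
    """rgb: (H, W, 3) synthesized image I_c(x).
    Returns the white-balanced image diag(1/t) I_c(x) and the estimate t."""
    t = np.power(np.mean(np.power(rgb, p), axis=(0, 1)), 1.0 / p)  # per-channel Minkowski mean
    t = t / np.linalg.norm(t)             # keep only the chromaticity of the estimate (assumed)
    balanced = rgb / t[None, None, :]     # apply diag(1/t_r, 1/t_g, 1/t_b)
    return balanced, t
```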

Next, the reflectance vectors R are replaced with transformed vectors \(\mathbf {\widetilde{R}}\) via a link function, which applies a non-linear transformation to the reflectance vectors R. Three link functions were evaluated in the experiments; a short code sketch of all three follows the list.

  1. The logit function

    $$\begin{aligned} \mathbf {\widetilde{R}} = \mathrm{logit}(\mathbf {R}) = \log (\frac{\mathbf {R}}{{\mathbf {1} - \mathbf {R}}}) \end{aligned}$$
    (3)

    where \(\log :(0,{+\infty })\rightarrow \mathbb {R}\) is the natural logarithm evaluated element-wise for \( \mathbf {R} \in {[0,1]^{n}}\).

  2. The square root 1

    $$\begin{aligned} \mathbf {\widetilde{R}} = \sqrt{\mathbf {R}} \end{aligned}$$
    (4)
  3. The square root 2

    Tzeng and Berns proposed a new empirical space that gives a near-normal distribution and reduced dimensionality for subtractive opaque processes [7]. The corresponding link function is given by Eq. 5, where a is an offset vector that is derived empirically.

    $$\begin{aligned} \mathbf {\widetilde{R}} = \mathbf {a} - \sqrt{\mathbf {R}} \end{aligned}$$
    (5)
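The following sketch implements the three link functions of Eqs. 3–5 and their inverses, applied element-wise to reflectance values in [0, 1]. The clipping constant and the default offset a are illustrative assumptions; the paper derives a empirically.

```python
# Sketch of the three link functions (Eqs. 3-5) and their inverses.
import numpy as np

EPS = 1e-6  # clip away exact 0 and 1 so the logit stays finite (assumed safeguard)

def logit(R):
    Rc = np.clip(R, EPS, 1 - EPS)
    return np.log(Rc / (1 - Rc))

def logit_inv(Rt):
    return 1.0 / (1.0 + np.exp(-Rt))

def sqrt1(R):
    return np.sqrt(R)

def sqrt1_inv(Rt):
    return Rt ** 2

def sqrt2(R, a=1.0):        # a: empirically derived offset vector; 1.0 is a placeholder
    return a - np.sqrt(R)

def sqrt2_inv(Rt, a=1.0):
    return (a - Rt) ** 2
```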

Finally, the mapping f is learnt between the white-balanced RGB images \({\widehat{I}_c}(x)\) and their transformed spectral reflectance \(\widetilde{R}\). We use scatter point interpolation based on a radial basis function (RBF) network for the mapping. The RBF network is a popular interpolation method in multidimensional space [6]. It implements a mapping \(f:R^3 \rightarrow {R^P}\) according to:

$$\begin{aligned} f(x) = {\omega _0} + \sum \limits _{i = 1}^M {{\omega _i}\phi (||x - {c_i}||)} \end{aligned}$$
(6)

where \(x \in {R^3}\) is the RGB input value, \(f(x) \in {R^P}\) is the spectral reflectance value in P-dimensional space, \(\phi (\cdot )\) is the radial basis function, \(||\cdot ||\) denotes the Euclidean distance, \({\omega _i}(0 \le i \le M)\) are the weights, \({c_i} \in {R^3}(1 \le i \le M)\) are the RBF centers, and M is the number of centers. The RBF centers \(c_i\) are chosen by the orthogonal least squares method, and the weights \({\omega _i}\) are determined using the linear least squares method.

To control the number of centers M of the RBF network model against overfitting, Nguyen et al. used repeated random sub-sampling validation for cross-validation in [6] and found that the number of centers giving the best results on the validation set was within 40–50; here the number of centers was set to 50.
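A simplified sketch of fitting and applying such a mapping is given below. For brevity the M = 50 centers are selected with k-means rather than the orthogonal least squares selection used in [6], and the Gaussian kernel width sigma is an assumption; both are illustrative substitutions, not the authors' exact procedure.

```python
# Simplified sketch of the RBF mapping f: R^3 -> R^P (Eq. 6).
import numpy as np
from sklearn.cluster import KMeans

def fit_rbf(rgb_train, refl_train, M=50, sigma=0.1):
    """rgb_train: (N, 3) white-balanced RGB samples; refl_train: (N, P) linked reflectances.
    Returns the RBF centers and the weight matrix (including the bias w_0)."""
    centers = KMeans(n_clusters=M, n_init=10).fit(rgb_train).cluster_centers_
    dist = np.linalg.norm(rgb_train[:, None, :] - centers[None, :, :], axis=2)
    Phi = np.exp(-(dist ** 2) / (2 * sigma ** 2))           # Gaussian radial basis
    Phi = np.hstack([np.ones((Phi.shape[0], 1)), Phi])      # column of ones for the bias w_0
    weights, *_ = np.linalg.lstsq(Phi, refl_train, rcond=None)  # linear least squares
    return centers, weights

def apply_rbf(rgb, centers, weights, sigma=0.1):
    """rgb: (N, 3) white-balanced RGB values. Returns (N, P) linked reflectances f(x)."""
    dist = np.linalg.norm(rgb[:, None, :] - centers[None, :, :], axis=2)
    Phi = np.exp(-(dist ** 2) / (2 * sigma ** 2))
    Phi = np.hstack([np.ones((Phi.shape[0], 1)), Phi])
    return Phi @ weights
```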

Fig. 2. The reconstruction of eight color patches' reflectance using a Canon 1D Mark III under indoor illumination from a metal halide lamp with a 4300 K color temperature. The quantitative errors of the eight patches are shown in Tables 1 and 2.

2.2 Reconstruction Stage

Once the training is performed, the mapping can be saved and used offline for spectral reflectance reconstruction. The reconstruction stage process is shown in Fig. 1(b).

To reconstruct the spectral reflectance from a new RGB image, the image must first be white-balanced to transform it to the normalized illumination space \({\widehat{I}_c}(x)\). The learned mapping f is then used to map the white-balanced image to the transformed spectral reflectance image as:

$$\begin{aligned} \widetilde{R}(\lambda ,x) = f({\widehat{I}_c}(x)) \end{aligned}$$
(7)

Then the reconstructed reflectance \(R(\lambda ,x)\) is obtained by applying the inverse link function to \(\widetilde{R}(\lambda ,x)\).
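The sketch below ties the previous snippets together for the reconstruction stage (Eq. 7): white-balance the input, apply the learned mapping, and invert the chosen link function. It reuses the illustrative functions defined in the earlier sketches, so it is a schematic pipeline rather than the authors' implementation.

```python
# Sketch of the reconstruction stage: white balance -> RBF mapping -> inverse link.
import numpy as np

def reconstruct_reflectance(rgb_image, centers, weights, link_inv, sigma=0.1):
    """rgb_image: (H, W, 3) input RGB image; link_inv: inverse of the chosen link
    function (e.g. logit_inv). Returns an (H, W, P) reflectance image."""
    H, W, _ = rgb_image.shape
    balanced, _ = shades_of_grey_white_balance(rgb_image)                  # normalized illumination space
    linked = apply_rbf(balanced.reshape(-1, 3), centers, weights, sigma)   # f(I_hat_c(x)), Eq. 7
    return link_inv(linked).reshape(H, W, -1)                              # back to reflectance
```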

3 Experiments

3.1 Experiment Data

In this experiment, the hyperspectral image data from [6] and the camera sensitivity function data from [9] were used. The dataset contains spectral images and illumination spectra taken with a Specim PFD-CL-65-V10E (400 nm to 1000 nm) spectral camera. For light sources, natural sunlight and shade conditions were considered. Additionally, artificial wideband lights were considered by using metal halide lamps with different color temperatures (2500 K, 3000 K, 3500 K, 4300 K, 6500 K) and a commercial off-the-shelf LED E400 light. For the natural light sources, outdoor images of natural objects (plants, human beings, etc.) as well as manmade objects were taken, and a few images of buildings at very large focal length were also taken. The images corresponding to the other light sources have manmade objects as their scene content. For each spectral image, a total of 31 bands were used for imaging (400 nm to 700 nm at a spacing of about 10 nm).

Fig. 3. The reconstruction result of a normal scene using a Canon 1D Mark III under indoor illumination from a metal halide lamp with a 2500 K color temperature. The quantitative errors of the locations are shown in Tables 3 and 4.

There are a total of 64 spectral images. The 24 images containing color charts were taken as the test images for the reconstruction stage, since explicit ground truth of their spectral reflectance is available and the accuracy of the reconstruction can thus be better assessed; the remaining 40 images were used for training.

3.2 Experiments

Since the number of pixels in the 40 training images is very large and most of the training images are similar to each other, each training image was sub-sampled using k-means clustering, collecting around 16,000 spectral reflectance samples in total from all images for the training stage. We used the 24 images with color charts as the test images for the reconstruction stage; the ground truth of the spectral reflectance was obtained from the hyperspectral camera.
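A hedged sketch of this sub-sampling step is given below. The choice of 400 clusters per image (40 images × 400 ≈ 16,000 samples) is our own assumption for illustration; the paper only states the approximate total.

```python
# Sketch of sub-sampling one training image with k-means to obtain representative samples.
import numpy as np
from sklearn.cluster import KMeans

def subsample_image(reflectance_image, k=400):
    """reflectance_image: (H, W, P) spectral reflectance.
    Returns (k, P) cluster centroids used as representative training samples."""
    pixels = reflectance_image.reshape(-1, reflectance_image.shape[-1])
    return KMeans(n_clusters=k, n_init=10).fit(pixels).cluster_centers_
```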

Four methods were compared: without a link function, with the logit link function, with the square root 1 link function, and with the square root 2 link function. Firstly, the RGB test images for reconstruction were formed using the image model in Eq. 1. The reflectances of the 24 images (of size \(1312 \times 1924\)) were reconstructed.

Table 1. Reconstruction results (in RMSE) for eight color patches of the color checker's reflectance, using a Canon 1D Mark III under indoor illumination from a metal halide lamp with a 4300 K color temperature.
Table 2. Reconstruction results (in PD) for eight color patches of the color checker's reflectance, using a Canon 1D Mark III under indoor illumination from a metal halide lamp with a 4300 K color temperature.
Table 3. Reconstruction results (in RMSE) for a normal scene, using a Canon 1D Mark III under indoor illumination from a metal halide lamp with a 2500 K color temperature.
Table 4. Reconstruction results (in PD) for a normal scene, using a Canon 1D Mark III under indoor illumination from a metal halide lamp with a 2500 K color temperature.

To compare the performance of the four reconstruction methods, the actual reconstruction results for eight color patches of the color chart are compared in Fig. 2 for the Canon 1D Mark III. The quantitative results of these patches for all methods are shown in Tables 1 and 2. Additionally, the reconstruction results of a normal scene are compared in Fig. 3; the RGB image is synthesized using a Canon 1D Mark III under indoor illumination (metal halide lamp with a color temperature of 2500 K), and the quantitative results of the scene for all methods are shown in Tables 3 and 4.

3.3 Evaluation of Reconstruction Performance

To quantify the performance of the spectral reflectance reconstruction in the experiments, we use the root mean square error (RMSE) to measure the error,

$$\begin{aligned} RMSE(R,\widehat{R}) = \sqrt{\frac{{\sum \limits _x {||R(\lambda ,x) - \widehat{R}(\lambda ,x)||_2^2} }}{N}} \end{aligned}$$
(8)

and Pearson Distance (PD) to measure the similarity,

$$\begin{aligned} PD(R,\widehat{R}) = 1 - \frac{1}{N}\sum \limits _x {\frac{{|\sum \limits _\lambda {R(\lambda ,x)\widehat{R}(\lambda ,x)} |}}{{\sqrt{{{\sum \limits _\lambda {[R(\lambda ,x)]} }^2}} \sqrt{\sum \limits _\lambda {{{[\widehat{R}(\lambda ,x)]}^2}} } }}} \end{aligned}$$
(9)

The PD (1 − PD is known as the GFC [10]) is independent of the magnitude and therefore gives information about the shape of the estimations. Here \(R(\lambda ,x)\) and \(\widehat{R}(\lambda ,x)\) are the actual and reconstructed spectral reflectance, N is the number of pixels in the image, and \(||\cdot ||_2\) is the \(l^2\)-norm.
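For reference, the following sketch computes both metrics on reflectance images reshaped to (N, P) arrays, with N pixels and P spectral bands; the reshaping convention is our assumption.

```python
# Sketch of the two error metrics (Eqs. 8 and 9) on (N, P) reflectance arrays.
import numpy as np

def rmse(R, R_hat):
    """Root mean square error over all pixels (Eq. 8)."""
    return np.sqrt(np.sum((R - R_hat) ** 2) / R.shape[0])

def pearson_distance(R, R_hat):
    """Mean Pearson distance, i.e. 1 - GFC (Eq. 9); insensitive to magnitude."""
    num = np.abs(np.sum(R * R_hat, axis=1))
    den = np.linalg.norm(R, axis=1) * np.linalg.norm(R_hat, axis=1)
    return 1.0 - np.mean(num / den)
```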

4 Results and Discussion

The numerical results of the experiments are given in Tables 1, 2, 3 and 4 and can be summarized as follows.

A conclusion from these results is that the models with link functions improved the reconstruction performance in terms of both RMSE and PD.

For the reconstruction results of the eight color patches in Tables 1 and 2, it can be seen that (when compared with the model without a link function) all three link functions improve the reconstruction performance, and the square root 2 link function provides the best results in most cases: the RMSE decreases by 17.4%, 11.2%, 30.4%, 55.8%, 63.3%, and 8.9% for patches a, b, c, d, f, and g respectively, and the PD decreases by 17.4%, 11.2%, 30.4%, 71.9%, and 12.7% for patches a, b, c, d, and g respectively. Another observation is that the similarity of the reconstructions (indicated by PD) improved noticeably.

For the reconstruction results of the normal scene in Tables 3 and 4, it can be seen that (when compared with the model without a link function) all three link functions improve the reconstruction performance in terms of RMSE and PD; however, the performance of the different link functions is somewhat mixed.

The logit link function shows the best RMSE results at locations a, b, and h, with decreases of 8.9%, 10%, and 31.3% respectively compared with the model without a link function, and similarly the best PD at locations a, b, and h, with decreases of 14.6%, 10%, and 43.8% respectively. The square root 2 link function provides the best RMSE at locations d, e, and f, with decreases of 2.3%, 22.7%, and 2.7% respectively, and similarly the best PD at locations d, e, and f, with decreases of 2.6%, 30.9%, and 5.6% respectively. The square root 1 link function provides the best RMSE at locations c and g, with decreases of 9% and 1.1% respectively, and similarly the best PD at locations c and g, with decreases of 18.1% and 2% respectively. Overall, the PD improved noticeably.

The logit link function has been evaluated before in [5], and the square root 2 link function has been used for reflectance estimation in [7] combined with principal component analysis. A link function combined with the training-based approach has not been used for reflectance estimation before, although link functions have been proposed in combination with a kernel regression model in [1]. Nevertheless, it is possible to introduce a link function into the training-based approach to improve the reconstruction performance. In this paper, our main interest was the evaluation of the models in Eq. 7 for estimation with link functions.

5 Conclusion

In this paper, we proposed a new method to reconstruct spectral reflectance from RGB images, which combines a link function with a training-based approach. The training-based approach is based on a radial basis function network and uses white-balancing as an intermediate step; the method learns a mapping between the white-balanced RGB images and their transformed spectral reflectance, obtained by a non-linear transformation of the reflectance via a link function. We compared the performance of the reconstruction with different link functions in the color patch reflectance reconstruction and normal scene reconstruction experiments.

Our results suggest that link functions improve the spectral accuracy of training-based spectral reconstruction from RGB images. The results show similar relative performance for the different link functions and indicate that the spectral error (indicated by RMSE) and especially the spectral shape (indicated by PD) are estimated more accurately via link functions. In addition, the model with the square root 2 link function decreases several spectral errors significantly in most cases when compared with the model without a link function.

Since the approach combines a link function with a radial basis function network, the quality of the training strongly affects the reconstruction results. For the training stage, a limitation of the approach is the assumption that the scene is illuminated by a uniform illumination, which for many scenes in reality is not the case. Moreover, although the training-based approach handles reflectances with smooth spectra well, it, like other approaches [6], still gives poor results in the case of spiky spectra. Spectral reconstruction under narrow-band illuminations will be an interesting and challenging area for future research.