1 Introduction

Color constancy is the property of vision that allows humans to identify the color of an object independently of the color of the light source. For example, we perceive a banana as yellow both in a room illuminated by a tungsten bulb – i.e. under reddish light – and outside on a cloudy day, i.e. under bluish light. Because of this property, solving for color constancy – in other words, removing the color of the light – is a fundamental step in digital color image processing.

The chromagenic method for color constancy [3, 4] solves for the illuminant color given two images of the same scene, one captured with a color filter and another without. This method can be decomposed into two steps. In the first step, a set of linear transform matrices is calculated from a set of pairs of filtered and unfiltered images. In particular, each of these matrices relates a particular unfiltered image to its filtered counterpart. In the second step, given a new pair of chromagenic images, the method estimates the light color (the illuminant estimation step) by finding the best transform among those calculated in the pre-processing step.

The chromagenic color constancy approach can deliver good estimates of the illuminant [5]. However, the filtered and the unfiltered images need to be registered [7]. This is a limitation for some real-life applications as image registration is usually time-consuming and computationally expensive. In fact, image registration is still an important field of research on its own [6] and cannot always be solved reliably.

In this paper, we present an approach that aims at avoiding the need for image registration in the first step of the chromagenic color constancy algorithm. In particular, we propose to use the Monge-Kantorovitch (MK) transform for obtaining the linear relations between the filtered and unfiltered images.

To show the effectiveness of our new method, we introduce a new pilot database of 63 scenes of chromagenic facial images (to be used in Kampo diagnosis). Using this dataset we demonstrate that our new method delivers better color correction than a least squares fit that assumes the images are registered (when registration cannot be carried out or is insufficiently accurate).

While the focus of this paper is color correction without registration – between a normal capture and a second image taken through a colored filter – we have also investigated using the discovered color corrections for illuminant estimation with the full chromagenic algorithm. However, we found that the dataset is too small to conclude much about estimation performance. Indeed, for this small dataset, we found that the modified chromagenic algorithm can work almost perfectly (and, conversely, that the chromagenic algorithm working with unregistered images can fail). To study algorithm performance in depth we will need to capture a much larger corpus of images. We plan to compile a large set of chromagenic face images in the near future.

This paper is organized as follows. We start by recalling the background of our research: color constancy, chromagenic color constancy, and an overview of the image-based Kampo diagnosis system. Then, we introduce our new dataset of facial images. Section 4 presents our approach. This is followed by the experiments and results. Finally, the paper is summed up in the conclusions.

2 Research Background

2.1 Color Constancy

Color constancy is the ability of a visual system to see objects with the same colors regardless of the lighting conditions. In Fig. 1, we can see that the color of the gray ball varies with the color of the light. This is what happens when color constancy is not performed, here in the case of a digital camera.

While the human visual system is designed to achieve color constancy, machines – in particular modern digital cameras – need algorithms to accomplish this function (also known as white-balancing in digital photography). In computer vision, color constancy is achieved by first determining the color of the light under which the scene was captured. Once the light color is estimated, it can be “divided out”. Illuminant estimation is a core component of the reproduction pipelines of modern digital cameras.
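As an illustration of what “dividing out” the light can mean in practice, the following is a minimal sketch of a per-channel (diagonal) correction. The function name and the numeric values are hypothetical, and normalising the illuminant so that its brightest channel is left unchanged is an assumption made for this example, not a step specified in this paper.

```python
import numpy as np

def divide_out_illuminant(image, illuminant_rgb):
    """Diagonal correction: divide each color channel by the illuminant RGB."""
    image = np.asarray(image, dtype=float)
    light = np.asarray(illuminant_rgb, dtype=float)
    # Assumption: normalise so the brightest channel of the light is left unchanged.
    light = light / light.max()
    # Broadcasting applies the division along the trailing RGB axis.
    return image / light

# Hypothetical example: a gray surface seen under a reddish light.
reddish_light = np.array([1.0, 0.8, 0.6])
gray_patch = 0.5 * reddish_light            # recorded RGB is tinted by the light
corrected = divide_out_illuminant(gray_patch.reshape(1, 1, 3), reddish_light)
```

After the correction, the three channels of the gray patch are equal again, which is the behavior expected of an achromatic surface under a neutral light.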

Fig. 1. The rendering of a gray ball under various lights. The image is from the SFU gray-ball dataset [8], as it appeared in [9].

Illuminant estimation algorithms can be split into two broad classes: algorithms that estimate the illuminant via a ‘bag of pixels’ statistical approach [14,15,16,17], and learning-based methods [18, 19] (including deep learning [20, 21]). There are also less commonly used methods that look for physical insights to drive the light estimation. For example, in the specular highlight method [22], highlights are sought in the scene. It is then assumed that the highlight color is the same as the illuminant color (true for dielectric materials). Another example is the blackbody-model-based algorithm [30, 31], which uses the sensor responses to form an illuminant-invariant color space and estimate the power spectrum of the illuminant. Another physics-based method is the chromagenic algorithm [3, 4, 7]; see Sect. 2.2 below.

Color constancy – the ability to estimate and then remove the color bias due to illumination – is important in several applications, including object tracking [12], facial recognition [11] and scene understanding [13]. In this paper we focus on a medical application requiring color constancy. Matsushita et al. [2] developed a pathophysiology system to reproduce a Kampo medical diagnosis for a number of diseases based on facial images. The method only works when the face color in an image is directly related to the physical reflectance properties of the face. This condition is only met when the light illuminating the scene is equienergetic (i.e. achromatic), meaning that color constancy should be applied.

2.2 Chromagenic Color Constancy

In the chromagenic color constancy approach two images are taken of each scene. The first image is a normal capture and the second is an image taken through a specially chosen chromagenic filter. Given reasonable assumptions about the dimensionality of lights and surfaces it was shown in [7] that the filtered and unfiltered responses are related by a linear transform and that this relationship varies with (is intrinsic to) the illuminant color. Put another way, the relationship between filtered and unfiltered RGBs indexes – and so identifies – the illumination.

Mathematically, by adopting the Lambertian model of image formation, if we denote as \(\underline{\rho }\) the normal captured image, and \(\underline{\rho }_{F}\), the image captured by placing a color filter in front of the camera, we can write:

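$$\begin{aligned} \rho _{k} = \int _{\omega } E(\lambda )S(\lambda )Q_{k}(\lambda )\,d\lambda , \qquad \rho _{F,k} = \int _{\omega } E(\lambda )S(\lambda )F(\lambda )Q_{k}(\lambda )\,d\lambda \end{aligned}$$
(1)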

where \(\lambda \) denotes a particular wavelength, \(\omega \) the visible spectrum (normally from 380 to 740 nm), E is the illuminant, S is the reflectance of the scene surfaces, k indexes the R, G or B color channel of the digital camera, Q is the camera sensitivity function, and F is the spectral transmittance of the selected filter.

As stated above, it was shown in [7] that under reasonable assumptions about the dimensionality of lights and surfaces, the unfiltered and filtered responses should be related by a 3 \(\times \) 3 linear transform:

$$\begin{aligned} \underline{\rho }_{F} \approx T_{E}^{F}\underline{\rho } \end{aligned}$$
(2)

The chromagenic algorithm works in two steps. First, in pre-processing, we calculate a range of illuminant transform matrices \(T_{i}\) (for \(i=1,\ldots ,N\) illuminants) using a least squares approach. In a second step, given a chromagenic pair of images \(\underline{I}(x,y)\) and \(\underline{I}_F(x,y)\), we determine the illuminant color by minimizing:

$$\begin{aligned} \underset{i}{\arg \min }\; \sum _x\sum _y \Vert T_{i}\underline{I}(x,y)-\underline{I}_F(x,y)\Vert \end{aligned}$$
(3)

where \((x,y)\) represents a particular pixel of the image.
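The two steps can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in [3, 4, 7]: images are represented as \(3\times N\) arrays of registered RGB pixels, the function names are hypothetical, and the training transforms are synthetic diagonal matrices chosen only so that the example is checkable.

```python
import numpy as np

def fit_transform(unfiltered, filtered):
    """Step 1: least-squares 3x3 matrix T such that filtered ~ T @ unfiltered.

    Both inputs are 3 x N arrays whose columns are in pixel correspondence.
    """
    # lstsq solves unfiltered.T @ X = filtered.T in the least-squares sense, so T = X.T
    X, *_ = np.linalg.lstsq(unfiltered.T, filtered.T, rcond=None)
    return X.T

def select_illuminant(transforms, unfiltered, filtered):
    """Step 2: index i minimising the summed per-pixel norm of T_i rho - rho_F (Eq. 3)."""
    errors = [np.linalg.norm(T @ unfiltered - filtered, axis=0).sum()
              for T in transforms]
    return int(np.argmin(errors))

rng = np.random.default_rng(0)
# Hypothetical "true" filter transforms for N = 3 training illuminants.
true_T = [np.diag([0.9, 0.6, 0.3]),
          np.diag([0.8, 0.7, 0.5]),
          np.diag([0.7, 0.7, 0.7])]

# Pre-processing: estimate one transform per training illuminant.
training = []
for T in true_T:
    rho = rng.random((3, 200))
    training.append(fit_transform(rho, T @ rho))

# Estimation: a new chromagenic pair captured under the second illuminant.
test_rho = rng.random((3, 100))
estimate = select_illuminant(training, test_rho, true_T[1] @ test_rho)
```

In this noiseless synthetic setting the selected index matches the illuminant that generated the test pair. Note that both functions rely on the columns of the two arrays being in correspondence, which is the registration requirement discussed next.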

A limitation of chromagenic color constancy is that images need to be registered. Image registration is required both in the least squares minimization of the first step and also in the selection of \(T_{i}\) in the second step. In this work, we present a method to avoid the need for registration in the first step.

2.3 Kampo Medical Diagnosis

Kampo medicine is the traditional Japanese medicine used in Japan and, in alternate forms, across Asia. A Kampo medical diagnosis [25] requires a visual observation, an olfactory examination, an inquiry and a palpation. A face-only diagnosis is, however, possible for various diseases: blood stagnation (due to poor blood circulation), blood deficiency (resulting from a lack of blood, in other words when the blood is not regenerated in normal proportions) and yin deficiency (which is a sign of a lack of water at the face level).

Matsushita et al. [2] developed an image-based system for facial Kampo diagnosis. The system outputs a diagnosis in the form of a score (from 1 to 5) where 1 indicates a non-disease state and 5 indicates a severe disease state. The Kampo system works as follows. First, given an image of an ill patient, the system generates a hemoglobin density image and a gloss image. The hemoglobin image is the result of a pigmentation component separation by independent component analysis (ICA) [23] and the gloss image is obtained by using a polarizer (the face is captured with and without a polarizing plate) [24]. Five regions of interest are extracted from each of the two images: one region from the forehead area, two regions under the eyes and two regions at cheek level. A final, sixth region is the sum of all these five regions. Five feature values are calculated from the RGB values of each of these six regions, so that in total 60 features are extracted from the two images. The system produces its diagnosis by support vector regression (an optimization problem).

In [2], the system was evaluated and tested on a dataset of images generated, by modulating gloss and hemoglobin, from images of healthy subjects (taken in a lab under a white light). The results were compared to the diagnoses of Kampo medical doctors. In this paper, we also present a new dataset of images that we collected for Kampo diagnosis, in order to allow the system to be tested under different lights. Here, however, we capture every scene with and without a colored filter.

3 A Chromagenic Face Image Dataset for Pathophysiology

In this section we introduce a new dataset of facial images for Kampo pathophysiology diagnosis. The dataset has 63 initial scenes (each a set of three facial images of a healthy subject taken under a given light). Every scene was captured three times: once without a filter, once with a red filter and once with a yellow filter (respectively a Tiffen 85 and a Tiffen 81EF). The images were taken at Chiba University in Japan during the summer of 2018. Nine participants took part in the data collection. All images were taken with a Nikon D5200 camera in a lighting room equipped with two Thouslite LED cubes (which allowed us to simulate a range of illuminant color temperatures).

Fig. 2. Two images of the same scene from the dataset showing the ColorChecker chart. The left image is a normal capture and the right image was captured through a red filter. Note that these two images are the camera pipeline outputs.

The left of Fig. 2 shows one normal image from our dataset; the right shows the same scene captured through the red filter. The right image is, of course, redder in appearance. The images are not registered, but the experimental conditions were very well controlled, so the difference in the alignment of the images is not easily noticeable in this case. Notice that there is a ColorChecker in the scene; this is true for all our images. Placing a ColorChecker in every scene is useful for two reasons.

First, we can use it to measure the white point (the RGB of the color of the light). In line with [10] this is defined to be the RGB taken from the brightest unsaturated gray patch of the ColorChecker. In Fig. 4 we show the ground-truth chromaticities for the illuminants in the 63 scenes of the dataset. It is clear that our dataset has a range of illuminant colors.

Second, given a chromagenic pair of images, with the Macbeth ColorChecker in each image, we can solve for the best possible 3 \(\times \) 3 matrix relating the colors of the two color charts (without requiring the pixel-wise registration of the images in this case). We will consider this ColorChecker-based 3 \(\times \) 3 matrix transform as our reference when solving for the linear transform in the case of non-registered images (see Sect. 5). Of course, usually there is no ColorChecker in the scene and solving for the best possible transform is not possible.
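The white-point measurement described above (the first use of the ColorChecker) can be sketched as follows. This is only an illustration: the saturation threshold of 0.98, the function names, the patch values and the choice of \(r=R/(R+G+B)\), \(g=G/(R+G+B)\) as chromaticity coordinates are all assumptions made for the example, since the paper does not specify them.

```python
import numpy as np

def white_point(gray_patches, saturation_level=0.98):
    """RGB of the brightest gray patch in which no channel is saturated.

    gray_patches: (n_patches, 3) array of mean patch RGBs scaled to [0, 1].
    saturation_level: assumed clipping threshold (hypothetical value).
    """
    patches = np.asarray(gray_patches, dtype=float)
    unsaturated = patches[(patches < saturation_level).all(axis=1)]
    if unsaturated.size == 0:
        raise ValueError("all gray patches are saturated")
    # "Brightest" is taken here as the largest R + G + B sum.
    return unsaturated[np.argmax(unsaturated.sum(axis=1))]

def chromaticity(rgb):
    """Assumed (r, g) chromaticity: R and G divided by R + G + B."""
    rgb = np.asarray(rgb, dtype=float)
    return rgb[:2] / rgb.sum()

# Hypothetical gray-patch means; the first (white) patch is clipped.
patches = np.array([[1.00, 1.00, 0.95],
                    [0.80, 0.70, 0.50],
                    [0.50, 0.44, 0.31],
                    [0.25, 0.22, 0.16]])
wp = white_point(patches)
wp_chroma = chromaticity(wp)
```

Here the clipped white patch is skipped and the second patch is returned as the white point, which illustrates why the brightest unsaturated patch, rather than simply the brightest patch, is used.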

While Fig. 2 shows the capture environment, for the Kampo diagnosis we need only the face image. In Fig. 3 we show two sets of three images for two of our subjects. From left to right we show the original image, followed by the same person imaged through a yellow and a red filter.

Fig. 3. Two sets of filtered and unfiltered images from the dataset. The two sets represent two scenes (two subjects). From left to right: normal capture, image with yellow filter and image with red filter. (Color figure online)

Fig. 4. Ground-truth chromaticities of the Kampo images dataset.

In the future, we will generate new images with various disease states from this dataset. These images will be obtained by modulations of gloss and hemoglobin [2].

4 Color Correction Without Registration: The Monge-Kantorovitch Linear Transform

The chromagenic color constancy algorithm works in two steps. In the pre-processing step we solve for the best \(3\times 3\) matrices relating unfiltered to filtered RGBs for a large range of scenes and illuminants. Each transform is, by construction, associated with a light color. Then in the second step, when we have a chromagenic image pair for an unknown illuminant, we test each of the pre-computed transforms in turn to see which one, when applied to the unfiltered RGBs, best predicts the filtered counterparts. The color of the light is then defined to be the light color associated with that transform.

Both the pre-processing step and the application part of the chromagenic algorithm (estimating the illuminant using a pair of chromagenic images) require registered pairs of images. Unfortunately, even after decades of investigation, image registration remains a hard problem, and even when it works it delivers imperfect results [6]. Further, even images that are only slightly out of registration can result in significantly different transforms (\(3\times 3\) matrices) that best relate the unfiltered to filtered RGBs. This is the case for our dataset, where the differences in alignment between the unfiltered and filtered images of the same scene are barely visible but still have an impact on the transforms (see results in Sect. 5).

In what follows, we focus only on the pre-processing step of the chromagenic method. In particular, we propose using the Monge-Kantorovitch (MK) linear transform to replace the \(3\times 3\) matrix relating the images in the chromagenic pair without registration. Note that the MK transform is a 3-D similarity transform. MK has its roots in the Earth Mover's Distance (EMD) [26] (or Wasserstein metric [28]), which has proven to be a useful tool in image recognition [27]. Imagine we have a few piles of earth and several holes whose total volume equals the volume of all the earth. Clearly, if we wish to move the earth into the holes while minimizing the energy expended, we wish to move each shovelful of earth as little as possible. The minimum distance we have to move all the earth is exactly the earth mover's distance. It can be efficiently solved using linear programming [29].
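To make the earth-moving picture concrete, here is a small sketch for the simplest possible case: one dimension, with the same number of equally sized piles and holes. In that special case the optimal plan pairs the sorted pile positions with the sorted hole positions, so no linear program is needed. The function name and positions are hypothetical, and the general multi-dimensional EMD of [26] is not implemented here.

```python
import numpy as np

def emd_1d(piles, holes):
    """Earth mover's distance between two equally sized 1-D point sets.

    Assumes every pile and every hole has the same unit volume, so the
    optimal plan matches the k-th smallest pile to the k-th smallest hole.
    Returns the average distance moved per unit of earth.
    """
    piles = np.sort(np.asarray(piles, dtype=float))
    holes = np.sort(np.asarray(holes, dtype=float))
    if piles.shape != holes.shape:
        raise ValueError("this sketch needs the same number of piles and holes")
    return np.abs(piles - holes).mean()

# Hypothetical positions along a line: every pile ends up moving 5 units.
distance = emd_1d([0.0, 1.0, 3.0], [8.0, 5.0, 6.0])
```

Note that the two inputs are unordered sets: no pile is labelled with the hole it belongs to, which is the same situation as having two unregistered images.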

Rather usefully, there is a simple, closed-form linear restriction of EMD. Given \(M\times N\) data matrices A and B (where \(N>M\)), the classic linear least squares minimization solves:

$$\begin{aligned} \min _T ||TA-B|| \end{aligned}$$
(4)

Of course, to solve the above we exploit the fact that the columns of A and B are in correspondence (which is not the case for non-registered images). In the linear restriction of EMD, we instead seek a transform T such that the correlation structures of TA and B match and such that TA is as close to A as possible (that is, the colors in A move as little as possible). Specifically, we minimize

$$\begin{aligned} \min _T ||TA-A|| \;s.t.\; TAA^tT^t = BB^t \end{aligned}$$
(5)

Pitié et al. [1] have shown that MK (or linear restriction to EMD) can be used in color grading (to map the colors of an input image to match the look and feel of a target image).

Here we use Eq. 5 to find the transform relating an RGB image to its filtered counterpart. In Eq. 5, A contains the pixels from the unfiltered image and B the pixels of the same scene captured through a colored filter. The pixels in A and B need not be in correspondence. Indeed, there is no constraint that the number of pixels in the filtered and unfiltered images needs to be the same.
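A minimal sketch of how Eq. 5 can be solved in closed form is given below. It uses the standard symmetric solution of the linear Monge-Kantorovitch problem, \(T=\varSigma _A^{-1/2}(\varSigma _A^{1/2}\varSigma _B\varSigma _A^{1/2})^{1/2}\varSigma _A^{-1/2}\), with \(\varSigma _A\) and \(\varSigma _B\) taken as the uncentred second-moment matrices of Eq. 5. Dividing \(AA^t\) and \(BB^t\) by the number of pixels is an assumption made here so that images with different pixel counts can be compared; the function names and the synthetic diagonal filter are likewise hypothetical.

```python
import numpy as np

def sqrtm_psd(M):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)            # guard against tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T

def mk_transform(A, B):
    """Closed-form linear MK map from pixels A (3 x N_A) to pixels B (3 x N_B).

    No pixel correspondence is used: only second-order statistics enter.
    """
    cov_a = A @ A.T / A.shape[1]         # assumption: normalise by pixel count
    cov_b = B @ B.T / B.shape[1]
    root_a = sqrtm_psd(cov_a)
    inv_root_a = np.linalg.inv(root_a)
    middle = sqrtm_psd(root_a @ cov_b @ root_a)
    return inv_root_a @ middle @ inv_root_a

rng = np.random.default_rng(1)
A = rng.random((3, 5000))                          # unfiltered pixels
filter_matrix = np.diag([0.9, 0.6, 0.3])           # hypothetical filter effect
# Shuffle the filtered pixels so that A and B are NOT in correspondence.
B = (filter_matrix @ A)[:, rng.permutation(A.shape[1])]
T = mk_transform(A, B)
```

In this synthetic case the filter is itself a symmetric positive definite matrix, so the MK solution recovers it even though the pixel order has been scrambled. A least squares fit (Eq. 4) on the same shuffled data would not, because it depends on column correspondence.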

5 Experiments and Results

In order to evaluate the effectiveness of our approach, we compare the results given by our method (Eq. 5; MK) to those obtained by the usual least squares procedure (Eq. 4; LST) on our Kampo dataset.

As a reference point, we use the fact that a Macbeth ColorChecker chart is included in our images. This allows us to compute the best possible linear transform between the two images (\(T_{CC}\)) by computing the least squares regression considering only the colors of the color charts. Thanks to this, we can evaluate the difference between the best linear transform and the two other solutions as

$$ \epsilon _{m}=||T_{CC}A-f_{m}(A)|| $$

where A is the unfiltered image and \(f_{m}\) denotes the method being evaluated, with \(m\in \left\{ LST,MK\right\} \).
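The evaluation can be sketched as below. This is a hedged illustration: the paper does not state which norm is used or how it is aggregated over pixels, so the mean per-pixel Euclidean distance used here is an assumption, as are the function names and the synthetic chart data.

```python
import numpy as np

def colorchecker_transform(patches_unfiltered, patches_filtered):
    """Reference T_CC: least-squares 3x3 matrix between chart patch colors.

    Both inputs are 3 x 24 arrays of mean patch RGBs; the patches are matched
    by their position on the chart, so no pixel-wise registration is needed.
    """
    X, *_ = np.linalg.lstsq(patches_unfiltered.T, patches_filtered.T, rcond=None)
    return X.T

def transform_error(T_ref, T_m, A):
    """Assumed error: mean per-pixel Euclidean distance between T_ref A and T_m A."""
    return np.linalg.norm(T_ref @ A - T_m @ A, axis=0).mean()

rng = np.random.default_rng(2)
true_filter = np.diag([0.9, 0.6, 0.3])     # hypothetical filter effect
chart = rng.random((3, 24))                # synthetic patch colors
T_cc = colorchecker_transform(chart, true_filter @ chart)

A = rng.random((3, 1000))                  # unfiltered image pixels
err_same = transform_error(T_cc, T_cc, A)            # a perfect method scores 0
err_identity = transform_error(T_cc, np.eye(3), A)   # ignoring the filter scores > 0
```

A method whose transform equals the reference has zero error, and any other candidate (MK or LST) is scored by how far its corrected pixels fall from those produced by the ColorChecker-based transform.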

Table 1. Mean and RMS error for MK vs LST approaches with red and yellow filters

Figure 5 plots the individual errors for all images: the upper graph is the result for the red filter and the lower graph is the result for the yellow filter. These results are summarized in Table 1, where we present the mean and RMS errors for the dataset. Our method reduces the error of the usual procedure by at least \(75\%\). Note that all image values are in the interval [0,1], so a mean error of 0.01 corresponds to a 1% error.

Visual examples of our results are presented in Fig. 6 where we show from left to right: an unfiltered image, the image corrected by using the ColorChecker (i.e., the best possible result or reference), the image corrected using MK (the approach proposed in this paper), and the image corrected with the LST approach. The upper example was generated with the red filter and the lower with the yellow filter. The images were linearly converted from RAW format and demosaiced. We can clearly see that our approach generates colors that are very close to the best possible solution.

Fig. 5. Chromagenic distance when using MK vs the least squares transform LST. Top: red filter. Bottom: yellow filter. (Color figure online)

Fig. 6. Two scenes from the dataset (upper with red filter and lower with yellow filter). From left to right: the original unfiltered image, the color corrected image with CCT (the ColorChecker-based transform), the color corrected image with MK and the color corrected image with LST. (Color figure online)

6 Conclusion

This paper introduces a new image dataset comprising 63 scenes of facial images taken under a variety of lights and, as a novel feature, with and without a color filter. Given a pair of filtered and unfiltered images, it is possible to use the chromagenic approach to illuminant estimation. The chromagenic algorithm has two parts: first, we need to relate the unfiltered image to the filtered image using a linear transform. Second, we need to identify the illuminant by searching for the best transform. This paper focuses on the first part only.

We show, using the Monge-Kantorovitch (MK) transform, how we can solve for the linear map without the need to register the images. This is of significant practical importance. Not only is registration a hard problem, it cannot always be solved in a pixel-wise manner. Here we remove the need for registration altogether. Moreover, we show that the MK method outperforms direct least squares (where we assume good registration when this is not the case) by a factor of about 4:1.

Looking to the future, our plan is to capture a larger set of facial images so that we can test the second part of the chromagenic algorithm. That is, we will investigate whether MK suffices to allow the chromagenic algorithm to estimate the illuminant for face images.