1 Introduction

Food crises and undernutrition remain critical issues in many low- and middle-income countries. Promising ways to address this problem include targeted delivery of health services and supplementary nutrition (such as the Plumpy'Nut project in East Africa), food fortification, and empowering local villages to grow nutritious foods.

However, aside from daily food diaries and reports from local health workers, there is no reliable automated method to identify the foods consumed and their caloric content. A method is needed to detect, for every food consumed, a unique food signature that is invariant to light, heat, and slight variations in visual appearance.

Despite several efforts in the field of image recognition, conventional imaging technologies that acquire only morphology are not adequate for accurate detection and assessment of the intrinsic properties of a food. Spectroscopic imaging covering the visible and near-IR spectrum (400 nm to 1100 nm) can identify unique spectral features that readily discriminate between foods. In this paper, through feature extraction and classification on hyperspectral images of real food, we show that hyperspectral imaging provides useful information and shows promise in distinguishing caloric composition within the same food type.

2 Related Work

Food images have been widely studied throughout the past decade. For food image segmentation, Zhu et al. [13] employ connected component analysis and normalized cuts, Anthimopoulos et al. [1] use the mean-shift algorithm, and Kong et al. [6] extract Scale Invariant Feature Transform (SIFT) points. Matsuda et al. [8] segment food images containing multiple food items using Felzenszwalb's deformable part model (DPM) and JSEG region segmentation. For food feature extraction, Gabor features [13] and simple color and texture features [12] have been explored on food images, and Dehais et al. [3] use SIFT and Speeded Up Robust Features (SURF) detectors.

Meanwhile, hyperspectral imaging has found wide use in identifying the chemical and physical properties of food, including predicting color [9], detecting damage [4], and analyzing quality attributes [5]. Many works have also focused on bruise detection [2, 7] and PCA-based food quality classification [10].

However, while these food image recognition methods have been successful on various data sets, they are all based on true-color images and therefore rely only on the RGB channels. In this paper, we argue that RGB channels can be insufficient in many natural settings where foods and dishes are more complex (e.g., food in restaurants). To address this problem, we take advantage of hyperspectral images, which contain information from many spectral bands. In hyperspectral image analysis, only a few methods have been proposed to detect the calorie content of real-world foods. This work provides a method to distinguish the calorie content of food types using hyperspectral imaging.

3 Methods

In this section we introduce our hyperspectral image processing technique. We first describe our data (the cooked food samples) and our image acquisition system, followed by our food type identification and within-food calorie discrimination techniques.

3.1 Food Samples and the Hyperspectral Imaging System

To simulate real-world foods in low-income countries, we prepared food samples using traditional Nigerian recipes. Three separate dishes were made, each at three different levels of fat content, for a total of nine samples. The three dishes were white rice, chicken stew, and spiced yams. The fat content was adjusted by varying the quantity of oil used during cooking: the low-fat rice was prepared without butter, while the medium- and high-fat rice dishes were prepared with one and four tablespoons of butter, respectively. Similarly, no oil was used for the low-fat yams and stew, one tablespoon of oil was used for the medium-fat dishes, and four tablespoons were used for the high-fat dishes. The nutritional information for each dish, calculated from the nutritional values of the ingredients, is given in Table 1.

During the measurement process, the system was shielded from outside light to reduce interference from ambient light. Each recorded image comprises 240 wavebands at a resolution of 640\(\,\times \,\)244 pixels; image sizes varied slightly depending on the location of the camera at the beginning of each acquisition. Data was collected for each pixel over the wavelength range of 393 nm to 892 nm.

Table 1. Nutritional information for food samples.

3.2 Hyperspectral Image Acquisition

A total of 30 images were taken. For each of the nine samples, three images were taken, one in each light setting, giving 27 images (9\(\,\times \,\)3); three additional reference images were taken between light adjustments to recalibrate the system to the new lighting settings. The images were acquired using a laboratory hyperspectral system. Only halogen lights were used, and all other light sources in the room were turned off. During image acquisition, 100 g of each sample were placed in straw fiber bowls.

Due to the varying intensity of the light source, calibration with light and dark references was necessary in order to obtain accurate hyperspectral images. The dark reference was used to remove dark current effects; this image was collected by placing a black cap over the camera lens and turning off all lights in the room. The light reference was obtained by taking an image of a reflective white sheet of paper. The corrected image is then \(R = (R_o - R_d)/(R_r - R_d)\), where \(R_o\) is the acquired original hyperspectral image, \(R_r\) is the white reference image, and \(R_d\) is the dark reference image.
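As a minimal sketch of this correction, assuming the raw cube and both references are loaded as NumPy arrays of identical shape, the band-wise computation might look as follows:

```python
import numpy as np

def calibrate(raw, white_ref, dark_ref):
    """Flat-field correction R = (R_o - R_d) / (R_r - R_d).

    All inputs are float arrays of shape (bands, rows, cols); the
    references are the white-sheet and lens-cap images described above.
    """
    denom = white_ref - dark_ref
    # Guard against zero denominators in dead or saturated pixels.
    denom = np.where(np.abs(denom) < 1e-8, 1e-8, denom)
    return (raw - dark_ref) / denom
```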

This procedure was repeated twice more, each time moving the two light sources to a greater distance from the sample. Every time the light source was adjusted, the reference images were retaken in order to maintain an accurate reference measurement. The exposure time and camera speed remained the same for all images.

3.3 Feature Extraction for Food Item Detection

To distinguish between food types, we perform preprocessing, feature extraction, and classification; each step is described below.

Preprocessing. The preprocessing procedure includes three steps: data cleaning, dimension reduction, and patch selection. During data cleaning we remove the noisy bands between 393 nm and 480 nm (the first 30 channels), so the wavelength range used for feature extraction is 480 nm to 892 nm. The resulting data set includes 30 images (644\(\,\times \,\)244 pixel resolution each) with 210 wavebands.

To further reduce the dimension of the data, we merge the 210 wavebands into seven larger wavebands; that is, for each pixel, we average every 30 consecutive wavebands, so that each pixel is represented by a seven-dimensional vector. Finally, we select 40 patches of 30\(\,\times \,\)30 pixels from each image at each waveband to expand our data set and enrich the training of our model, as sketched below.
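A minimal sketch of these two reduction steps, assuming each calibrated image is a (210, H, W) NumPy cube and that patch locations are drawn at random (the paper does not state how they were chosen):

```python
import numpy as np

rng = np.random.default_rng(0)

def merge_bands(cube, group=30):
    """Average every `group` consecutive bands: (210, H, W) -> (7, H, W)."""
    bands, h, w = cube.shape
    return cube.reshape(bands // group, group, h, w).mean(axis=1)

def sample_patches(cube, n_patches=40, size=30):
    """Draw `n_patches` random size x size patches spanning all bands."""
    _, h, w = cube.shape
    patches = []
    for _ in range(n_patches):
        r = int(rng.integers(0, h - size + 1))
        c = int(rng.integers(0, w - size + 1))
        patches.append(cube[:, r:r + size, c:c + size])
    return patches
```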

Feature Extraction. The left side of Fig. 1 shows the different food items we used for feature extraction. From the image, we can see that the visual appearance of different food items differs markedly, while samples of the same food item look alike and are homogeneous. This suggests that simple statistical features, such as the mean and standard deviation over the visible light spectrum, are sufficient for food type detection, particularly with a small selection of food types.

Fig. 1. Left side: picture of our food samples. Right side: average spectrum for three kinds of food items under the same light condition and fat content.

Classification. For food item detection, we extract the mean, standard deviation, maximum, and minimum from each of the first three wavebands, for a total of 12 features. We then build a random forest classifier on these features to classify the three kinds of food items.
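A minimal sketch of the feature computation and classifier, assuming `patches` holds the seven-band patches from the previous step and `labels` their food types; the tree count and depth are taken from Sect. 4.1:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def patch_features(patch):
    """12 features: mean, std, max, min of the first three merged wavebands.

    `patch` has shape (7, 30, 30); only bands 0-2 are used.
    """
    bands = patch[:3].reshape(3, -1)
    return np.concatenate([bands.mean(1), bands.std(1),
                           bands.max(1), bands.min(1)])

X = np.stack([patch_features(p) for p in patches])
clf = RandomForestClassifier(n_estimators=10, max_depth=10, random_state=0)
clf.fit(X, labels)
```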

3.4 Feature Extraction for Calorie Content Detection

After classifying the food type, we need to classify the food's calorie content. The variation in calorie content is represented by three labels: low, medium, and high fat. As mentioned before, the images were captured under three different light conditions: high, medium, and low light. Images of stew and yams captured with different fat contents and different light conditions have very similar spectra, which makes distinguishing between them very challenging. Figure 2 shows the spectra of stew with medium and high fat in the three light conditions; as can be observed, they have very similar shapes and intensities. For fat content classification, we use nine images per food: three images under different light conditions for each fat level. Before classifying the calorie content, we apply the preprocessing and feature extraction described below.

Fig. 2. Images of stews with medium and high fat content captured under different light conditions. Even though the system was calibrated between each light condition, there are still differences in the spectra based on the intensity of light.

Preprocessing. First, we crop each image to a rectangle that contains the food. Then, we remove the first 30 noisy bands. To obtain a more meaningful data representation in a lower-dimensional space, we apply PCA and select the first k components that together account for 98% of the variance in the data. This yielded 16, 51, and 43 PCs for rice, yams, and stew, respectively.
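As a minimal sketch, assuming each cropped image is an (H, W, 210) NumPy array with one spectrum per pixel, scikit-learn's PCA can select k automatically when given the target variance fraction:

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_pca(cube, variance=0.98):
    """Project pixel spectra onto the PCs covering `variance` of the data.

    `cube` is (H, W, bands); returns the (H, W, k) score cube and k.
    """
    h, w, b = cube.shape
    pca = PCA(n_components=variance)  # float in (0, 1): keep 98% of variance
    scores = pca.fit_transform(cube.reshape(-1, b))
    return scores.reshape(h, w, -1), pca.n_components_
```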

Feature Extraction. After preprocessing the images, we divide the data into train and test sets. The images taken under two of the light conditions (e.g., low and high) are used as training images, and the images taken under the remaining light condition (e.g., medium) are used as test images. We then extract patches of size 30\(\,\times \,\)30\(\,\times \,\)k from each image and use the mean spectrum over the patch pixels as the new feature vector (of size k), as sketched below.
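A minimal sketch of this patch-averaging step, assuming the (H, W, k) PCA score cube from the previous step and non-overlapping tiling (the paper does not specify how patches are placed):

```python
import numpy as np

def mean_spectrum_features(score_cube, size=30):
    """Tile an (H, W, k) score cube into size x size patches and return
    the mean spectrum of each patch as a k-dimensional feature vector."""
    h, w, k = score_cube.shape
    feats = []
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            patch = score_cube[r:r + size, c:c + size]
            feats.append(patch.reshape(-1, k).mean(axis=0))
    return np.array(feats)
```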

4 Experiments and Results

In this section we apply classification methods to the food data sets to evaluate the selected features.

4.1 Food Item Classification

The random forest yielded the best performance with 10 trees and a maximum depth of 10. The confusion matrix for the three labels is shown in Table 2. The results show that the classifier achieves more than 98% accuracy.

Table 2. Confusion matrix and F-measure for food item classification.

4.2 Food Calorie Content Classification

For the fat content classification, we use a Radial Basis Function (RBF) kernel SVM, which is widely used for classification problems. The RBF kernel function is \(K(x_i, x_j) = \exp(-\gamma \| x_i - x_j \|_2^2)\). The optimal hyperparameter values of the RBF classifier are obtained using n-fold cross validation; the optimal \(\gamma\) values were \(1.1 \times 10^{-2}\), \(1.9 \times 10^{-2}\), and \(2.3 \times 10^{-2}\) for rice, yams, and stew, respectively. The classifiers are then trained with the optimal hyperparameters and tested on the test dataset.
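A minimal sketch of this step, assuming `X_train`/`y_train` are the mean-spectrum patch features and fat labels from Sect. 3.4; the \(\gamma\) grid and the five folds are assumptions, since the paper reports only the resulting optima:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# X_train, y_train, X_test: placeholders for the patch features and fat
# labels built in Sect. 3.4; probability=True exposes class probabilities
# for the AMR fusion used later in this section.
search = GridSearchCV(
    SVC(kernel="rbf", probability=True),
    param_grid={"gamma": np.logspace(-3, 0, 20)},
    cv=5,
)
search.fit(X_train, y_train)      # selects gamma by cross validation
y_pred = search.predict(X_test)   # evaluate on the held-out light condition
```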

Table 3. Results for food calorie content identification using RGB and hyper-spectral images.

We also apply the same procedure to true-color RGB images to compare against the hyperspectral images: we extract 30\(\,\times \,\)30\(\,\times \,\)3 patches from the images, calculate the mean vector over the patch pixels, and train RBF kernel SVMs. Table 3 shows the confusion matrices for each food type using RGB images and hyperspectral images. As can be observed from these tables, hyperspectral images provide more discriminative information for classifying foods than RGB images. However, the yams' confusion matrices suggest that RGB-based features yield higher recall (but lower precision) on the low-fat class, while hyperspectral features work better for the other two classes. To exploit the outputs of both feature sets, we used the simple Arithmetic Mean Rule (AMR) [11] to combine the probability outputs of the two classifiers, which increases the accuracy from \(81.66\%\) to \(85.0\%\). The confusion matrix for this approach is given in Table 4; with the combined features, more samples of low fat and medium fat are classified correctly (yielding higher recall and precision).
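As a minimal sketch of AMR fusion, assuming `svm_hsi` and `svm_rgb` are the two trained probability-enabled SVMs and `X_hsi`/`X_rgb` their respective test features:

```python
import numpy as np

def arithmetic_mean_rule(proba_a, proba_b):
    """Fuse two classifiers by averaging their (n_samples, n_classes)
    probability outputs and taking the argmax per sample."""
    return ((proba_a + proba_b) / 2.0).argmax(axis=1)

# svm_hsi, svm_rgb, X_hsi, X_rgb: placeholders for the trained SVMs and
# their test features on the hyperspectral and RGB representations.
y_fused = arithmetic_mean_rule(svm_hsi.predict_proba(X_hsi),
                               svm_rgb.predict_proba(X_rgb))
```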

Table 4. Results for food calorie content identification in yams using RGB and hyper-spectral images.

5 Conclusion and Discussion

In this paper we demonstrated that hyperspectral imaging can aid in distinguishing between food calorie contents with varying percentages of fat. For food type detection, we generated features from the visible spectrum only and achieved a 97% F-measure. When distinguishing calorie content, our method using PCA and an SVM with an RBF kernel on hyperspectral images improved our ability to distinguish calorie content compared to using only true-color RGB images. We further showed the potential to improve classification by combining information from both hyperspectral and RGB images. These findings suggest that expanding existing databases of food images with hyperspectral images may further advance automated image-based calorie detection. However, some limitations remain, which we aim to address in future work. First, we used a limited number of food types and varied only the fat content. We also analyzed hyperspectral data only at wavelengths ranging from 393 to 892 nm. Moreover, while our food was cooked in a home environment, our images were acquired in a laboratory environment. Our results motivate further research into the use of hyperspectral imaging to advance automated estimation of calorie content.