Keywords

1 Introduction

In order to assist the search of clandestine graves several methods have been developed, of which non-intrusive methods are preferred because they reduce terrain disturbance and search times. One of the most used methods is the ground-penetration radar, which is a near-surface geophysical technique that identifies underground anomalies through variations of dielectric permittivity using the backscatter of a microwave beam [13]. Other studies have taken advantage of the chemical properties of the soil in the presence of human (or animal) remains, which shows significant differences in the concentrations of several elements such as N and M, among others, besides the presence of several volatile organic compounds (toluene, benzene and hexane) that reduce or modify the growth pattern of vegetation, which has also been exploited to identify massive graves [2, 12, 13]. Although all these approaches can be very effective in detecting common graves, they are often impractical, costly and risky, since they need to be implemented in situ, which seriously limits the area of exploration and exposes the search team to a high risk.

Remote sensing and pattern recognition methods offer a low risk alternative that allows increasing the area of exploration and reduce the associated costs and search times. In this way, vegetation patterns can be identified, soil mineral concentration measured, or some of the physical features mentioned above detected [6, 9]. For instance, aerial photographs have been used to detect abnormal changes of graves vegetation, as well as abnormal landmarks on the graves [4]. Ultraviolet photographs have also been used to map the maturity of vegetation, since graves present younger vegetation than other areas due to a change in soil pH [13]. Obviously the detection of clandestine graves through aerial photographs presents limitations in terms of spectral sensitivity since they detect the reflected light signal in three spectral bands, generally of the visible spectrum. More recent investigations have evaluated the effectiveness of multispectral and hyper-spectral images in the detection of human remains [8,9,10]. For instance, [9] used in situ data acquired with the field spectrometer ASD Fieldspec FR (in the spectral range of 350–2500 nm) and the HyMap II sensor (125 spectral bands in the range of 450–2500 nm at 4.7–5.2 m spatial resolution) for detecting simulated animal mass graves. They found that, on both scales (in situ and airborne), the graves with bodies did have a signature that distinguishes it from graves that did not contain bodies. They also observed that regeneration of vegetation was strongly inhibited by the presence of residues. A more recent study [10] had demonstrated the feasibility of hyperspectral sensors (CASI and SASI covering the spectral ranges of 408–905 nm and 883–2524 nm respectively) for the detection of single graves, but also acknowledged that additional research was needed to consider other environments and areas with different types of vegetation [10].

Although it is recognized that hyperspectral images constitute a rich source of radiometric information, it is also recognized that in most cases such information can be redundant or useless so that one must question the actual utility of the data for each type of problem. For the case of the detection of clandestine graves, not only we what to know What is the true potential of the hyperspectral images to detect buried remains? But also, What is the useful information in the hyperspectral images? And What is the time window for a better detection? These questions were addressed through simulated graves with domestic pig carcasses, for which the evolution of soil reflectance over a period of six months was measured and spectral signatures were analyzed using standard pattern recognition techniques. In the rest of the paper we describe the experimental setting used (Sect. 2), the methods used for grave detection (Sect. 3), and the results obtained (Sect. 4).

2 Experimental Setting

2.1 Graves Simulation

In February of 2016, we simulated seven graves near Yautepec, Morelos. The location of graves fulfils a number of required characteristics such as accessibility, security, representativeness and climate. Temperature and humidity regimes in this site ensures a relatively fast decomposition of buried remains, so that its effect on the upper surface, if any, could be observed within a monitoring period of around six months. The land was bounded by a solid wall which prevented removal of remains by scavengers and/or human intrusions. Pits of 2 by 2 m wide by 1.2–1.5 m depth were created with a backhoe in an area of 400 m\(^2\) (Fig. 1). Then, ten carcasses of domestic pigs (pietrain breed) were distributed into four of the seven pits, as indicated Fig. 1, and Table 1 and pits were covered back with the same soil. The other three empty pits were also covered, and where meant to simulate soil disturbances caused by other non-burial process such as plow or constructions. The graves were marked to avoid walking over them during the whole monitoring period and other areas apart from graves were also designated to kept undisturbed.

Fig. 1.
figure 1

Location and distribution of simulated graves. Graves are labeled as F1 through F7 and their characteristics are summarized in Table 1. Main entrance (South) and interior walls (North) are also indicated.

Table 1. Characteristics of simulated graves.

2.2 Hyperspectral Measurements

After burial, the evolution of soil reflectance was measured every other week using a full-range field spectroradiometer (Field Spec 4 Std Res by ASD Inc. @ 350–2500 nm). In order to ensure that the same locations were measured every time, a squared threads mesh was fixed at 1.5 m above aground, and with the threads separated by one meter in each direction. Note that the graves were distributed systematically so that 2-by-2 vertices of the mesh fell in the center of each grave, thus ensuring the maximum number of samples per grave. The reflectance measurements were made on each vertex of the mesh by pointing the fiber optic vertically towards the soil surface and using a standard calibration procedure based on a labertian surface of known reflectance (Spectralon). Since measurements were taken using the natural illumination, the time of data acquisition was approximately within 11:00 am–13:00 pm, and whenever the sky conditions was clear or mostly clear, the winds blew slowly (0–5 kmh) and there was no rainfall.

A total of twelve hyperspectral images of 19-by-19 pixels and 2151 spectral bands were formed from the measurements of every sampling date, from February 12 to July 29 (Fig. 2). Although the ground sampling distance was 1 m, the instantaneous field of view of sensor was \(25^{\circ }\) giving an effective image resolution of 0.67 m. By July, the vegetation had already over passed the mesh height so that some measurements were carried out under the canopy of vegetation. This was actually the main reason to stop monitoring as measurements under the canopy are not compatible with airborne or satellite measurements. Images were not georeferenced as this was not necessary to the project purpose, yet they can be easily aligned to North with a counter-clockwise rotation by an angle of \(19^{\circ }\) (See Fig. 1).

2.3 Data Preparation

A number of correction and calibration procedures were applied to the images in order to perform a consistent analysis. We first corrected the spectral misalignments between individual sensors of the instrument (VNIR: 350–1000 nm, SWIR1:1001–1800 nm and SWIR2:1800–2500 nm). This was observed only in a couple of images and was caused by an insufficient warm-up of the instrument. The problem was corrected by applying an offset to the reflectances captured by the SWIR1 and SWIR2 sensors. We also eliminated spectral bands of low atmospheric transmissivity in which reflectance values fluctuated erratically (1350–1480 nm, 1780–2032 nm and 2450–2500 nm). Some images also presented the so-called banding artifacts in which the pixels of an entire line appear lighter or darker than those of neighbour lines. This was caused by a poor calibration of sensor and was corrected by multiplying the target line by a coefficient that was estimated using neighbour lines. Finally, all the measurements were normalized as if they were taken under same illumination conditions. The radiometric variability was caused, among other things, by the seasonal variation of the incidence angle of direct sun rays. Fortunately, one of the points of the mesh fell on a piece of concrete which was regarded as spectrally invariant in the spectral range 400–500 nm (blue band). Using as a reference the first acquired image, the other images were re-calibrated through the empirical line method, so that following calibration all images presented similar reflectances in the blue band for the spectrally-invariant point.

Fig. 2.
figure 2

Color infrared composite of corrected and normalized images for each acquisition date. The number of days from burial is shown on top of each image and the burial date was February 5, 2016. (Color figure online)

Fig. 3.
figure 3

Average spectral reflectance for classes of interest including undisturbed ground, disturbed ground and graves with n carcasses (n-grave), for \(n=1,2,3\), and 4. The number of days from burial is indicated on top of each plot.

The color-infrared composite of corrected and normalized images are shown in Fig. 2, where black pixels in the upper-left corner correspond to missing data that were not measured, whereas those of the vertical line in the 32-days image (March 7) were eliminated due to measuring errors. Figure 3 shows plots of average spectral reflectance for classes of interests including disturbed and undisturbed ground and graves with n carcasses denoted as n-grave, for \(n=1,2,\) and 4.

3 Methods for Detecting Buried Remains

The detection of buried remains with hyperspectral measurements was formulated in terms of an estimate for number of buried carcasses, while the estimation was based on the so-called Partial Least Square method, where the independent variables corresponded to reflectance at each wavelength. This method was chosen for its relative simplicity and because it is appropriate for the case when the number of variables is larger than the number of samples, as in our case.

3.1 Partial Least Squares

The Partial Least Square (PLS) method is a statistical technique that consists in estimating latent variables that serve as components (scores) to which both the independent and dependent variables are projected [7, 14]. Let X be an \(n\times m\) matrix containing m samples of n independent variables and let Y be an \(n\times p\) matrix containing the corresponding n samples of p dependent variables (in our case \(p=1\)). The PLS model relates X and Y through the system of equations:

$$\begin{aligned} X = TP^T+E \end{aligned}$$
(1)
$$\begin{aligned} Y = TQ^T+F \end{aligned}$$
(2)

where T is an \(n\times l\) matrix of l scores, P and Q are orthogonal loading matrices E and F are independent and identically distributed random normal variables. There are several algorithms to estimate the scores and loading matrices in the PLS model above, which also provide estimates of the dependent variable in the form of linear regression \(Y=XB+B_0\), where B and \(B_0\) are regression coefficients.

In this study we used the PLSREGRRESS function implemented in MATLAB software (The Math Works Inc. 2010), which is based on mean-subtracted variables \(X_0\) and \(Y_0\), so that \(P=X_0^TT, Q=Y_0^TT\) and \(T=X_0W\) for a weighting matrix W of order \(m\times l\) so that T is orthonormal. This implementation also allows to determine the optimal number of components l through a k-fold cross-validation approach. With this method, the dataset is divided into k subsets, and the holdout method is repeated k times. Each time, one of the k subsets is used as the test set and the other \(k-1\) subsets are put together to form a training set. Then the average error across all k trials is computed. The process is repeated for \(1,2,\ldots , l_{max}\) components, and the number of optimal components is selected as the one with lowest prediction error.

3.2 Variable Importance of Projection

One advantage of using PLS method is that there are many ways to reduce the number of variables in the final model [11]. One of such method is based on the so-called variable importance of projection (VIP), which measures the contribution of each variable according to its relative importance given by its loading and the relative variance explained by each PLS component. More specifically, the VIP of the jth variable is computed as:

$$\begin{aligned} v_{j} = \sqrt{\frac{{m\sum \nolimits _{{i = 1}}^{l} {q_{i}^{2} t_{i}^{T} t_{i} (w_{{ij}} /\left\| {w_{j}} \right\| )^{2} } }}{{\sum \nolimits _{{i = 1}}^{l} {q_{i}^{2} t_{i}^{T} t_{i} } }}} \end{aligned}$$
(3)

where \(w_{i,j}\) is the loading weight of the j variable in the i component, and \(t_i, q_i\) and \(w_i\) are the ith column vectors of matrices TQ and W, respectively. In our case, since T is orthonormal, then \(t_i^Tt_i=1\) for all i.

Theoretically, variables with VIP values greater or equal than 1 must be included in the final model. In practice, however, this variable selection strategy may leave out some important variables or include unimportant variables by chance because the computation of the VIP is based on a relatively small sample. A more robust alternative named the Bootstrap-VIP technique [5] was used here. This method is based on repeated computations of the VIP for subsamples of around 40% from the entire sample. Then, variables are selected if their average VIP plus one standard deviation exceed the unit.

4 Results

4.1 Temporal Window of Optimal Detection

We computed the average spectral separability of graves and disturbed ground with respect to undisturbed ground pixels and of graves with respect to disturbed ground. The spectral separability was computed as 1 minus the cosine of the spectral angle between a pair of reflectance signatures. This measure becomes zero when the reflectance curves have similar shapes and differ only by a constant factor. On the other hand, it tends to unit for two reflectance curves with totally distinct shapes (orthogonal). We hypothesize that the spectral signatures of graves with remains have shapes that are distinct from those of other sites, so that the higher the separability the more likely the detection of graves will be.

Fig. 4.
figure 4

Average separability of grave pixels with respect to undisturbed pixels (top) and average separability of grave with remains with respect to disturbed ground.

An increase in spectral separability with respect to undisturbed pixels was observed for both graves (Fig. 4-left) and disturbed ground (center) within the first three months. This increase in spectral separability seems to be caused by a faster vegetation growth on grave and disturbed ground than on undisturbed pixels. Nevertheless, the grave pixels exhibited also increasing separability with respect to disturbed ground (right). The latter shows the greatest separability within the period between 90 to 120 days after the burial. This seemed to be caused also by a relative greater growth rate of vegetation for grave pixels. The second mode, which occurred towards the end of the monitoring period (174 days), was due to shadows cast by vegetation that had already overpassed the measuring height. A third and lowest mode occurred around the 7 weeks from burial, which coincides with the period of advanced decay of pig carcasses [1], but also with the early appearance of vegetation over the graves with the remains.

4.2 Useful Spectral Intervals

In order to determine the most useful wavelengths for detecting buried remains, we applied the Bootstrap-VIP method as follows. A training sample was first selected consisting of around 50% of image pixels that were randomly selected from three images acquired within the temporal windows of higher spectral separability (90–120 days). Due to the lower representativity of graves, all of available pixels on graves were added to the sample. This dataset was used with a 30-fold cross-validation strategy, which yielded an optimal number of PLS components of 21. With that number of components, the Bootstrap-VIP method was then applied and the most important variables were selected. Finally, the PLS regression coefficients were determined for selected variables.

Fig. 5.
figure 5

The top plot shows the mean VIP (black line) and its uncertainty (shaded area) estimated with Bootstrap-VIP method. The red line indicates the variable selection threshold. The bottom plot shows the regression coefficients B for selected variables (reflectance). Typical spectral bands of space-borne remote sensing are highlighted in colors. From left to right they are: Coastal, Blue, Green, Yellow, Red, Red Edge, NIR-1, NIR-2, SWIR-1, SWIR-2 (Color figure online)

Figure 5-top shows results from the Bootstrap-VIP method, where the upper envelop of the gray area indicates the mean VIP plus one standard deviation, which was used to select the model variables (values above the red line). Figure 5-bottom shows a plot of the regression coefficients B for selected variables. As a reference, the typical spectral bands used in remote sensing are highlighted. It should be noted that selected wavelength intervals are distributed across the full spectral range, with the major contributions within the NIR (700–1000 nm) and the SWIR1 (1001–1800-nm) spectral regions. The blue, yellow and red bands had little or no use in this model. Some spectral regions that are not measured from space at all turned out to be important, whereas some intervals are too narrow (\(\ll \)100 nm) for any multi-spectral sensor, thus stressing the need for hyperspectral sensing.

4.3 Detection Accuracy

Constructed model was applied to all images and estimates were compared with actual number of buried carcasses. Figure 6 shows the scatter plots between estimates and actual values. In most cases, the model underestimated the number of bodies and did not present a clear positive trend, except for one date (118 days). This suggests that the model presents a very narrow time window for accurately estimating the number of buried carcasses (black line). Fortunately, the correct detection of graves requires estimating at least one carcase (red line), for which the temporal window can be wider. This can be shown best by computing some measures of the estimation errors and of the detection accuracy. In order to compensate the unbalanced proportion of pixels from graves with respect to those from other sites, we computed the average per-class RMSE (Classwise RMSE) and the kappa coefficient of agreement [3]. In addition, and just for comparison, we computed the root mean squared error over all pixels (Overall RMSE) and the percent of correctly detected pixels (Overall Accuracy).

Fig. 6.
figure 6

Scatterplots between actual and estimated number of carcasses. The black line indicates the ideal case of perfect estimation and the red line indicates the detection threshold applied. The number of days from burial is indicated on top of each plot. (Color figure online)

If one only considers the Overall RMSE, the estimations have virtually the same low error throughout the monitoring period except for the last date (Fig. 7, left, blue bars). This is because the vast majority of pixels are associated with undisturbed ground. However when considering the Classwise RMSE, the pattern reflect more clearly what is observed in the scatterplots of Fig. 6. The Classwise RMSE demonstrates that only in two dates the estimation error was below one. Likewise, when using the overall accuracy one observes that more than 90% of the pixels are classified correctly, in all but the last sampling date (Fig. 7, right, blue bars). This apparent high detection accuracy is due the much higher number of pixels for one class for which one should expect a very high accuracy by chance. The kappa coefficient compensates the amount of accuracy by chance and thus it is a more reliable measure. The kappa value remains near zero for the first two and a half months afterwards it starts to increase steadily until day 118, and then it decreases again but in a slower and irregular pattern. This irregularity could be associated to the fact that on these dates some measures were already made below the vegetation canopy, where shadows produced a greater variation in reflectance.

Fig. 7.
figure 7

Root mean square error of estimated carcasses with PLS (left) and detection accuracy after thresholding estimations. (Color figure online)

The spatial distribution of the errors can be seen in Fig. 8 for each sampling date. The correct detection of the graves (light gray tone) was made possible only on images acquired after 100 days from burial. It is also interesting to note that the date of optimum detection (118 days) did not correspond to the date of greatest spectral separability (Fig. 4), and that the grave F6 was not detected on this date, the grave F4 was partially detected (3 out of 4 pixels), while the graves F5 and F7 were fully detected, thus suggesting that the amount of buried mass was a factor for successful detection on that date. Yet, that the four graves were fully detected only on the last date (072916) suggested that the detection key on that date was the shadowing of taller vegetation that developed on those sites, and which also influenced neighbouring pixels causing false detections.

Fig. 8.
figure 8

Distribution of errors/successes detection for each sampling date, where date is specified in terms of the number of days from burial and the colors represent: incorrect true (red), incorrect false (blue), correct true (light gray) and correct false (dark gray). (Color figure online)

5 Conclusions

We have presented preliminary results from a study aiming at assessing the potential of hyperspectral imagery for detecting clandestine graves in Mexico. Through a controlled experiment, we demonstrated that the timing for image acquisition is very important as the effect of decay of buried remains can become apparent only after some time period of around three months. Through a least square analysis of monitored surface reflectance, we demonstrated that the NIR and SWIR1 spectral regions are critical for estimating the concentration of buried remains. Although some spectral intervals that were relevant to the detection are wide enough to be measured by multispectral sensors, there were several narrow intervals that stressed the need for hyperspectral sensing.

The results presented in this paper may guide the construction of spectral indices that can be used for the delimitation of search areas by forensic anthropologists. Given the costs involved in hyperspectral image acquisition, a further examination of simulated multispectral data needs to be conducted in order to determine to what extend the multispectral images, which are much cheaper, can be used for detecting graves. Also, further research should explore non-linear models, since the PLS model may have limited power for representing the complexity of the data used in this study. One of such alternatives is the support vector machine with nonlinear kernel, which is being investigated in this project.