International Journal of Applied Earth Observation and Geoinformation
A machine learning approach to detect crude oil contamination in a real scenario using hyperspectral remote sensing
Introduction
Crude oil pollution is a global anthropogenic problem. It mainly occurs through oil spills, leaks, and accidents caused by system failures, human error, or negligence. The presence of crude oil in the environment is hazardous or even deadly to wildlife and humans, and has negative effects on the natural environment (Aguilera et al., 2010; Chima and Vure, 2014; Kruse et al., 1993; Ramirez et al., 2017). In the case of an oil spill event, the location and spatial extent of the pollution must be precisely mapped. This information can then be transferred to response personnel so that they can implement countermeasures and focus cleaning, rehabilitation, and long-term monitoring efforts in order to minimize the environmental damage.
Hyperspectral remote sensing (HRS) technology, also known as imaging spectroscopy, records the radiation reflected or emitted from the surface in tens or hundreds of continuously spaced narrow spectral bands in each pixel. The spectral response in each pixel can then be related to chemical and/or physical properties of the surface or material (Ben-Dor et al., 2009; Ben-Dor et al., 2008; Kruse, 1988). Hence, HRS can be utilized to detect individual absorption features (i.e., the fraction of incident radiation absorbed by the material over a range of wavelengths) due to specific chemical bonds in a solid, liquid, or gas. However, the actual detection of materials using HRS depends on the spectral coverage, spectral resolution, and signal-to-noise ratio (SNR) of the sensor, as well as the abundance of the material and the strength of the absorption features (Andreoli et al., 2007).
Crude oil, which is transported by ships and pipelines from production sites to distillation facilities, contains petroleum hydrocarbons (PHC). These have distinctive spectral absorptions at ∼1200, ∼1700, and ∼2300 nm (see Fig. 3) due to the vibrational stretching and bending of the C–H bond in alkanes (Cloutis, 1989; Schwartz et al., 2009; Winkelmann, 2005). Therefore, as opposed to methods such as drilling and geochemical analysis, HRS can be used to detect PHC on the surface (of bare soils) from a distance (Asadzadeh and de Souza Filho, 2017; Ellis et al., 2001; Hörig et al., 2001; Kühn et al., 2004; Pabón et al., 2019; Scafutto et al., 2017; Van Der Meer et al., 2002), thereby offering a time-saving, cost-effective, and non-destructive method of characterizing oil spills and their impact on the environment.
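As an illustration of how such an absorption feature can be quantified, the following minimal sketch computes a continuum-removed band depth around the ∼1700 nm feature. It is not the procedure used in this study: the function name `band_depth`, the shoulder and centre wavelengths (1650, 1725, and 1800 nm), and the reflectance values are all hypothetical and chosen only for demonstration.

```python
def band_depth(wavelengths, reflectance, left, center, right):
    """Continuum-removed depth of an absorption feature.

    A straight-line continuum is drawn between the two shoulder wavelengths,
    and the depth is 1 - R(center) / continuum(center).
    """
    def nearest(target):
        # Index of the band whose wavelength is closest to the target.
        return min(range(len(wavelengths)),
                   key=lambda i: abs(wavelengths[i] - target))

    i_l, i_c, i_r = nearest(left), nearest(center), nearest(right)
    # Linear interpolation of the continuum at the feature centre.
    t = (wavelengths[i_c] - wavelengths[i_l]) / (wavelengths[i_r] - wavelengths[i_l])
    continuum = reflectance[i_l] + t * (reflectance[i_r] - reflectance[i_l])
    return 1.0 - reflectance[i_c] / continuum


# Hypothetical reflectance values (not measured data) around ~1700 nm.
wavelengths = [1650, 1675, 1700, 1725, 1750, 1775, 1800]
oily_soil = [0.40, 0.39, 0.36, 0.34, 0.37, 0.40, 0.41]
clean_soil = [0.40, 0.4025, 0.405, 0.4075, 0.41, 0.4125, 0.415]

depth_oily = band_depth(wavelengths, oily_soil, 1650, 1725, 1800)
depth_clean = band_depth(wavelengths, clean_soil, 1650, 1725, 1800)
print(f"oily: {depth_oily:.3f}, clean: {depth_clean:.3f}")
```

In this toy case the featureless spectrum yields a depth close to zero, whereas the spectrum with a dip near 1725 nm yields a clearly positive depth. A weathered spill may produce depths too small to separate from sensor noise.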
To date, most studies conducted on the ability to detect PHC using HRS in terrestrial areas have been based on an indirect approach, which mainly identifies anomalies in vegetation or soil caused by the presence of PHC (e.g., Khanna et al., 2013; Kokaly et al., 2013; Li et al., 2005; Noomen et al., 2012; Sanches et al., 2013; Van Der Werff et al., 2008; Yang et al., 2000). Fewer studies have focused on the direct approach, that is, directly sensing the spectral signature of PHC on the surface (bare soil) that results from natural seepage or an oil spill. More specifically, as this study deals with the latter situation, only a handful of published studies (Hörig et al., 2001; Kühn et al., 2004; Lenz et al., 2015; Lever et al., 2015; Pabón et al., 2019; Scafutto et al., 2017) used in-situ airborne HRS for that purpose, and only one of them (Pabón et al., 2019) was performed in an actual oil spill situation (oil refinery). The different approaches and methods explored in these studies reported good results in demonstrating the potential of HRS as an operational tool for detecting and monitoring PHC. All of them showed that the spectral features of PHC were clearly seen in the spectra collected from the airborne hyperspectral sensor, and they based their analyses on these features.
Nevertheless, all of them except Pabón et al. (2019) operated under optimal and controlled conditions, such as: (i) use of artificial, man-made samples which, among other things, ensure at least one pure pixel of the samples to facilitate identification of PHC and end-member selection; (ii) HRS image acquisition performed shortly after the artificial samples had been contaminated; (iii) use of reference spectra from the laboratory or in-situ; and, most importantly, (iv) PHC spectral absorptions that were clearly visible in the HRS spectra. None of these studies attempted to evaluate direct detection of PHC by HRS in a real disaster zone, where more limitations need to be considered than under controlled conditions. For example, when pollution has been in the soil for some time, the PHC spectral signal that reaches the airborne sensor from the surface is attenuated. The study conducted by Pabón et al. (2019) was executed in an oil refinery in Brazil (area of ∼0.7 km²). They showed the spectral features of the PHC in the HRS image (1 m ground sampling distance, GSD) and reported high classification accuracy. However, there was no information about how long the spill had been on the surface, or whether it was an ongoing leakage that might continuously replenish the oil absorbed in the soil and thereby keep the spectral signal of the oil evident in the spectrum of the airborne image.
In this study, an airborne HRS image was acquired over an arid area in southern Israel two and a half years after a major crude oil spill. The detection and mapping of PHC in a real disaster zone poses new challenges that might not exist in controlled environments. From the moment the oil spill occurred until the image was taken (two and a half years later), the spilled crude oil was exposed to atmospheric conditions as well as to biological, chemical, and physical processes. In addition, the area itself was subjected to remediation and rehabilitation efforts shortly after the spill, including the removal of soil together with oil. Moreover, the illumination conditions at the time of the flight were suboptimal due to sparse clouds, resulting in varying illumination intensities across the scene. These challenges resulted in a lack of clear PHC spectral absorption features in the spectra (further discussed in section 2.7.2).
Therefore, this study, in contrast to others, could not be based on the premise that PHC spectral features will appear in the spectrum of contaminated pixels. In other words, relying on the PHC absorption features, on absorption minima or maxima, or even on laboratory spectra would have resulted in poor classification accuracy, and thus a different approach had to be taken.
Hence, a supervised machine learning workflow was developed: a classifier (a machine learning model) was trained on a set of contaminated and uncontaminated pixels from the scene and then used to predict, per pixel, whether it is clean or contaminated. Machine learning models can detect subtle variations between classes in the data that are extremely difficult or even impossible for humans to notice. However, training the model blindly on the entire spectrum of the selected pixels might give poor results, because not all spectral bands contain relevant information. Instead, this study used standardization techniques combined with vicarious band selection and dimension reduction to create a superior training dataset upon which the model could be built.
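The general idea of standardizing spectra before training a per-pixel classifier can be sketched as follows. This is only an illustration under stated assumptions: the standard normal variate (SNV) transform stands in for the standardization step, a nearest-centroid rule is swapped in for the classifier (the model actually used is not described in this section), and the five-band pixel values and function names are hypothetical. Band selection and dimension reduction are omitted for brevity.

```python
from statistics import mean, pstdev


def snv(spectrum):
    """Standard normal variate: centre and scale a spectrum by its own mean and std."""
    mu, sigma = mean(spectrum), pstdev(spectrum)
    # Note: a perfectly flat spectrum (sigma == 0) is not handled in this sketch.
    return [(v - mu) / sigma for v in spectrum]


def centroid(spectra):
    """Band-wise mean of a list of equally long spectra."""
    return [mean(band) for band in zip(*spectra)]


def sq_dist(a, b):
    """Squared Euclidean distance between two spectra."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(clean_pixels, oil_pixels):
    """Return class centroids computed on SNV-standardized training pixels."""
    return {
        "clean": centroid([snv(p) for p in clean_pixels]),
        "oil": centroid([snv(p) for p in oil_pixels]),
    }


def predict(model, pixel):
    """Label a pixel with the class whose centroid is closest after SNV."""
    z = snv(pixel)
    return min(model, key=lambda label: sq_dist(z, model[label]))


# Hypothetical 5-band training pixels (illustrative numbers only).
clean_train = [[0.30, 0.32, 0.34, 0.36, 0.38], [0.45, 0.48, 0.51, 0.54, 0.57]]
oil_train = [[0.30, 0.32, 0.31, 0.36, 0.38], [0.45, 0.48, 0.465, 0.54, 0.57]]

model = train(clean_train, oil_train)
# Same spectral shapes as the training classes, but brighter overall.
label_a = predict(model, [0.60, 0.64, 0.68, 0.72, 0.76])
label_b = predict(model, [0.60, 0.64, 0.62, 0.72, 0.76])
print(label_a, label_b)
```

Because SNV removes per-pixel offset and scale, pixels that differ only in overall brightness map to the same standardized spectrum, which is one reason such a transform may be attractive when illumination varies across a scene.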
The purpose of this study was to develop an approach to detect and map contaminated pixels, despite the mentioned obstacles, long after the oil spill had occurred.
Study area
The Evrona Nature Reserve (Fig. 1) in Israel is located ∼20 km north of Eilat in the Arava Valley, which is part of the Dead Sea Rift extending between the Dead Sea to the north and the Gulf of Eilat to the south. It is crossed by the Evrona drainage system going southward, leaving topographically elevated playa deposits high above the channels. The climate and environmental conditions of Evrona make it one of the most interesting acidic habitats in the world. It contains a variety of unique
Vicarious band selection
When analyzing a spectrum (in the VNIR-SWIR) to detect PHC in a soil substrate, the analysis is guided by the diagnostic spectral ranges of ∼1700 and ∼2300 nm. However, in our case, a statistical test (ANOVA) was used on each spectral band to identify important wavelength ranges for classification. Fig. 9A shows the results of the ANOVA test alongside lab spectra for reference. The higher the F statistic, the better the spectral band is in separating the two classes (oil and clean). An F
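A per-band ANOVA of the kind described above can be sketched in a few lines. The following is a minimal illustration rather than the study's implementation: it computes the one-way ANOVA F statistic for each band across two classes and ranks the bands by it. The three-band pixel values and the function names `f_statistic` and `rank_bands` are hypothetical.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F statistic for one band; groups is a list of value lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(v for g in groups for v in g)
    # Variation of the class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def rank_bands(oil_pixels, clean_pixels):
    """Return (band_index, F) pairs sorted from most to least discriminative."""
    scores = []
    per_band = zip(zip(*oil_pixels), zip(*clean_pixels))
    for b, (oil_vals, clean_vals) in enumerate(per_band):
        scores.append((b, f_statistic([list(oil_vals), list(clean_vals)])))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical 3-band pixels: only band 1 separates the two classes.
oil = [[0.30, 0.20, 0.50], [0.32, 0.22, 0.52], [0.31, 0.21, 0.49]]
clean = [[0.31, 0.30, 0.51], [0.30, 0.32, 0.50], [0.32, 0.31, 0.50]]

ranking = rank_bands(oil, clean)
for band, f_value in ranking:
    print(band, round(f_value, 3))
```

Bands with a high F value have class means that differ strongly relative to the within-class scatter, so they are natural candidates to keep for classification. Any threshold on F would be a separate choice that this sketch does not make.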
Discussion
The predominant spectral features of PHC in the VNIR-SWIR, which are located at ∼1700 and ∼2300 nm, can be captured with hyperspectral sensors. This has been previously demonstrated, often in heavily oiled settings, by different researchers, for example Hörig et al. (2001) (HyMap sensor, 128 bands, GSD 2–4 m), Ellis et al. (2001) (PROBE-1 sensor, 128 bands, GSD 5 m), Kokaly et al. (2013) (AVIRIS sensor, 224 bands, GSD 3.5 m), Asadzadeh and de Souza Filho (2017) (AVIRIS sensor, 224 bands, GSD
Conclusion
This study demonstrated the power of utilizing a machine learning approach on airborne hyperspectral data for rapid detection of PHC in a terrestrial spill area long after the event occurred. The study was quite distinctive because it was not carried out in laboratory or controlled conditions but was rather situated in a real oil spill zone—where the oil was exposed on the surface for two and a half years prior to the acquisition of the image. Instead of relying on the spectral absorption
Acknowledgment
The authors would like to thank the members of the remote-sensing laboratory at Tel Aviv University for assisting in executing the flight campaign and for their constructive comments. We greatly appreciate the crude oil provided by Arnon Karnieli from Ben-Gurion University, Ittai Herrmann from the Hebrew University, Image Sat International – ISI for providing the EROS subset image of the study area, and the Israeli Nature and Parks Authority for providing the GIS layers of the leakage. We also
References (37)
- et al. Spectral remote sensing for onshore seepage characterization: a critical overview. Earth-Sci. Rev. (2017)
- et al. Using Imaging Spectroscopy to study soil properties. Remote Sens. Environ., Imaging Spectroscopy Special Issue (2009)
- et al. A simple apparatus to measure soil spectral information in the field under stable conditions. Geoderma (2017)
- et al. Spectroscopic remote sensing of the distribution and persistence of oil from the Deepwater Horizon spill in Barataria Bay marshes. Remote Sens. Environ. (2013)
- Imaging Spectrometry. Use of airborne imaging spectrometer data to map minerals associated with hydrothermally altered rocks in the northern Grapevine Mountains, Nevada, and California. Remote Sens. Environ. (1988)
- et al. The spectral image processing system (SIPS): interactive visualization and analysis of imaging spectrometer data. Remote Sens. Environ., Airborne Imaging Spectrometry (1993)
- et al. Application of AVIRIS data in detection of oil-induced vegetation stress and cover change at Jornada, New Mexico. Remote Sens. Environ. (2005)
- et al. Spectral and spatial indicators of botanical changes caused by long-term hydrocarbon seepage. Ecol. Inform. (2012)
- et al. Contamination by oil crude extraction – refinement and their effects on human health. Environ. Pollut. (2017)
- Assessing the impact of hydrocarbon leakages on vegetation using reflectance spectroscopy. ISPRS J. Photogramm. Remote Sens.
- Hyperspectral remote sensing detection of petroleum hydrocarbons in mixtures with mineral substrates: implications for onshore exploration and monitoring. ISPRS J. Photogramm. Remote Sens.
- PLS-regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst., PLS Methods
- Review on the effects of exposure to spilled oils on human health. J. Appl. Toxicol. JAT
- Hyperspectral Analysis of Oil and Oil-Impacted Soils for Remote Sensing Purposes
- Part 1: simple definition and calculation of accuracy, sensitivity and specificity. Emergency
- Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra. Appl. Spectrosc.
- Supervised vicarious calibration (SVC) of multi-source hyperspectral remote-sensing data. Remote Sens.