International Journal of Applied Earth Observation and Geoinformation
Modelling LiDAR derived tree canopy height from Landsat TM, ETM+ and OLI satellite imagery—A machine learning approach
Introduction
The value of remote sensing in ecological studies has been well recognised (Roughgarden et al., 1991, Wang et al., 2010, Pettorelli et al., 2014). Landsat satellites have been capturing multispectral imagery of the Earth's surface since 1972, representing the longest temporal record of space-borne land observations (Roy et al., 2010). Landsat data has been used for a variety of applications, such as natural hazard assessment (Barlow et al., 2003, Joyce et al., 2009), fire scar mapping (Gill et al., 2000, Goodwin and Collett, 2014), coral reef mapping (Joyce et al., 2004), rangeland monitoring (Wallace et al., 2004, Scarth et al., 2010), temperate and tropical forest mapping (Brown et al., 2000, Renó et al., 2011), and many others. The characteristics of the Landsat sensors have been identified as valuable for regional monitoring applications (Cohen and Goward, 2004). The spectral and spatial resolution of Landsat imagery, combined with its temporal record, make it valuable for monitoring woody cover change across large regions (Woodcock et al., 2001, Danaher et al., 2004, Staben et al., 2016, Gill et al., 2017). Amongst its many applications, Landsat imagery has been utilised to detect severe forest damage (Ekstrand, 1996), including damage resulting from cyclonic (hurricane) winds (Preston, 1987, Paling et al., 2008, Staben and Evans, 2008).
Tropical cyclones occur frequently along the coastline of the Australian Northern Territory. The destructive winds associated with these cyclones can have a major impact on both the man-made and natural environments. The impact of cyclonic winds is greatest in coastal regions; however, they also have the potential to cause significant disturbance further inland (e.g. Cyclone Monica) (Staben and Evans, 2008). The impact on native vegetation can be significant, resulting in major structural changes to vegetation communities. A number of studies have reported on the impact of cyclones on vegetation in the Northern Territory (Stocker, 1976, Fox, 1980, Cameron et al., 1983, Bowman and Panton, 1994, Cook and Goyens, 2004, Staben and Evans, 2008, Williamson et al., 2011, Hutley et al., 2013). These studies have used a number of methods, including collection of field data, aerial photography and satellite imagery. Although cyclones are frequent and have the potential to be a major disturbance agent in ecosystems across the Northern Territory (Murphy, 1984), very few studies have been undertaken to quantify their impact and the potential role they play in driving the structure of these communities (Cook and Goyens, 2004). While it is well recognised that fire and the stress of the seasonal drought (a characteristic of the wet-dry tropics of northern Australia) are frequent disturbance factors on vegetation communities, very little focus has been given to the impact cyclones have on these ecosystems (Cook and Goyens, 2004, Hutley et al., 2013).
While severe damage to woody vegetation can be relatively easy to identify by comparing satellite imagery captured directly before and after the change event (e.g. cyclones), accurate assessment of subtle changes through time is enhanced by relating biophysical variables to satellite remote sensing observations. To obtain quantitative information from optical satellite data, relationships between biophysical variables and the satellite observations need to be established (Moulin et al., 1998). Numerous studies have derived empirical relationships between Landsat imagery and field-based measurements such as leaf area index (Coops et al., 1997, Eriksson et al., 2006), above-ground biomass of woody vegetation (Foody et al., 2003, Powell et al., 2010, Avitabile et al., 2012), fractional cover (Scarth et al., 2010) and woody vegetation foliage projective cover (Danaher et al., 2004, Armston et al., 2009). A variety of statistical methods have been used to develop these relationships, including linear and non-linear regression models based on single or multiple predictor variables (Cohen et al., 2003), while others have used machine learning algorithms such as neural networks, tree-based models, K-nearest neighbours and support vector machines (Labrecque et al., 2006, Li et al., 2010, Avitabile et al., 2012).
Vegetation height has been identified as a key parameter for inferring long-term trends in biomass and carbon stock (Skidmore et al., 2015, Cook et al., 2015). Combined with species and site quality information, vegetation height helps to inform estimates of stand age and successional stages (Stojanova et al., 2010). Light detection and ranging (LiDAR) data has been used extensively to measure woody vegetation structure. While LiDAR is an efficient way to map and measure woody vegetation structure (Lim et al., 2003, Wulder et al., 2012, Goldbergs et al., 2018), the use of these data at a regional level can be prohibitive due to financial constraints (Pascual et al., 2010). Furthermore, the availability of LiDAR for long-term studies (multiple decades) is limited due to the paucity of data. Ecological processes can occur over long time frames, and understanding these processes often requires information recorded over multiple decades, captured at an appropriate spatial, spectral and temporal resolution. Numerous studies have used structural information obtained from LiDAR data to develop predictive models using Landsat sensors, with the aim of enhancing the spatial and temporal coverage (Hudak et al., 2002, Pascual et al., 2010, Hill et al., 2011, Ota et al., 2014, Ahmed et al., 2015). These studies have been undertaken across a variety of vegetation communities ranging from conifer forests (Ahmed et al., 2015) to tropical evergreen and deciduous forests (Ota et al., 2014, Hill et al., 2011, Wilkes et al., 2015). In southern Australia, Wilkes et al. (2015) predicted canopy height over a 2.9 million ha area of heterogeneous temperate forests by developing a relationship between LiDAR derived canopy height and a combination of satellite imagery (Landsat and Moderate Resolution Imaging Spectroradiometer) using the random forest algorithm.
Machine learning techniques based on ensemble models such as random forest have been used successfully for a variety of remote sensing classification and regression modelling applications (Pal, 2005, Avitabile et al., 2012, Mellor et al., 2013, Mellor et al., 2015, Mascaro et al., 2014, Karlson et al., 2015, Wilkes et al., 2015). These studies demonstrate the advantages of the random forest algorithm, such as its robustness to outliers in the training data, its ability to handle non-parametric data, its ability to uncover complicated non-linear relationships between variables and the ease of tuning the model's parameters.
In this study, we investigate the application of Landsat satellite sensors to predict woody vegetation canopy height and develop a model predicting canopy height across a range of vegetation communities in the wet-dry tropics of northern Australia. While previous studies have demonstrated a fusion of different sensors and LiDAR to derive predictive models of canopy height in Australia (Wilkes et al., 2015), this study investigates the use of Landsat sensors only for the estimation of canopy height over a long time series of multiple decades. To our knowledge, this is the first study to predict LiDAR derived canopy height from Landsat sensors in the wet-dry tropics of northern Australia. A canopy height model (1 m spatial resolution) was produced from a LiDAR dataset captured in 2009 for use as the dependent variable. Random forest regression was used to produce a model to predict LiDAR derived canopy height from a single Landsat-5 Thematic Mapper (TM) image captured in 2009 (30 m spatial resolution). We developed a three-stage approach to identify the important independent variables and optimise the parameters used in the random forest model, which was applied to Landsat-5 TM, Landsat-7 Enhanced Thematic Mapper Plus (ETM+) and Landsat-8 Operational Land Imager (OLI) sensors.
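Because the canopy height model has a 1 m resolution and the Landsat imagery a 30 m resolution, the two must be brought to a common grid before a regression can be fitted. The text above does not state how this was done. The sketch below is one plausible approach under stated assumptions: it uses a block mean (the aggregation statistic is an assumption), the function name `aggregate_chm` is hypothetical, and the input is synthetic data rather than the study's LiDAR.

```python
import numpy as np

def aggregate_chm(chm_1m, block=30):
    """Average a 1 m canopy height raster into block x block metre cells.

    Assumption: block mean is used as the aggregation statistic; the
    source text does not say which statistic the authors applied.
    """
    rows, cols = chm_1m.shape
    # Trim edge rows/columns so the array divides evenly into blocks.
    rows_trim = rows - rows % block
    cols_trim = cols - cols % block
    trimmed = chm_1m[:rows_trim, :cols_trim]
    # Give each block its own pair of axes, then average within blocks.
    blocks = trimmed.reshape(rows_trim // block, block,
                             cols_trim // block, block)
    return blocks.mean(axis=(1, 3))

# Synthetic 1 m CHM covering 90 m x 60 m, heights in metres (not real data).
rng = np.random.default_rng(0)
chm = rng.uniform(0.0, 25.0, size=(90, 60))
chm_30m = aggregate_chm(chm)
print(chm_30m.shape)
```

Each cell of `chm_30m` could then be paired with the Landsat pixel covering the same ground area, assuming the two rasters are already co-registered.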
Study area
This study was undertaken in the Darwin region, located in northern Australia's wet-dry tropics (Fig. 1). The average annual temperature for the Darwin region is 32 °C and the average annual rainfall is 1729 mm, with the majority of the precipitation occurring between October and April. The study site covers an area of approximately 1800 km² consisting of urban and peri-urban development and native vegetation. The dominant native vegetation communities occurring in the study area include Mangrove
Model Development Stage One: optimising number of trees
To reduce the computational burden of the random forest model, we undertook an experiment to identify the optimal number of decision trees; the results are presented as box plots in Fig. 4. Each box plot represents the RMSE values for a given number of trees in the random forest model (based on 100 runs using independent test data), with mean RMSE values ranging between 3.18 m and 3.92 m. The lowest mean RMSE score was recorded for n_estimator values 512 and 4096. These results are consistent with other
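The tree-count experiment described in this section can be illustrated with a minimal sketch. It assumes scikit-learn's `RandomForestRegressor` (the source mentions only Python code and an n_estimator parameter, so the library is an assumption), and it uses synthetic predictors and heights, small tree counts and few repeats so that it runs quickly. None of the numbers it prints correspond to the study's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Synthetic stand-in for six Landsat predictors and canopy height (m).
X = rng.uniform(0.0, 1.0, size=(300, 6))
y = 20.0 * X[:, 3] - 10.0 * X[:, 2] * X[:, 4] + rng.normal(0.0, 2.0, size=300)

tree_counts = [8, 32, 128]  # illustrative values only
n_repeats = 5               # illustrative; far fewer than in the study

rmse = {n: [] for n in tree_counts}
for n in tree_counts:
    for rep in range(n_repeats):
        # A fresh train/test split per repeat gives a distribution of RMSE
        # values for each tree count, which is what a box plot summarises.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=rep)
        model = RandomForestRegressor(n_estimators=n, random_state=rep)
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        rmse[n].append(float(np.sqrt(mean_squared_error(y_test, pred))))

for n in tree_counts:
    print(n, round(float(np.mean(rmse[n])), 2))
```

Plotting the lists in `rmse` as box plots, one per tree count, would give a figure of the same general form as the one the section describes.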
Conclusions
In this study we implemented a random forest regression model to predict canopy height from a single-date Landsat-5 TM scene, across a variety of natural vegetation communities in the Northern Territory, Australia. The model was trained with a LiDAR-derived canopy height model (CHM) (R² = 0.53, RMSE = 2.8 m). A three-stage approach was undertaken to tune the random forest model and select the predictor variables used in the final model. Despite none of the individual independent predictor
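For readers unfamiliar with the two accuracy measures quoted above, the sketch below shows how RMSE and R² are commonly computed from observed and predicted heights. It assumes R² means the coefficient of determination (1 minus the ratio of residual to total sum of squares); the source does not say which definition the authors used. The function names and sample values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean square error, in the units of the inputs (here metres)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination (assumed definition of R²)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical canopy heights in metres (not data from the study).
observed = [2.0, 5.0, 9.0, 14.0]
predicted = [3.0, 5.0, 8.0, 12.0]
print(rmse(observed, predicted), r_squared(observed, predicted))
```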
Acknowledgements
This study would not have been possible without the support of the Northern Territory Government and the collaborative partnership between the Northern Territory Government’s Department of Environment and Natural Resources, Rangelands Division and the Queensland Government’s Department of Environment and Science, Remote Sensing Centre. Thanks also to Neil Flood for assistance and advice in the development of the Python code used in this study.
References (96)
- et al. Characterizing stand-level forest canopy cover and height using Landsat time series, samples of airborne LiDAR, and the Random Forest algorithm. ISPRS J. Photogramm. Remote Sens. (2015)
- et al. Capabilities and limitations of Landsat and land cover data for aboveground woody biomass estimation of Uganda. Remote Sens. Environ. (2012)
- et al. A shortwave infrared modification to the simple ratio for LAI retrieval in boreal forests: an image and model analysis. Remote Sens. Environ. (2000)
- et al. Tree damage in Darwin parks and gardens during cyclones Tracy and Max. Landsc. Plan. (1983)
- Optically-based methods for measuring seasonal variation of leaf area index in boreal conifer stands. Agric. Forest Meteorol. (1996)
- et al. An improved strategy for regression of biophysical variables and Landsat ETM+ data. Remote Sens. Environ. (2003)
- et al. Spectral analysis of fire severity in north Australian tropical savannas. Remote Sens. Environ. (2013)
- et al. Impact of understory vegetation on forest canopy reflectance and remotely sensed LAI estimates. Remote Sens. Environ. (2006)
- et al. Predictive relations of tropical forest biomass from Landsat TM data and their transferability between regions. Remote Sens. Environ. (2003)
- et al. Mapping tree height distributions in Sub-Saharan Africa using Landsat 7 and 8 data. Remote Sens. Environ. (2016)