Abstract
Increasing the spatial coverage and temporal resolution of Earth surface monitoring can significantly improve forecasting and monitoring capabilities in smart-city contexts, such as extreme weather forecasting, ecosystem monitoring, and anthropogenic impact monitoring. Satellite observations are an essential data source for Earth surface monitoring, yet most of them contain data gaps caused by factors such as the limitations of measuring equipment, environmental interference, and delayed or lost data updates. Although many efforts have been made over the last decade to fill these gaps, existing techniques cannot address the problem efficiently. In this paper, we extensively study the gap-filling problem of satellite observations using imbalanced learning. Specifically, we propose a framework called Reanalysis to Satellite (R2S) that simulates satellite observations from reanalysis data. Within the R2S framework, we propose a generic method called Spatial Temporal Match (STM), which matches reanalysis data with satellite observations to construct the Reanalysis-Satellite (R-S) dataset used to train the model. Based on the R-S dataset, we propose a novel method called Semi-imbalanced (SIMBA), which handles the imbalance problem in gap-filling by taking advantage of both traditional machine learning and imbalanced learning. We construct a hybrid model in the R2S framework for Soil Moisture Active Passive (SMAP) satellite observations of tropical cyclone wind speed. Extensive experiments demonstrate that the hybrid model outperforms the traditional machine learning model and closely approximates in situ observations.
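The abstract does not say how reanalysis data and satellite observations are paired, so the following is only a rough illustration of the general idea of spatio-temporal matching, not the authors' STM method. It assumes a regular latitude/longitude/time reanalysis grid, times expressed as hours, no longitude wraparound, and a nearest-neighbour rule with a hypothetical tolerance `max_dt_hours`; the function name `match_nearest` and all sample values are invented for the example.

```python
import numpy as np

def match_nearest(sat_lat, sat_lon, sat_time,
                  grid_lat, grid_lon, grid_time, max_dt_hours=3.0):
    """Pair each satellite sample with its nearest grid cell in space and time.

    Returns (i_time, i_lat, i_lon, keep). `keep` is False where the nearest
    grid time step is more than `max_dt_hours` away from the observation.
    """
    sat_lat, sat_lon, sat_time = (np.asarray(a, dtype=float)
                                  for a in (sat_lat, sat_lon, sat_time))
    # Broadcast to (n_samples, n_grid) distance tables, one per axis.
    i_lat = np.abs(sat_lat[:, None] - grid_lat[None, :]).argmin(axis=1)
    i_lon = np.abs(sat_lon[:, None] - grid_lon[None, :]).argmin(axis=1)
    i_time = np.abs(sat_time[:, None] - grid_time[None, :]).argmin(axis=1)
    # Reject pairs whose time offset exceeds the assumed tolerance.
    keep = np.abs(sat_time - grid_time[i_time]) <= max_dt_hours
    return i_time, i_lat, i_lon, keep

# Toy grid: 0.25-degree spacing, 6-hourly time steps (illustrative values).
grid_lat = np.array([10.0, 10.25, 10.5])
grid_lon = np.array([120.0, 120.25, 120.5])
grid_time = np.array([0.0, 6.0, 12.0])

i_time, i_lat, i_lon, keep = match_nearest(
    sat_lat=[10.1, 10.45, 10.26],
    sat_lon=[120.3, 120.05, 120.49],
    sat_time=[5.0, 9.5, 20.0],
    grid_lat=grid_lat, grid_lon=grid_lon, grid_time=grid_time,
)
```

Under these assumptions, the kept index triples could be used to look up reanalysis variables as model inputs, with the co-located satellite value as the target; the third sample is discarded because its nearest reanalysis time step is 8 hours away.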
Acknowledgements
This work is supported by the National Key R&D Program of China (Grant No. 2018YFB0203801), the National Natural Science Foundation of China (Grant Nos. 61572510, 61702529 and 61802424), and the China National Special Fund for Public Welfare (Grant No. GYHY201306003).
Cite this article
Lu, J., Ren, K., Li, X. et al. From reanalysis to satellite observations: gap-filling with imbalanced learning. Geoinformatica 26, 397–428 (2022). https://doi.org/10.1007/s10707-020-00426-7
DOI: https://doi.org/10.1007/s10707-020-00426-7