From reanalysis to satellite observations: gap-filling with imbalanced learning

GeoInformatica

Abstract

Increasing the spatial coverage and temporal resolution of Earth surface monitoring can significantly improve forecasting and monitoring capabilities in the context of smart cities, such as extreme weather forecasting, ecosystem monitoring and anthropogenic impact monitoring. Although satellite observations are an essential data source for Earth surface monitoring, most of them contain data gaps caused by factors such as the limitations of measuring equipment, environmental interference, and delayed or lost data updates. Many efforts have been made to fill these gaps over the last decade, but existing techniques cannot address the problem efficiently. In this paper, we extensively study the gap-filling problem of satellite observations using imbalanced learning. Specifically, we propose a framework called Reanalysis to Satellite (R2S) that simulates satellite observations from reanalysis data. Within the R2S framework, we propose a generic method called Spatial Temporal Match (STM), which matches reanalysis data with satellite observations to construct the Reanalysis-Satellite (R-S) dataset used to train the model. Based on the R-S dataset, we propose a novel method called Semi-imbalanced (SIMBA), which handles the imbalance problem of gap-filling by taking advantage of both traditional machine learning and imbalanced learning. We construct a hybrid model in the R2S framework for Soil Moisture Active Passive (SMAP) satellite observations of tropical cyclone wind speed. Extensive experiments demonstrate that the hybrid model outperforms the traditional machine learning model and closely approximates in situ observations.
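
The idea of pairing reanalysis data with satellite observations in space and time can be pictured with a small sketch. The abstract does not detail the STM procedure, so the code below is only an illustration under stated assumptions: it uses plain nearest-neighbour matching on a regular latitude-longitude-time grid, and the function `match_nearest`, its parameters (including the `max_dt_hours` tolerance) and the toy data are hypothetical rather than taken from the paper.

```python
import numpy as np

def match_nearest(sat_time, sat_lat, sat_lon,
                  grid_times, grid_lats, grid_lons, field,
                  max_dt_hours=0.5):
    """Pair each satellite sample with its nearest reanalysis grid cell.

    Assumptions (not from the paper): times are hours since a common
    reference, the grid is regular, and longitude wrap-around is ignored.
    `field` has shape (n_times, n_lats, n_lons).
    Returns the matched reanalysis values and a boolean mask of the
    satellite samples that fell within the time tolerance.
    """
    sat_time = np.asarray(sat_time, dtype=float)
    sat_lat = np.asarray(sat_lat, dtype=float)
    sat_lon = np.asarray(sat_lon, dtype=float)

    # Nearest grid index along each axis, one per satellite sample.
    t_idx = np.abs(grid_times[None, :] - sat_time[:, None]).argmin(axis=1)
    lat_idx = np.abs(grid_lats[None, :] - sat_lat[:, None]).argmin(axis=1)
    lon_idx = np.abs(grid_lons[None, :] - sat_lon[:, None]).argmin(axis=1)

    # Drop samples whose nearest reanalysis time step is too far away.
    keep = np.abs(grid_times[t_idx] - sat_time) <= max_dt_hours

    values = field[t_idx, lat_idx, lon_idx]
    return values[keep], keep

# Toy reanalysis grid: 3 hourly steps on a 3 x 3 grid with 0.25 degree spacing.
grid_times = np.array([0.0, 1.0, 2.0])
grid_lats = np.array([10.0, 10.25, 10.5])
grid_lons = np.array([120.0, 120.25, 120.5])
field = np.arange(27, dtype=float).reshape(3, 3, 3)

# Three toy satellite samples; the second is 0.6 h from the nearest step.
values, keep = match_nearest(
    sat_time=[0.9, 2.6, 2.0],
    sat_lat=[10.3, 10.1, 10.0],
    sat_lon=[120.45, 120.2, 120.0],
    grid_times=grid_times, grid_lats=grid_lats, grid_lons=grid_lons,
    field=field,
)
print(values, keep)
```

In a training setting, the kept satellite values would serve as targets and the matched reanalysis values as features. A real pipeline would also need to handle longitude wrap-around, interpolation and quality flags, none of which this sketch attempts.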

Notes

  1. https://www.ncdc.noaa.gov/ibtracs/

  2. http://www.remss.com/missions/smap/winds/

  3. https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels

  4. https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-pressure-levels

  5. https://www.aoml.noaa.gov/hrd/data_sub/hurr.html

Acknowledgements

This work is supported by the National Key R&D Program of China (Grant No. 2018YFB0203801), the National Natural Science Foundation of China (Grant Nos. 61572510, 61702529 and 61802424) and China National Special Fund for Public Welfare (Grant No. GYHY201306003).

Author information

Corresponding author

Correspondence to Kaijun Ren.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lu, J., Ren, K., Li, X. et al. From reanalysis to satellite observations: gap-filling with imbalanced learning. Geoinformatica 26, 397–428 (2022). https://doi.org/10.1007/s10707-020-00426-7

  • DOI: https://doi.org/10.1007/s10707-020-00426-7
