
Scalable big earth observation data mining algorithms: a review

  • Review
  • Earth Science Informatics

Abstract

Enormous amounts of Earth information, gathered from satellite sensors, simulations, and other sources, are collectively referred to as Big Earth Observation Data (BEOD). These data contain valuable insights and spatio-temporal stamps of pertinent Earth phenomena that can enhance our knowledge of, and help us respond to, demanding situations in Earth sciences and observation. However, traditional data mining algorithms are generally time-inefficient, making it difficult to process and analyze BEOD. To address this challenge, we explore two ways to enhance scalability: 1) improving the algorithm with specific parameters or data modifications when it runs on a single machine, and 2) parallelizing the algorithm through distributed execution on multiple machines, such as cluster-based implementations. We also suggest improvements to existing techniques and widely used algorithms for processing BEOD. We conduct a systematic review of data mining techniques for classification, clustering, prediction, regression, association rules, pattern mining, and anomaly detection to determine their scalability and performance on various Earth observation use cases. We explore advanced mining techniques, including statistical, machine learning, and deep learning approaches, for handling BEOD at scale. We also identify potential challenges and open research issues, and provide future directions for the development of scalable algorithms for mining BEOD. We observe that applying data mining techniques to BEOD raises serious concerns because the data have both spatial and temporal components. Statistical and machine learning models such as ARIMA, SARIMA, Naive Bayes, Bayesian networks, KNN, K-means, and SVM are not well suited to the volume and heterogeneity of the data, but their performance can be improved by employing big data environments.
Deep learning-based techniques are now widely used for working with large amounts of data, but they require specialized systems and upgrades as the data volume increases. This issue can be addressed through a big data platform. A unified deep learning architecture can be employed to handle both the spatial and temporal components of BEOD, and performance can be improved by deploying the architecture in a big data environment. This study also reveals that, although deep learning architectures are efficient and trending, traditional statistical methods and machine learning can achieve competitive or sometimes improved performance with the involvement of big data technologies and/or internal data representation.
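
The two scalability routes named in the abstract can be illustrated with K-means, one of the algorithms it lists. The following is a minimal sketch under stated assumptions, not the authors' method: the function names (partial_stats, merge_stats, kmeans) and the toy "pixel" data are hypothetical, route 1 is represented by a simple strided subsample, and route 2 by a map/reduce split over data chunks in which local threads stand in for the nodes of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))


def partial_stats(chunk, centroids):
    """Map step: per-centroid coordinate sums and counts for one data chunk."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in chunk:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def merge_stats(partials, centroids):
    """Reduce step: combine the chunk statistics into updated centroids."""
    dim = len(centroids[0])
    updated = []
    for j, old in enumerate(centroids):
        total = sum(counts[j] for _, counts in partials)
        if total == 0:
            updated.append(list(old))  # empty cluster: keep its centroid
            continue
        updated.append(
            [sum(sums[j][d] for sums, _ in partials) / total for d in range(dim)]
        )
    return updated


def kmeans(chunks, centroids, iterations=10, workers=1):
    """Lloyd's K-means over data chunks; each chunk is one unit of map work."""
    centroids = [list(c) for c in centroids]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            # Assumption: threads stand in for cluster nodes in this sketch.
            partials = list(
                pool.map(partial_stats, chunks, [centroids] * len(chunks))
            )
            centroids = merge_stats(partials, centroids)
    return centroids


# Hypothetical toy data: two well-separated groups of 2-D feature vectors.
points = [(0, 0), (1, 0), (0, 1), (1, 1),
          (10, 10), (11, 10), (10, 11), (11, 11)]
start = [(0, 0), (11, 11)]

# Baseline: all data in a single chunk on a single worker.
full = kmeans([points], start)

# Route 1 (single machine, data modification): cluster a strided subsample.
sampled = kmeans([points[::2]], start)

# Route 2 (parallel execution): split the data into chunks and map them
# across workers, then reduce the partial statistics.
chunks = [points[0:3], points[3:6], points[6:8]]
chunked = kmeans(chunks, start, workers=3)

print(full)     # [[0.5, 0.5], [10.5, 10.5]]
print(chunked)  # same centroids as the single-chunk run
print(sampled)  # [[0.0, 0.5], [10.0, 10.5]], an approximation from half the data
```

The chunked run reproduces the single-chunk centroids because the per-chunk sums and counts are sufficient to recompute each mean, which is why the same map/reduce split carries over to a distributed setting. The subsampled run trades some accuracy for less work on one machine.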


Data Availability Statement (DAS)

This paper does not include any specific dataset; no new data were generated or analysed. Data sharing does not apply to this study. Material referred to in this paper is listed in the reference section.

References

  • Skytland N (2012) Big data: What is nasa doing with big data today? Open, Gov open access article

    Google Scholar 

  • Kamilaris A, Kartakoullis A, Prenafeta-Bold FX (2017) “A review on the practice of big data analysis in agriculture," Comput Electron Agric vol 143. p 23-37. no. C. [Online]. Available: https://doi.org/10.1016/j.compag.2017.09.037

  • Vatsavai RR, Ganguly A, Chandola V, Stefanidis A, Klasky S, Shekhar S (2012) “Spatiotemporal data mining in the era of big spatial data: Algorithms and applications,” In: Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, ser. BigSpatial’ 12. New York, NY, USA: Association for Computing Machinery, p 1–10. [Online]. Available: https://doi.org/10.1145/2447481.2447482

  • Sisodiya N, Garg S, Dube N (2022) “Scalable clustering for eo data using efficient raster representation,” Multimed Tools Appl 82 vol 12303-12319

  • Salcedo-Sanz S, Ghamisi P, Piles M, Werner M, Cuadra L, Moreno-Martínez A, Izquierdo-Verdiguier E, Muñoz-Marí J, Mosavi A, Camps-Valls G (2020) “Machine learning information fusion in earth observation: A comprehensive review of methods, applications and data sources,” Information Fusion, vol 63. pp 256–272 [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1566253520303171

  • Shaheen M, Shahbaz M, Rehman Z, Guergachi A (2011) Data mining applications in hydrocarbon exploration. Artif Intell Rev 35:1–18

    Article  Google Scholar 

  • Persello C, Wegner JD, Hansch R, Tuia D, Ghamisi P, Koeva M, Camps-Valls G (2022) “Deep learning and earth observation to support the sustainable development goals: Current approaches, open challenges, and future opportunities,” IEEE Geoscience and Remote Sensing Magazine. pp 2–30

  • Xu C, Yang C (2014) “Introduction to big geospatial data research,” Annals of GIS vol 20. pp 227–232, no. 4 [Online]. Available: https://doi.org/10.1080/19475683.2014.938775

  • Jiang Z, Shekhar S (2017) Spatial and Spatiotemporal Big Data Science. Cham: Springer International Publishing. pp 15–44. [Online]. Available: https://doi.org/10.1007/978-3-319-60195-3_2

  • Shashi S, Zhe J, Y AR, Emre E, Xun T, V GVM, Xun Z (2015) “Spatiotemporal data mining: A computational perspective,” ISPRS International Journal of Geo-Information, vol 4. pp 2306–2338. no. 4 [Online]. Available: https://www.mdpi.com/2220-9964/4/4/2306

  • Gotz M, Richerzhagen M, Bodenstein C, Cavallaro G, Glock P, Riedel M, Benediktsson JA (2015) “On scalable data mining techniques for earth science,” Procedia Computer Science vol 51. pp 2188–2197. International Conference On Computational Science, ICCS. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1877050915013022

  • Rolf E, Proctor J, Carleton T, Bolliger I, Shankar V, Ishihara M, Recht B, Hsiang S (2020)“A generalizable and accessible approach to machine learning with global satellite imagery,” CoRR vol abs/2010.08168. [Online]. Available: arXiv:2010.08168

  • Sisodiya N, Vyas K, Dube N, Thakkar P (2023). Scalable architecture for mining big earth observation data: SAMBEO. https://doi.org/10.1007/978-3-031-31407-0_38

    Article  Google Scholar 

  • Sisodiya N, Vyas K, Dube N, Thakkar P (2023) Analyzing hydro-estimator INSAT-3D time series with outlier detection. https://doi.org/10.1007/978-3-031-31407-0_37

  • Guo H, Nativi S, Liang D, Craglia M, Wang L, Schade S, Corban C, He G, Pesaresi M, Li J, Shirazi Z, Liu J, Annoni A (2020) “Big earth data science: an information framework for a sustainable planet,” International Journal of Digital Earth vol 13. pp 743–767. no. 7 [Online]. Available: https://doi.org/10.1080/17538947.2020.1743785

  • Sharma P, Mutreja U (2013) Analysis of satellite images using artificial neural network. Int J Soft Comput Eng (IJSCE) 2(6):276–278. ISSN: 2231-2307

  • Fu Y, Zhao C, Wang J, Jia X, Yang G, Song X, Feng H (2017) “An improved combination of spectral and spatial features for vegetation classification in hyperspectral images,” Remote Sens vol 9. no. 3 [Online]. Available: https://www.mdpi.com/2072-4292/9/3/261

  • Xia G, He C, Sun H (2007) “A rapid and automatic mrf-based clustering method for sar images,” IEEE Geosci Remote Sens Lett vol 4. pp 596–600 no. 4

  • Woodley A, Tang L-X, Geva S, Nayak R, Chappell T (2016) “Using parallel hierarchical clustering to address spatial big data challenges,” 2016 IEEE International Conference on Big Data (Big Data), pp 2692–2698

  • Hong Y, Yu L, Chen Y, Liu Y, Liu Y, Liu Y, Cheng H (2017) Prediction of soil organic matter by vis-nir spectroscopy using normalized soil moisture index as a proxy of soil moisture. Remote Sens 10(1). [Online]. https://www.mdpi.com/2072-4292/10/1/28

  • Bruzzone L, Demir B, Bovolo F, Brockmann C, Fomferra N, Iapaolo M, Jha R, Lu J, Quast R, Stelzer K, Veci L (2014) Analyzing and Retrieving Remote Sensing Images from Large Data Archives

  • Zhang L, Lei L, Yan D (2010) “Comparison of two regression models for predicting crop yield,” In: 2010 IEEE International Geoscience and Remote Sensing Symposium. pp 1521–1524

  • Sun J, Yang J, Shi S, Chen B, Du L, Gong W, Song S (2017) “Estimating rice leaf nitrogen concentration: Influence of regression algorithms based on passive and active leaf reflectance,” Remote Sens vol 9. no. 9 [Online]. Available: https://www.mdpi.com/2072-4292/9/9/951

  • Silvestro PC, Pignatti S, Pascucci S, Yang H, Li Z, Yang G, Huang W, Casa R (2017) “Estimating wheat yield in china at the field and district scale from the assimilation of satellite data into the aquacrop and simple algorithm for yield (safy) models,” Remote Sens vol 9. no. 5 [Online]. Available: https://www.mdpi.com/2072-4292/9/5/509

  • Jalili M, Gharibshah J, Ghavami SM, Beheshtifar M, Farshi R (2014) Nationwide prediction of drought conditions in iran based on remote sensing data. IEEE Trans Comput 63:90–101

    Article  Google Scholar 

  • Dorjsuren M, Liou Y-A, Cheng C-H (2016) “Time series modis and in situ data analysis for mongolia drought,” Remote Sens vol 8. no. 6 [Online]. Available: https://www.mdpi.com/2072-4292/8/6/509

  • Rajasekar U, Weng Q (2009) Application of association rule mining for exploring the relationship between urban land surface temperature and biophysical/social parameters. Photogramm Eng Remote Sens 75:385–396

    Article  Google Scholar 

  • Wang F, Li W, Wang S, Johnson CR (2018) “Association rules-based multivariate analysis and visualization of spatiotemporal climate data,” ISPRS International Journal of Geo-Information vol 7. no. [Online]. Available: https://www.mdpi.com/2220-9964/7/7/266

  • Qamer FM, Shehzad K, Abbas S, Murthy M, Xi C, Gilani H, Bajracharya B (2016) “Mapping deforestation and forest degradation patterns in western himalaya, pakistan,” Remote Sens vol 8. no. 5 [Online]. Available: https://www.mdpi.com/2072-4292/8/5/385

  • Oxoli D, Ronchetti G, Minghini M, Molinari ME, Lotfian M, Sona G, Brovelli MA (2018) “Measuring urban land cover influence on air temperature through multiple geo-data-the case of milan, italy,” ISPRS International Journal of Geo-Information vol 7. no. 11 [Online]. Available: https://www.mdpi.com/2220-9964/7/11/421

  • Chatziantoniou A, Petropoulos GP, Psomiadis E (2017) “Co-Orbital Sentinel 1 and 2 for LULC Mapping with Emphasis on Wetlands in a Mediterranean Setting Based on Machine Learning,” Remote Sens vol 9. p 1259. no. 12

  • Fu Y, Zhao C, Wang J, Jia X, Yang G, Song X, Feng H (2017) “An improved combination of spectral and spatial features for vegetation classification in hyperspectral images,” Remote Sens vol 9. no. 3 [Online]. Available: https://www.mdpi.com/2072-4292/9/3/261

  • CFC author, Bognár P, Lichtenberger J, Hamar D, Tarcsai G, Timár G, Molnár G, Pásztor S, Steinbach P, Székely B, Ferencz OE, Ferencz-Árkos I (2004) “Crop yield estimation by satellite remote sensing," International Journal of Remote Sensing vol 25. pp 4113-4149 no. 20 [Online]. Available: https://doi.org/10.1080/01431160410001698870

  • Ulsig L, Nichol CJ, Huemmrich KF, Landis DR, Middleton EM, Lyapustin AI, Mammarella I, Levula J, Porcar-Castell A (2017) “Detecting inter-annual variations in the phenology of evergreen conifers using long-term modis vegetation index time series,” Remote Sens vol 9. no. 1 [Online]. Available: https://www.mdpi.com/2072-4292/9/1/49

  • Wang J, Huang J, Gao P, Wei C, Mansaray LR (2016) “Dynamic mapping of rice growth parameters using hj-1 ccd time series data,” Remote Sens vol 8. no. 11 [Online]. Available: https://www.mdpi.com/2072-4292/8/11/931

  • Silvestro PC, Pignatti S, Pascucci S, Yang H, Li Z, Yang G, Huang W, Casa R (2017) “Estimating wheat yield in china at the field and district scale from the assimilation of satellite data into the aquacrop and simple algorithm for yield (safy) models,” Remote Sens vol 9. no. 5 [Online]. Available: https://www.mdpi.com/2072-4292/9/5/509

  • Wei C, Huang J, Mansaray LR, Li Z, Liu W, Han J (2017) “Estimation and mapping of winter oilseed rape lai from high spatial resolution satellite data based on a hybrid method,” Remote Sens, vol 9. no. 5 [Online]. Available: https://www.mdpi.com/2072-4292/9/5/488

  • Taubenböck H, Staab J, Zhu XX, Geiß, Dech S, Wurm M (2018) “Are the poor digitally left behind? indications of urban divides based on remote sensing and twitter data,” ISPRS International Journal of Geo–Information, vol 7. no. 8 [Online]. Available: https://www.mdpi.com/2220-9964/7/8/304

  • Wu K, Du Q, Wang Y, Yang Y (2017) “Supervised sub-pixel mapping for change detection from remotely sensed images with different resolutions,” Remote Sens vol 9. no. 3 [Online]. Available: https://www.mdpi.com/2072-4292/9/3/284

  • Shahbaz M, Guergachi A, Noreen A, Shaheen M (2012) “Classification by object recognition in satellite images by using data mining,” Lecture Notes in Engineering and Computer Science vol 2197

  • Qi K, Yang C, Guan Q, Wu H, Gong J (2017) “A multiscale deeply described correlatons-based model for land-use scene classification,” Remote Sens vol 9. no. 9 [Online]. Available: https://www.mdpi.com/2072-4292/9/9/917

  • Huang B, Wang J (2020) “Big spatial data for urban and environmental sustainability,” Geo-spatial Information Science vol 23. pp 125–140 no. 2 [Online]. Available: https://doi.org/10.1080/10095020.2020.1754138

  • Xia H, Huang C-W, Li N, Zhang D (2019) Parsuc: A parallel subsampling-based method for clustering remote sensing big data. Sensors 19:3438

    Article  Google Scholar 

  • Birant D, Kut A (2019) “St-dbscan: An algorithm for clustering spatial-temporal data,” Data and Knowledge Engineering, vol 60. pp 208–221 no. 1. intelligent Data Mining. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0169023X06000218

  • Ankerst M, Breunig MM, Kriegel H-P, Sander J (1999) Optics: ordering points to identify the clustering structure. SIGMOD 28(2):49–60

  • An S, Yang H, Wang J (2018) “Revealing recurrent urban congestion evolution patterns with taxi trajectories,” ISPRS International Journal of Geo-Information vol 7. no. 4 [Online]. Available: https://www.mdpi.com/2220-9964/7/4/128

  • You W, Chenghu Z, Tao P (2017) “Semantic-geographic trajectory pattern mining based on a new similarity measurement,” ISPRS International Journal of Geo-Information vol 6. no. 7 [Online]. Available: https://www.mdpi.com/2220-9964/6/7/212

  • Wei C, Cabrera Barona P, Blaschke T (2017) “A new look at public services inequality: The consistency of neighborhood context and citizens’ perception across multiple scales,” ISPRS International Journal of Geo-Information vol 6. no. 7 [Online]. Available: https://www.mdpi.com/2220-9964/6/7/200

  • He B, Zhang Y, Chen Y, Gu Z (2018) “A simple line clustering method for spatial analysis with origin-destination data and its application to bike-sharing movement data,” ISPRS International Journal of Geo-Information vol 7. no. 6 [Online]. Available: https://www.mdpi.com/2220-9964/7/6/203

  • Xiaoying S, Zhenhai Y, Qiming F, Quan Z (2017) “A visual analysis approach for inferring personal job and housing locations based on public bicycle data,” ISPRS International Journal of Geo-Information vol 6. no. 7 [Online]. Available: https://www.mdpi.com/2220-9964/6/7/205

  • Späth H (1992) Mathematical algorithms for linear regression. Academic Press Professional Inc, USA

    Google Scholar 

  • A W, E P (2017) Multiple regression analysis for unmixing of surface temperature data in an urban environment, vol 9. Academic Press Professional Inc, USA., p 684

  • Khurshid H, Khan MF (2015) “Segmentation and classification using logistic regression in remote sensing imagery.” vol 8. pp 224–232

  • Rahman M, MHGCI, HBBJ (2014) “An assessment of polynomial regression techniques for the relative radiometric normalization (rrn) of high-resolution multi-temporal airborne thermal infrared (tir) imagery.” vol 6. pp 11810–11828

  • Mutanga O, Adam E, Cho M (2014) “High density biomass estimation for wetland vegetation using worldview-2 imagery and random forest regression algorithm.” vol 18. p 399-406

  • Caicedo JPR, Verrelst J, Munoz-Mari J, Moreno J, Camps-Valls G (2014) “Toward a semiautomatic machine learning retrieval of biophysical parameters.” vol 7, pp 1249–1259. no. 4

  • Bala Rajaratnam DS, Roberts S, Yu H (2019) “Influence diagnostics for high-dimensional lasso regression,” vol 28, pp 877–890. no. 4

  • Soomro BN, Xiao L, Huang L, Soomro SH, Molaei M (2016) “Bilayer elastic net regression model for supervised spectral-spatial hyperspectral image classification,” vol 9, pp 4102–4116. no. 9

  • Tian H, Li W, Wu M, Huang N, Li G, Li X, Niu Z (2017) “Dynamic monitoring of the largest freshwater lake in china using a new water index derived from high spatiotemporal resolution sentinel-1a data,” Remote Sens vol 9. no. 6 [Online]. Available: https://www.mdpi.com/2072-4292/9/6/521

  • Jung C, Lee Y, Cho Y, Kim S (2017) “A study of spatial soil moisture estimation using a multiple linear regression model and modis land surface temperature data corrected by conditional merging,” Remote Sens vol 9. no. 8 [Online]. Available: https://www.mdpi.com/2072-4292/9/8/870

  • Ratzmann G, Gangkofner U, Tietjen B, Fensholt R (2016) “Dryland vegetation functional response to altered rainfall amounts and variability derived from satellite time series data,” Remote Sens vol 8. no. 12 [Online]. Available: https://www.mdpi.com/2072-4292/8/12/1026

  • Shiliang L, Zhang Y, Fangyan C, Xiaoyun H, Shuang Z (2017) “Response of grassland degradation to drought at different time-scales in qinghai province: Spatio-temporal characteristics, correlation, and implications,” Remote Sens vol 9. no. 12 [Online]. Available: https://www.mdpi.com/2072-4292/9/12/1329

  • Sakai T, Matsunaga T, Maksyutov S, Gotovtsev S, Gagarin L, Hiyama T, Yamaguchi Y (2016) “Climate-induced extreme hydrologic events in the arctic,” Remote Sens vol 8. no. 11 [Online]. Available: https://www.mdpi.com/2072-4292/8/11/971

  • Tomppo E, Gagliano C, De Natale F, Katila M, Mcroberts R (2009) “Predicting categorical forest variables using an improved k-nearest neighbour estimator and landsat imagery.” vol 113 pp 500–517

  • Pham B, Tien Bui D, Pourghasemi HR, Prakash I, Dholakia M (2015) “Landslide susceptibility assessment in the uttarakhand area (india) using gis: a comparison study of prediction capability of naïve bayes, multilayer perceptron neural networks, and functional trees methods.” vol 112

  • MP M, J R, C A, P A, E P, CAO V, BFT R (2013) “Bayesian networks for raster data (baynerd): Plausible reasoning from observations.” no. 5, 2013, pp. 5999–6025

  • Rahman MR, Lateh HB (2015) Climate change in bangladesh a spatio-temporal analysis and simulation of recent temperature and rainfall data using gis and time series analysis model. Theor Appl Climatol 128:27–41

    Article  Google Scholar 

  • Nhita F, Saepudin D, Adiwijaya, Wisesty UN (2015) “Comparative study of moving average on rainfall time series data for rainfall forecasting based on evolving neural network classifier,” In: 2015 3rd International Symposium on Computational and Business Intelligence (ISCBI). pp 112–116

  • Hu Z, Zhang Y, Yao L (2014) “Radial basis function neural network with particle swarm optimization algorithms for regional logistics demand prediction.” Discret Dyn Nat Soc

  • Jalili M, Gharibshah J, Ghavami SM, Beheshtifar M, Farshi R (2014) “Nationwide prediction of drought conditions in iran based on remote sensing data,” IEEE Trans Comput vol 63. pp 90–101. no. 1

  • Stojanova D, Panov P, Kobler A, Džeroski S, Tažkova K (2006) Learning to predict forest fires with different data mining techniques

  • Pokhriyal N, Jacques DC (2017) “Combining disparate data sources for improved poverty prediction and mapping,” Proceedings of the National Academy of Sciences, vol 114. pp E9783–E9792. no. 46 [Online]. Available: https://www.pnas.org/content/114/46/E9783

  • Tingzon I, Orden A, Sy S, Sekara V, Ingmar, Weber, Fatehkia M, Herranz M, Kim D-H (2019) “Mapping poverty in the philippines using machine learning, satellite imagery, and crowd-sourced geospatial information,”

  • Subash SP, Kumar R, Aditya K (2018) “Satellite data and machine learning tools for predicting poverty in rural india,”

  • Gómez D, Salvador P, Sanz J, Casanova JL (2019) “Potato yield prediction using machine learning techniques and sentinel 2 data,” Remote Sens vol 11. no. 15 [Online]. Available: https://www.mdpi.com/2072-4292/11/15/1745

  • Christodoulou V, Bi Y, Wilkie G (2019) “A tool for swarm satellite data analysis and anomaly detection,” PLOS ONE vol 14. pp 1–20 no. 4 [Online]. Available: https://doi.org/10.1371/journal.pone.0212098

  • Hu Z, Zhang Y, Yao L (2016) “Detecting anomaly regions in satellite image time series based on sesaonal autocorrelation analysis,” ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences III3. p 303

  • Zhu Fuying FN, Yun W (2011) “Application of kalman filter in detecting pre-earthquake ionospheric tec anomaly,” Geodesy and Geodynamics. vol 2. no. 43-47

  • Tomppo E, Gagliano C, De Natale F, Katila M, Mcroberts R (2009) Predicting categorical forest variables using an improved k-nearest neighbour estimator and landsat imagery. Remote Sens Environ 113:500–517

    Article  Google Scholar 

  • Hamlet C, Straub J, Russell M, Kerlin S (2017) “An incremental and approximate local outlier probability algorithm for intrusion detection and its evaluation,” Journal of Cyber Security Technology vol 1. pp 75–87. no. 2 [Online]. Available: https://doi.org/10.1080/23742917.2016.1226651

  • US Goldstein M (2016) A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PLoS ONE 11(4)

  • Koonsanit K, Jaruskulchai C (2011) Finding and detection of outlier regions in satellite image. International Conference on Network and Electronics Engineering IPCSIT vol.11 (2011) (2011) IACSIT Press, Singapore

  • Chandola V, Vatsavai R (2011) A Gaussian Process Based Online Change Detection Algorithm for Monitoring Periodic Time Series. Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011. p 95–106. https://doi.org/10.1137/1.9781611972818.9

  • LJ K et al. (2015) “Pairs: A scalable geo-spatial data analytics platform.” IEEE International Conference on Big Data(Big Data), Santa Clara, CA, no. 1290-1298

  • Maatouki MSA, Meyer J, Streit A (2015) “A horizontally-scalable multiprocessing platform based on node.js.” IEEE Trustcom/BigDataSE/ISPA, Helsinki. no. 100-107

  • JY Z, Q L, HW Z, (2011) “A cloud-based system for spatial analysis service.” International Conference on Remote Sensing, Environment and Transportation Engineering (RSETE), Nanjing. no. 24-26

  • Nieuwejaar N, Kotz D, Purakayastha A, Ellis C, Best M (1996) “File-access characteristics of parallel scientific workloads.” IEEE Trans Parallel Distrib Syst vol 7. no. 1075–1089

  • G ZZ, P T, M Z (2016) “Detecting Anomaly Regions in Satellite Image Time Series Based on Sesaonal Autocorrelation Analysis,” ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences vol III3. pp 303–310

  • Prasad L, Theiler J, Fair M, Swaminarayan S (2012) “Feature extraction, anomaly, and change detection on WorldView-2 imagery by hierarchical image segmentation: a study,” In: Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XVIII. Shen SS, Lewis PE (Eds.), vol 8390. International Society for Optics and Photonics. SPIE, pp 560–570 [Online]. Available: https://doi.org/10.1117/12.919295

  • Plank S, Twele A, Martinis S (2016) “Landslide mapping in vegetated areas using change detection based on optical and polarimetric sar data,” Remote Sens vol 8. no. 4 [Online]. Available: https://www.mdpi.com/2072-4292/8/4/307

  • Xu F, Liu J, Sun M, Zeng D, Wang X (2017) “A hierarchical maritime target detection method for optical remote sensing imagery,” Remote Sens vol 9. no. 3 [Online]. Available: https://www.mdpi.com/2072-4292/9/3/280

  • Bhaduri K VP, Das K,(2010) “Distributed anomaly detection using satellite data from multiple modalities.” NASA conference on intelligent data understanding (CIDU’ 10) no. 109–123

  • Yan F, Zhang S, Liu X, Chen D, Chen J, Bu K, Yang J, Chang L (2016) “The effects of spatiotemporal changes in land degradation on ecosystem services values in sanjiang plain, china,” Remote Sens vol 8. no. 11 [Online]. Available: https://www.mdpi.com/2072-4292/8/11/917

  • Batran M, Mejia MG, Kanasugi H, Sekimoto Y, Shibasaki R (2018) “Inferencing human spatiotemporal mobility in greater maputo via mobile phone big data mining,” ISPRS International Journal of Geo-Information vol 7. no. 7 [Online]. Available: https://www.mdpi.com/2220-9964/7/7/259

  • Wurihan Zhang H, Zhang Z, Guo X, Zhao J, Duwala Shan Y, Hongying (2018) Understanding the Spatio-Temporal Pattern of Fire Disturbance in the Eastern Mongolia Using Modis Product. ISPRS - Information Sciences Int Arch Photogramm Remote Sens Spat Inf Sci 42(3):1921–1924

  • Rajasekar U, Weng Q (2009) Application of association rule mining for exploring the relationship between urban land surface temperature and biophysical/social parameters. Photogramm Eng Remote Sens 75:385–396

    Article  Google Scholar 

  • Liu L,Yang X, Liu H, Wang M, Welles S, Marquez S, Frank A, Haas C (2016) “Spatial–temporal analysis of air pollution, climate change, and total mortality in 120 cities of china, 2012–2013,” Frontiers in Public Health vol 4

  • Wang F, Li W, Wang S, Johnson CR (2018) “Association rules-based multivariate analysis and visualization of spatiotemporal climate data,” ISPRS International Journal of Geo-Information vol 7. no. 7 [Online]. Available: https://www.mdpi.com/2220-9964/7/7/266

  • Ding Z, Liao X, Su F, Fu D (2017) “Mining coastal land use sequential pattern and its land use associations based on association rule mining,” Remote Sens vol 9. no. 2 [Online]. Available: https://www.mdpi.com/2072-4292/9/2/116

  • Shaheen M, Shahbaz M, Guergachi A (2013) “Context based positive and negative spatio-temporal association rule mining,” Knowledge-Based Systems vol 37. pp 261–273 [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0950705112002237

  • Muhammad Shaheen UA (2021) “Carm: Context based association rule mining for conventional data,” Computers, Materials and Continua vol 68. pp 3305–3322 no. 3 [Online]. Available: http://www.techscience.com/cmc/v68n3/42485

  • Shaheen M, Khan S (2022) “Wisrule: First cognitive algorithm of wise association rule mining,” J Inf Sci

  • Fangjie M, Xuejian L, Huaqiang D, Guomo Z, Ning H, Xiaojun X, Yuli L, Liang C, Lu C (2017) “Comparison of two data assimilation methods for improving modis lai time series for bamboo forests,” Remote Sens vol 9. no. 5, 2017. [Online]. Available: https://www.mdpi.com/2072-4292/9/5/401

  • Pajic V, Govedarica M, Amovic M (2018) “Model of point cloud data management system in big data paradigm,” ISPRS International Journal of Geo-Information vol 7. no. 7 [Online]. Available: https://www.mdpi.com/2220-9964/7/7/265

  • Kang X, Liu J, Dong C, Xu S (2018) “Using high-performance computing to address the challenge of land use/land cover change analysis on spatial big data,” ISPRS International Journal of Geo-Information vol 7. no. 7 [Online]. Available: https://www.mdpi.com/2220-9964/7/7/273

  • Zhang N, Deng S, Chen H, Chen X, Chen J, Li X, Zhang Y (2018) “Structured knowledge base as prior knowledge to improve urban data analysis,” ISPRS International Journal of Geo-Information vol 7. no. 7 [Online]. Available: https://www.mdpi.com/2220-9964/7/7/264

  • Mathew A, Sreekumar S, Khandelwal S, Kaul N, Kumar R (2016a) “Prediction of land-surface temperatures of jaipur city using linear time series model,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing vol 9. pp 3546–3552 no. 8

  • Mathew A, Sreekumar S, Khandelwal S, Kaul N, Kumar R (2016b) “Prediction of surface temperatures for the assessment of urban heat island effect over ahmedabad city using linear time series model,” Energy and Buildings vol 128. pp 605–616. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0378778816306004

  • Salcedo-Sanz S, Ghamisi P, Piles M, Werner M, Cuadra L, Moreno-Martínez A, Izquierdo-Verdiguier E, Muñoz-Marí J, Mosavi A, Camps-Valls G (2020) “Machine learning information fusion in earth observation: A comprehensive review of methods, applications and data sources,” Information Fusion vol 63. pp 256–272 [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1566253520303171

  • Sisodiya N, Dube N, Thakkar P (2020) Next-Generation Artificial Intelligence Techniques for Satellite Data Processing pp 235–254

  • Manogaran G, Lopez D (2018) “Spatial cumulative sum algorithm with big data analytics for climate change detection,” Computers & Electrical Engineering vol 65. pp 207–221 [Online]. Available: http://www.sciencedirect.com/science/article/pii/S004579061730811X

  • Kurte K, Sanyal J, Berres A, Lunga D, Coletti M, Yang L, Graves D, Liebersohn B, Rose A (2019) Performance analysis and optimization for scalable deployment of deep learning models for country-scale settlement mapping on titan supercomputer. Concurrency and Computation: Practice and Experience 31:e5305

    Article  Google Scholar 

  • Merritt P, Bi H, Davis B, Windmill C, Xue Y, (2018) “Big earth data: a comprehensive analysis of visualization analytics issues,” Big Earth Data vol 2. no. 4, pp 321–350. [Online]. Available: https://doi.org/10.1080/20964471.2019.1576260

  • Arvor D, Belgiu M, Falomir Z, Mougenot I, Durieux L (2019) “Ontologies to interpret remote sensing images: why do we need them?” GIScience and Remote Sensing vol 56. pp 911–939. no. 6 [Online]. Available: https://doi.org/10.1080/15481603.2019.1587890

  • Andrés S, Arvor D, Mougenot I, Libourel T, Durieux L (2017) “Ontology-based classification of remote sensing images using spectral rules,” Computers and Geosciences vol 102. pp 158–166. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0098300417302078

  • Sun K, Zhu Y, Pan P, Hou Z, Wang D, Li W, Song J (2019) “Geospatial data ontology: the semantic foundation of geospatial data integration and sharing,” Big Earth Data vol 3. pp 269–296. no. 3 [Online]. Available: https://doi.org/10.1080/20964471.2019.1661662

  • Shengzhou X, Yihua T, Yansheng L, Cai W, Pei Y (2021) “Subtask attention based object detection in remote sensing images,” Remote Sens vol. 13. no. 10 [Online]. Available: https://www.mdpi.com/2072-4292/13/10/1925

  • Pan E, Ma Y, Fan F, Mei X, Huang J (2021) “Hyperspectral image classification across different datasets: A generalization to unseen categories,” Remote Sens vol 13. no. 9 [Online]. Available: https://www.mdpi.com/2072-4292/13/9/1672

  • Feng M, Bai Y (2019) “A global land cover map produced through integrating multi-source datasets,” Big Earth Data vol 3 pp 191–219 no. 3 [Online]. Available: https://doi.org/10.1080/20964471.2019.1663627

  • Rousi M, Sitokonstantinou V, Meditskos G, Papoutsis I, Gialampoukidis I, Koukos A, Karathanassi V, Drivas T, Vrochidis S, Kontoes C, Kompatsiaris I (2021) Semantically enriched crop type classification and linked earth observation data to support the common agricultural policy monitoring. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14:529–552

  • Ouyang S, Li Y (2021) “Combining deep semantic segmentation network and graph convolutional neural network for semantic segmentation of remote sensing imagery,” Remote Sens vol 13. no. 1 [Online]. Available: https://www.mdpi.com/2072-4292/13/1/119

  • Masmoudi M, Lamine SBAB, Zghal HB, Archimede B, Karray MH (2021) “Knowledge hypergraph-based approach for data integration and querying: Application to earth observation,” Future Generation Computer Systems vol 115. pp 720–740 [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167739X20311961

  • Yan S, Yao X, Zhu D, Liu D, Zhang L, Yu G, Gao B, Yang J, Yun W (2021) “Large-scale crop mapping from multi-source optical satellite imageries using machine learning with discrete grids,” International Journal of Applied Earth Observation and Geoinformation vol 103. p 102485 [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0303243421001926

  • Tan CW, Webb G, Petitjean F (2017) Indexing and classifying gigabytes of time series under time warping. pp 282–290

  • Tan CW, Herrmann M, Forestier G, Webb G, Petitjean F (2018) “Efficient search of the best warping window for dynamic time warping,”

  • Pérez-Suay A, Amorós-López J, Gómez-Chova L, Laparra V, Munoz-Marí J, Camps-Valls G (2017) “Randomized kernels for large scale earth observation applications,” Remote Sensing of Environment vol 202. pp 54–63. Special issue: Big Remotely Sensed Data: tools, applications and experiences. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0034425717300615

  • Cavallaro G, Riedel M, Bodenstein C, Glock P, Richerzhagen M, Götz M, Benediktsson J (2015) “Scalable developments for big data analytics in remote sensing,” pp 1366–1369

  • Cai Y, Zhang Z, Liu Y, Ghamisi P, Li K, Liu X, Cai Z (2021) “Large-scale hyperspectral image clustering using contrastive learning,” CoRR vol abs/2111.07945 [Online]. Available: arXiv:2111.07945

  • Ng R, Han J (2002) Clarans: A method for clustering objects for spatial data mining. IEEE Trans Knowl Data Eng 14:1003–1016

  • Mahfouz M, Ismail M (2009) Fuzzy relatives of the clarans algorithm with application to text clustering. World Academy of Science, Engineering and Technology, p 37

  • Shaheen M, Khan MZ (2016) “A method of data mining for selection of site for wind turbines,” Renewable and Sustainable Energy Reviews vol 55. pp 1225–1233 [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1364032115002853

  • Shahabi H, Rahimzad M, Tavakkoli Piralilou S, Ghorbanzadeh O, Homayouni S, Blaschke T, Lim S, Ghamisi P (2021) “Unsupervised deep learning for landslide detection from multispectral sentinel-2 imagery,” Remote Sensing vol 13. no. 22 [Online]. Available: https://www.mdpi.com/2072-4292/13/22/4698

  • Liu Y (2017) “Low-rank tensor regression: Scalability and applications,” In: 2017 IEEE 7th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pp 1–5

  • Pokhriyal N, Jacques DC (2017) “Combining disparate data sources for improved poverty prediction and mapping,” Proceedings of the National Academy of Sciences vol 114. no. 46, pp E9783–E9792. [Online]. Available: https://doi.org/10.1073/pnas.1700319114

  • Oliveira I, de Freitas Cunha RL, Silva B, Netto MAS (2018) “A scalable machine learning system for pre-season agriculture yield forecast,” CoRR vol abs/1806.09244 [Online]. Available: arXiv:1806.09244

  • Gauch M, Kratzert F, Klotz D, Nearing G, Lin J, Hochreiter S (2021) Rainfall runoff prediction at multiple timescales with a single long short-term memory network. Hydrol Earth Syst Sci 25:2045–2062

  • Siddiqui T, Alam A, Jain S (2012) “Discovery of scalable association rules from large set of multidimensional quantitative datasets,” Journal of Advances in Information Technology vol 3

  • Jayababu Y, Varma G, Govardhan A (2018) “Incremental topological spatial association rule mining and clustering from geographical datasets using probabilistic approach,” Journal of King Saud University - Computer and Information Sciences vol 30. no. 4, pp 510–523. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1319157816301549

  • DeLancey ER, Kariyeva J, Bried JT, Hird JN (2019) “Large-scale probabilistic identification of boreal peatlands using google earth engine, open-access satellite data, and machine learning,” PLOS ONE vol 14. no. 6, pp 1–23. [Online]. Available: https://doi.org/10.1371/journal.pone.0218165

  • Awad M (2021) “Google earth engine (gee) cloud computing based crop classification using radar, optical images and support vector machine algorithm (svm),” In: 2021 IEEE 3rd International Multidisciplinary Conference on Engineering Technology (IMCET) pp 71–76

  • Aprilianti HS, Ari RA, Ranti A, Aslam MF (2021) “Identification and classification of cloud computing-based vegetation index values on several lands used in bogor regency, indonesia,” IOP Conference Series: Earth and Environmental Science vol 918. no. 1, p 012011. [Online]. Available: https://doi.org/10.1088/1755-1315/918/1/012011

  • Praveen B, Mustak S, Sharma P (2019) “Assessing the transferability of machine learning algorithms using cloud computing and earth observation datasets for agricultural land use/cover mapping,” vol XLII-3/W6, pp 585–592

  • Zou Q, Li G, Yu W (2020) “Cloud computing based on computational characteristics for disaster monitoring,” Applied Sciences vol 10 no. 19 [Online]. Available: https://www.mdpi.com/2076-3417/10/19/6676

  • Antunes RR, Blaschke T, Tiede D, de Souza Bias E, da Costa GAOP, Happ PN (2018) Proof of concept of a novel cloud computing approach for object-based remote sensing data analysis and classification. GIScience and Remote Sensing 56:536–553

  • Hyrkas J, Clayton S, Ribalet F, Halperin D, Armbrust E, Howe B (2015) “Scalable clustering algorithms for continuous environmental flow cytometry,” Bioinformatics (Oxford, England) vol 32

  • Yin W, Simmhan Y, Prasanna VK (2012) “Scalable regression tree learning on hadoop using openplanet,” In: Proceedings of Third International Workshop on MapReduce and Its Applications Date, ser. MapReduce ’12. New York, NY, USA: Association for Computing Machinery. p 57–64. [Online]. Available: https://doi.org/10.1145/2287016.2287027

  • Appel M, Lahn F, Buytaert W, Pebesma E (2018) “Open and scalable analytics of large earth observation datasets: From scenes to multidimensional arrays using scidb and gdal,” ISPRS J Photogramm Remote Sens vol 138. pp 47–56 [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0924271617300898

  • Paudel D, Boogaard H, de Wit A, Janssen S, Osinga S, Pylianidis C, Athanasiadis IN (2021) “Machine learning for large-scale crop yield forecasting,” Agric Syst vol 187. p 103016. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0308521X20308775

  • Yao X, Li G, Xia J, Ben J, Cao Q, Zhao L, Ma Y, Zhang L, Zhu D (2020) “Enabling the big earth observation data via cloud computing and dggs: Opportunities and challenges,” Remote Sens vol 12. no. 1 [Online]. Available: https://www.mdpi.com/2072-4292/12/1/62

  • UN (2015) “Transforming our world: the 2030 agenda for sustainable development.” Working Papers, eSocialSciences, pp 1–4. [Online]. Available: https://EconPapers.repec.org/RePEc

  • Pause M, Schweitzer C, Rosenthal M, Keuck V, Bumberger J, Dietrich P, Heurich M, Jung A, Lausch A (2016) “In situ/remote sensing integration to assess forest health: a review,” Remote Sens vol 8. no. 6 [Online]. Available: https://www.mdpi.com/2072-4292/8/6/471

  • Stojanova D, Panov P, Kobler A, Dzeroski S, Taskova K (2006) “Learning to predict forest fires with different data mining techniques,”

  • Wurihan, Zhang H, Zhang Z, Guo X, Zhao J, Duwala, Shan Y, Hong-ying (2018b) “Understanding the spatio-temporal pattern of fire disturbance in the eastern mongolia using modis product,”

  • Xu F, Liu J, Sun M, Zeng D, Wang X (2017) “A hierarchical maritime target detection method for optical remote sensing imagery,” Remote Sens vol 9. no. 3 [Online]. Available: https://www.mdpi.com/2072-4292/9/3/280

  • Navalgund R, Jayanthi S (2004) “Role of earth observations for sustainable development: Emerging trends (ss1: Icorse earth observation systems for sustainable development),”

  • “Earth observation and sustainable development goals in the netherlands,” towards more synergetic use of Earth Observation: An exploratory study. (Updated in 2021) https://www.spaceoffice.nl/. Accessed on 23 Nov 2022

Author information

Contributions

All authors contributed to the study conception and design. Neha Sisodiya wrote the first draft and the main text of the manuscript and prepared all figures. The entire work was supervised by Nitant Dube and Priyank Thakkar. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Neha Sisodiya.

Ethics declarations

Competing Interests and Funding

The authors declare that they have no competing interests. No funding was received for this study.

Additional information

Communicated by: H. Babaie.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Sisodiya, N., Dube, N., Prakash, O. et al. Scalable big earth observation data mining algorithms: a review. Earth Sci Inform 16, 1993–2016 (2023). https://doi.org/10.1007/s12145-023-01032-5

  • DOI: https://doi.org/10.1007/s12145-023-01032-5
