Abstract
Earth remote sensing has always been a source of “big” data, and satellite data have inspired the development of “array” DBMSs. An array DBMS processes N-dimensional (N-d) arrays using a declarative query style that simplifies raster data management and processing. However, raster data are traditionally stored in files, not in databases, and command line tools have long been developed to process these files. Most of these tools are feature-rich and free but optimized for a single machine. An approach that partially delegates in situ raster data processing to such tools has recently been proposed. It includes a new formal N-d array data model that abstracts from the files and the tools, as well as new distributed algorithms based on the model. This paper extends the approach with a new algorithm for reshaping (tiling) N-d arrays. The algorithm physically reorganizes the storage layout of N-d arrays to obtain an order of magnitude speedup. The extended approach outperforms SciDB by up to 28× on retrospective Landsat data, one of the most typical and popular kinds of satellite imagery. SciDB is the only freely available distributed array DBMS to date. Experiments were carried out on an 8-node cluster in Microsoft Azure Cloud.
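To make the idea of reshaping (tiling) concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes a 2-d array stored as row strips and reorganizes it into fixed-size tiles, so that a small window read touches few storage units. The names `retile` and `read_window` and the in-memory list representation are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]
Tiles = Dict[Tuple[int, int], Grid]


def retile(strips: List[Grid], tile_rows: int, tile_cols: int) -> Tiles:
    """Reorganize row strips of a 2-d array into tiles (illustrative only)."""
    # Concatenate the strips to recover the logical array, row by row.
    rows = [row for strip in strips for row in strip]
    tiles: Tiles = {}
    for r0 in range(0, len(rows), tile_rows):
        for c0 in range(0, len(rows[0]), tile_cols):
            # Slicing clips automatically at the array border.
            tiles[(r0 // tile_rows, c0 // tile_cols)] = [
                row[c0:c0 + tile_cols] for row in rows[r0:r0 + tile_rows]
            ]
    return tiles


def read_window(tiles: Tiles, tile_rows: int, tile_cols: int,
                r0: int, r1: int, c0: int, c1: int) -> Tuple[Grid, int]:
    """Read cells [r0:r1, c0:c1]; also report how many tiles were touched."""
    touched = set()
    window: Grid = []
    for r in range(r0, r1):
        line = []
        for c in range(c0, c1):
            key = (r // tile_rows, c // tile_cols)
            touched.add(key)
            line.append(tiles[key][r % tile_rows][c % tile_cols])
        window.append(line)
    return window, len(touched)


# A 4x4 array stored as four one-row strips, retiled into 2x2 tiles.
strips = [[[4 * r + c for c in range(4)]] for r in range(4)]
tiles = retile(strips, 2, 2)
window, n_tiles = read_window(tiles, 2, 2, 0, 2, 0, 2)
```

In this toy layout, a 2×2 window aligned with the tile grid is served from a single tile, whereas the same window would span two strips in the original layout. The real benefit depends on the access pattern and on the file format, neither of which this sketch models.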
Acknowledgments
This work was partially supported by the Russian Foundation for Basic Research (grant №16-37-00416). We also thank the anonymous reviewers for their helpful and inspiring comments.
Contributions. Rodriges: all text, figures, algorithms, ChronosServer, its data model, Azure management code, SciDB import code, experimental setup. Pozdeev: SciDB cluster deployment. Bryukhov: partial implementation of the reshaping algorithm for one machine, adaptation of the SciDB import code to Landsat data. All authors: experiments.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Rodriges Zalipynis, R.A., Bryukhov, A., Pozdeev, E. (2017). Retrospective Satellite Data in the Cloud: An Array DBMS Approach. In: Voevodin, V., Sobolev, S. (eds) Supercomputing. RuSCDays 2017. Communications in Computer and Information Science, vol 793. Springer, Cham. https://doi.org/10.1007/978-3-319-71255-0_28
DOI: https://doi.org/10.1007/978-3-319-71255-0_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71254-3
Online ISBN: 978-3-319-71255-0
eBook Packages: Computer Science, Computer Science (R0)