Abstract
The volume of spatial data has increased tremendously, and growing attention has been paid to research on distributed systems for spatial data analysis. Spark, an in-memory distributed system that performs much better than Hadoop in speed and in many other respects, lacks spatial SQL query extensions. In this paper, we study the architecture of Spark SQL and implement a spatial query extension system that is tightly integrated with native Spark. The extensions include spatial types, spatial operators, spatial query optimizations and spatial data source formats. The spatial extension system retains Spark's scalability and can be further extended with more query optimizations and data source formats. The spatial data types and spatial operators follow OGC standards. In addition, the extension method is general and can be used to build query extensions to Spark SQL for other domains.
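To make "spatial types" and "spatial operators" concrete, the following is a minimal sketch in plain Python, not the authors' Spark SQL implementation. It assumes hypothetical `Point` and `Polygon` types loosely modeled on the OGC Simple Features geometries (exterior ring only, no holes) and a hypothetical `st_contains` predicate named after the OGC `ST_Contains` operator. The predicate uses a filter-and-refine strategy, a common spatial query optimization: a cheap bounding-box test rejects most candidates before an exact even-odd ray-casting test runs. Points lying exactly on the polygon boundary are not handled with full OGC semantics.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


# Hypothetical geometry types, loosely modeled on OGC Simple Features.
@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Envelope:
    """Axis-aligned minimum bounding rectangle (MBR)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains_point(self, p: Point) -> bool:
        return self.min_x <= p.x <= self.max_x and self.min_y <= p.y <= self.max_y


@dataclass(frozen=True)
class Polygon:
    # Exterior ring vertices in order; the closing vertex is implied.
    ring: Tuple[Tuple[float, float], ...]

    def envelope(self) -> Envelope:
        xs = [x for x, _ in self.ring]
        ys = [y for _, y in self.ring]
        return Envelope(min(xs), min(ys), max(xs), max(ys))


def st_contains(poly: Polygon, p: Point) -> bool:
    """Hypothetical ST_Contains-style predicate (boundary cases not OGC-exact)."""
    # Filter step: reject points outside the polygon's bounding box.
    if not poly.envelope().contains_point(p):
        return False
    # Refine step: count crossings of a horizontal ray cast to the right.
    inside = False
    n = len(poly.ring)
    for i in range(n):
        x1, y1 = poly.ring[i]
        x2, y2 = poly.ring[(i + 1) % n]
        if (y1 > p.y) != (y2 > p.y):  # edge straddles the ray's y value
            x_cross = x1 + (p.y - y1) * (x2 - x1) / (y2 - y1)
            if p.x < x_cross:
                inside = not inside
    return inside


def filter_contained(poly: Polygon, points: Iterable[Point]) -> List[Point]:
    """Apply the predicate as a selection, like a WHERE clause would."""
    return [p for p in points if st_contains(poly, p)]


triangle = Polygon(((0.0, 0.0), (4.0, 0.0), (0.0, 4.0)))
print(filter_contained(triangle, [Point(1, 1), Point(3, 3), Point(5, 5)]))
```

Here `Point(3, 3)` passes the bounding-box filter but is rejected by the refine step, while `Point(5, 5)` never reaches the exact test. In a Spark deployment, a predicate like this could in principle be exposed to SQL as a user-defined function; the tighter integration described above (native spatial types and optimizer rules) is beyond this sketch.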
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Meng, Q., Ma, X., Lu, W., Yao, Z. (2017). A Spatial SQL Based on SparkSQL. In: Yuan, H., Geng, J., Bian, F. (eds) Geo-Spatial Knowledge and Intelligence. GRMSE 2016. Communications in Computer and Information Science, vol 698. Springer, Singapore. https://doi.org/10.1007/978-981-10-3966-9_50
DOI: https://doi.org/10.1007/978-981-10-3966-9_50
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3965-2
Online ISBN: 978-981-10-3966-9
eBook Packages: Computer Science, Computer Science (R0)