A Spatial SQL Based on SparkSQL

Meng, Qingyun; Ma, Xiujun; Lu, Wei; Yao, Zerong

doi:10.1007/978-981-10-3966-9_50

Part of the book series: Communications in Computer and Information Science (CCIS, volume 698)

Included in the following conference series:

International Conference on Geo-Informatics in Resource Management and Sustainable Ecosystem

Abstract

The volume of spatial data has increased tremendously, and distributed systems for spatial data analysis are receiving growing research attention. Spark, an in-memory distributed system that performs much better than Hadoop in speed and many other respects, lacks spatial SQL query extensions. In this paper, we study the technical framework of Spark SQL and implement a spatial query extension system that is tightly integrated with the native Spark system. The extensions include spatial types, spatial operators, spatial query optimizations, and spatial data source formats. The spatial extension system retains scalability and can be further extended with more query optimizations and data source formats. The spatial data type system and the spatial operator system follow OGC standards. In addition, the extension method is a general one that can be applied to Spark SQL query extensions in other fields.
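
To make the idea of spatial types and OGC-style spatial operators concrete, the following is a minimal, self-contained sketch in plain Python. It is not the authors' implementation and does not use Spark: the `Point` and `Polygon` classes and the functions `st_point_from_text`, `st_distance`, and `st_contains` are hypothetical names chosen to echo the OGC Simple Features operators (ST_PointFromText, ST_Distance, ST_Contains). The final filter only mimics what a query such as `SELECT name FROM pois WHERE ST_Contains(region, geom)` would compute.

```python
from dataclasses import dataclass
from math import hypot
from typing import Tuple


@dataclass(frozen=True)
class Point:
    """Hypothetical spatial type: a 2D point."""
    x: float
    y: float


@dataclass(frozen=True)
class Polygon:
    """Hypothetical spatial type: exterior ring only, implicitly closed."""
    ring: Tuple[Tuple[float, float], ...]


def st_point_from_text(wkt: str) -> Point:
    """Parse a WKT string such as 'POINT (1 2)' into a Point."""
    text = wkt.strip()
    if not text.upper().startswith("POINT"):
        raise ValueError(f"not a WKT point: {wkt!r}")
    inner = text[text.index("(") + 1 : text.rindex(")")]
    x, y = (float(v) for v in inner.split())
    return Point(x, y)


def st_distance(a: Point, b: Point) -> float:
    """Planar Euclidean distance between two points."""
    return hypot(a.x - b.x, a.y - b.y)


def st_contains(poly: Polygon, p: Point) -> bool:
    """Ray-casting point-in-polygon test.

    Points exactly on the boundary are not treated specially,
    which a standards-conforming operator would need to handle.
    """
    inside = False
    n = len(poly.ring)
    for i in range(n):
        x1, y1 = poly.ring[i]
        x2, y2 = poly.ring[(i + 1) % n]
        # Only edges that straddle the horizontal line y = p.y can be crossed.
        if (y1 > p.y) != (y2 > p.y):
            x_cross = x1 + (p.y - y1) * (x2 - x1) / (y2 - y1)
            if p.x < x_cross:
                inside = not inside
    return inside


# Illustrative data (made up for this sketch): a square region and two rows.
region = Polygon(((0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)))
pois = [("a", "POINT (1 2)"), ("b", "POINT (5 2)")]

# Equivalent in spirit to: SELECT name FROM pois WHERE ST_Contains(region, geom)
selected = [name for name, wkt in pois
            if st_contains(region, st_point_from_text(wkt))]
```

In a real Spark deployment, functions like these could be exposed to SQL as user-defined functions. The abstract describes a tighter integration with Spark SQL itself, including query optimizations and data source formats, which this sketch does not attempt to reproduce.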

Author information

Authors and Affiliations

Department of Machine Intelligence, School of EECS, Peking University, Beijing, China
Qingyun Meng, Xiujun Ma, Wei Lu & Zerong Yao

Corresponding author

Correspondence to Wei Lu.

Editor information

Editors and Affiliations

Beijing Institute of Technology, Beijing, China
Hanning Yuan
Beijing Institute of Technology, Beijing, China
Jing Geng
Wuhan University, Wuhan, China
Fuling Bian

About this paper

Cite this paper

Meng, Q., Ma, X., Lu, W., Yao, Z. (2017). A Spatial SQL Based on SparkSQL. In: Yuan, H., Geng, J., Bian, F. (eds) Geo-Spatial Knowledge and Intelligence. GRMSE 2016. Communications in Computer and Information Science, vol 698. Springer, Singapore. https://doi.org/10.1007/978-981-10-3966-9_50

DOI: https://doi.org/10.1007/978-981-10-3966-9_50
Published: 03 March 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3965-2
Online ISBN: 978-981-10-3966-9
eBook Packages: Computer Science; Computer Science (R0)
