A Spatial SQL Based on SparkSQL

  • Conference paper
Geo-Spatial Knowledge and Intelligence (GRMSE 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 698)

Abstract

The volume of spatial data has increased tremendously, and distributed systems for spatial data analysis are receiving growing research attention. Spark, an in-memory distributed system that outperforms Hadoop in speed and many other aspects, lacks spatial SQL query extensions. In this paper, we study the technical framework of Spark SQL and implement a spatial query extension system that is tightly integrated with the native Spark system. The extensions include spatial types, spatial operators, spatial query optimizations, and spatial data source formats. The spatial extension system retains scalability and can be further extended with more query optimizations and data source formats. The spatial data type system and spatial operator system follow OGC standards. In addition, the extension method is a general one that can be applied to query extensions on Spark SQL in other fields.
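
To make the idea of a spatial SQL extension concrete, the following is a minimal sketch only, not the system described in the paper. It uses Python's built-in sqlite3 engine as a stand-in for Spark SQL, purely to show what registering a spatial operator with a SQL engine can look like. The function name PointInBox, the table places, and the restriction to WKT points are all assumptions made for illustration; PointInBox is not an OGC-defined operator, and its boundary-inclusive test is a simplification.

```python
import sqlite3


def parse_point(wkt):
    """Parse a WKT string such as 'POINT (1 2)' into an (x, y) tuple."""
    body = wkt.strip()
    if not body.upper().startswith("POINT"):
        raise ValueError("only POINT geometries are supported in this sketch")
    coords = body[body.index("(") + 1 : body.rindex(")")].split()
    return float(coords[0]), float(coords[1])


def point_in_box(wkt, xmin, ymin, xmax, ymax):
    """Return 1 if the point lies inside or on the edge of the box, else 0.

    Hypothetical operator for illustration; boundary-inclusive by choice.
    """
    x, y = parse_point(wkt)
    return int(xmin <= x <= xmax and ymin <= y <= ymax)


# sqlite3 stands in for the SQL engine here (an assumption, not Spark SQL).
conn = sqlite3.connect(":memory:")
conn.create_function("PointInBox", 5, point_in_box)

conn.execute("CREATE TABLE places (name TEXT, geom TEXT)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?)",
    [("a", "POINT (1 1)"), ("b", "POINT (5 5)"), ("c", "POINT (2 3)")],
)

# The registered function can now be used like a built-in SQL operator.
rows = conn.execute(
    "SELECT name FROM places "
    "WHERE PointInBox(geom, 0, 0, 3, 3) = 1 ORDER BY name"
).fetchall()
names = [row[0] for row in rows]
print(names)
```

In this toy setup the geometry is stored as text and parsed on every call. A system of the kind the abstract describes would instead define native spatial types and could apply spatial query optimizations, which this sketch does not attempt.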

Author information

Corresponding author

Correspondence to Wei Lu.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Meng, Q., Ma, X., Lu, W., Yao, Z. (2017). A Spatial SQL Based on SparkSQL. In: Yuan, H., Geng, J., Bian, F. (eds) Geo-Spatial Knowledge and Intelligence. GRMSE 2016. Communications in Computer and Information Science, vol 698. Springer, Singapore. https://doi.org/10.1007/978-981-10-3966-9_50

  • DOI: https://doi.org/10.1007/978-981-10-3966-9_50

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3965-2

  • Online ISBN: 978-981-10-3966-9

  • eBook Packages: Computer Science (R0)
