ST-Hadoop: A MapReduce Framework for Spatio-Temporal Data

  • Conference paper
Advances in Spatial and Temporal Databases (SSTD 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10411)

Abstract

This paper presents ST-Hadoop, the first full-fledged open-source MapReduce framework with native support for spatio-temporal data. ST-Hadoop is a comprehensive extension to Hadoop and SpatialHadoop that injects spatio-temporal data awareness inside each of their layers, mainly the language, indexing, and operations layers. In the language layer, ST-Hadoop provides built-in spatio-temporal data types and operations. In the indexing layer, ST-Hadoop spatiotemporally loads and divides data across computation nodes in the Hadoop Distributed File System in a way that mimics spatio-temporal index structures, which results in orders of magnitude better performance than Hadoop and SpatialHadoop when dealing with spatio-temporal data and queries. In the operations layer, ST-Hadoop ships with support for two fundamental spatio-temporal queries, namely spatio-temporal range and join queries. The extensibility of ST-Hadoop allows others to expand its features and operations easily using an approach similar to the one described in the paper. Extensive experiments conducted on a large-scale dataset of size 10 TB that contains over 1 billion spatio-temporal records show that ST-Hadoop achieves orders of magnitude better performance than Hadoop and SpatialHadoop when dealing with spatio-temporal data and operations. The key idea behind the performance gain in ST-Hadoop is its ability to index spatio-temporal data within the Hadoop Distributed File System.
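The indexing idea in the abstract, dividing data so that partitions mimic a spatio-temporal index, can be illustrated with a small in-memory sketch. This is not ST-Hadoop's implementation or API: the names `partition_key`, `build_index`, and `range_query`, the one-day time slice, and the 0.5-degree uniform grid are all assumptions made for illustration. The sketch only shows why a range query can skip most partitions once records are grouped by time slice and grid cell.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical parameters chosen for illustration; not taken from the paper.
SLICE = timedelta(days=1)    # width of one temporal slice
CELL = 0.5                   # side of one spatial grid cell, in degrees
EPOCH = datetime(2017, 1, 1)


def partition_key(x, y, t):
    """Map a record to (time slice, grid column, grid row)."""
    return ((t - EPOCH) // SLICE, math.floor(x / CELL), math.floor(y / CELL))


def build_index(records):
    """Group (x, y, t) records by partition key, one list per partition."""
    index = defaultdict(list)
    for x, y, t in records:
        index[partition_key(x, y, t)].append((x, y, t))
    return index


def range_query(index, x_min, x_max, y_min, y_max, t_min, t_max):
    """Return records inside the spatio-temporal box, scanning only
    partitions whose key falls within the box's key range."""
    lo = partition_key(x_min, y_min, t_min)
    hi = partition_key(x_max, y_max, t_max)
    hits = []
    for key, part in index.items():
        # Prune: skip partitions that cannot contain a matching record.
        if not all(l <= k <= h for k, l, h in zip(key, lo, hi)):
            continue
        # Refine: check each record in the surviving partitions.
        hits.extend(
            (x, y, t) for x, y, t in part
            if x_min <= x <= x_max and y_min <= y <= y_max and t_min <= t <= t_max
        )
    return hits


# Made-up sample points (longitude, latitude, timestamp).
records = [
    (-73.98, 40.75, datetime(2017, 1, 1, 8)),
    (-73.95, 40.78, datetime(2017, 1, 1, 9)),
    (-73.98, 40.75, datetime(2017, 1, 5, 8)),
    (-118.24, 34.05, datetime(2017, 1, 1, 8)),
]
index = build_index(records)
result = range_query(
    index, -74.0, -73.9, 40.7, 40.8,
    datetime(2017, 1, 1), datetime(2017, 1, 2),
)
```

In this toy run the four records fall into three partitions, and the query touches only the one partition that overlaps its box. A distributed system would store each partition as a separate file block rather than a Python list, but the prune-then-refine pattern is the same.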

This work is partially supported by the National Science Foundation, USA, under Grants IIS-1525953, CNS-1512877, IIS-1218168, and by a scholarship from the College of Computers & Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia.

Author information

Corresponding authors

Correspondence to Louai Alarabi or Mohamed F. Mokbel.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Alarabi, L., Mokbel, M.F., Musleh, M. (2017). ST-Hadoop: A MapReduce Framework for Spatio-Temporal Data. In: Gertz, M., et al. Advances in Spatial and Temporal Databases. SSTD 2017. Lecture Notes in Computer Science, vol 10411. Springer, Cham. https://doi.org/10.1007/978-3-319-64367-0_5

  • DOI: https://doi.org/10.1007/978-3-319-64367-0_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64366-3

  • Online ISBN: 978-3-319-64367-0

  • eBook Packages: Computer Science, Computer Science (R0)
