DOI: 10.1145/2676536.2676539
research-article

Haggis: turbocharge a MapReduce based spatial data warehousing system with GPU engine

Published: 04 November 2014

Abstract

Spatial query processing involves complex multidimensional objects and compute-intensive spatial operations, and therefore requires a high-performance approach to meet the rapid data analytics requirements of modern spatial applications. Recently, MapReduce-based spatial query systems have become a viable solution for many data-intensive query tasks and have gained widespread adoption in both academia and industry. At the same time, GPUs have been used successfully in many applications that require high-performance computation. Each approach, GPU and MapReduce, has its own advantages and limitations, and the two have been used separately in spatial query processing to boost application performance. However, it is unclear how MapReduce and GPUs, two vastly different parallelization techniques, can be fused to effectively address the challenges of spatial big data. In this paper, we explore this synergy of parallelization techniques for large-scale spatial query processing. We extend Hadoop-GIS, a MapReduce-based spatial query system, to provide GPU-accelerated spatial query processing at the engine level. We evaluate the system on a real-world dataset and demonstrate that the GPU-accelerated system can achieve considerable performance improvements. We also show how other factors, such as partition granularity and task scheduling between CPU and GPU, can affect query performance.
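The abstract reports that partition granularity and CPU/GPU task scheduling affect query performance. The following is a minimal toy sketch of why granularity matters; it is not the Haggis scheduler. It assumes hypothetical per-partition cost estimates and hypothetical relative device speeds (one GPU worker 10x faster than each of two CPU workers), and assigns partitions with greedy earliest-finish-time list scheduling in largest-first order, a standard heuristic swapped in here purely for illustration. The names `Worker`, `split_evenly`, and `schedule` are hypothetical.

```python
# Toy model (assumption): not the scheduler used in Haggis or Hadoop-GIS.
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    name: str
    speed: float  # hypothetical relative throughput (cost units per second)


def split_evenly(total_cost: float, n: int) -> list[float]:
    """Hypothetical helper: split a query's work into n equal-cost partitions."""
    return [total_cost / n] * n


def schedule(task_costs: list[float], workers: list[Worker]):
    """Greedy earliest-finish-time list scheduling on heterogeneous workers.

    Tasks are considered largest-first (LPT order); each task goes to the worker
    that would complete it earliest. Returns (makespan, task indices per worker).
    """
    finish = {w.name: 0.0 for w in workers}
    assignment: dict[str, list[int]] = {w.name: [] for w in workers}
    order = sorted(range(len(task_costs)), key=lambda i: -task_costs[i])
    for i in order:
        cost = task_costs[i]
        # Ties on finish time are broken by worker name so the result is deterministic.
        best = min(workers, key=lambda w: (finish[w.name] + cost / w.speed, w.name))
        finish[best.name] += cost / best.speed
        assignment[best.name].append(i)
    return max(finish.values()), assignment


if __name__ == "__main__":
    workers = [Worker("GPU", 10.0), Worker("CPU0", 1.0), Worker("CPU1", 1.0)]
    for n in (2, 12):
        makespan, assignment = schedule(split_evenly(120.0, n), workers)
        counts = {name: len(tasks) for name, tasks in assignment.items()}
        print(f"{n:2d} partitions: makespan={makespan:.1f}s, tasks per worker={counts}")
```

With two coarse partitions the GPU takes both and the CPUs sit idle (makespan 12.0 s); with twelve finer partitions each CPU absorbs one partition and the makespan drops to 10.0 s, the ideal 120 / (10 + 1 + 1) under this model. Real spatial partitions have skewed costs and per-task overheads such as host-device transfers, which this toy model ignores.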

Information

Published In

BigSpatial '14: Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data
November 2014
69 pages
ISBN:9781450331326
DOI:10.1145/2676536

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GPU
  2. MapReduce
  3. load balancing
  4. spatial data partition
  5. spatial query processing

Conference

SIGSPATIAL '14

Acceptance Rates

BigSpatial '14 paper acceptance rate: 8 of 13 submissions (62%)
Overall acceptance rate: 32 of 58 submissions (55%)

Article Metrics

  • Downloads (last 12 months): 4
  • Downloads (last 6 weeks): 0
Reflects downloads up to 05 Mar 2025

Cited By

  • (2024) Streamlining trajectory map-matching: a framework leveraging spark and GPU-based stream processing. International Journal of Geographical Information Science, 38(6):1158-1178. DOI: 10.1080/13658816.2024.2337225. Online publication date: 9-Apr-2024.
  • (2021) The art of balance. Proceedings of the VLDB Endowment, 14(12):2999-3013. DOI: 10.14778/3476311.3476378. Online publication date: 28-Oct-2021.
  • (2020) Accelerating Spatial Cross-Matching on CPU-GPU Hybrid Platform With CUDA and OpenACC. Frontiers in Big Data, 3. DOI: 10.3389/fdata.2020.00014. Online publication date: 8-May-2020.
  • (2019) Hierarchical Filter and Refinement System Over Large Polygonal Datasets on CPU-GPU. 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC), pages 141-151. DOI: 10.1109/HiPC.2019.00027. Online publication date: Dec-2019.
  • (2019) Optimizing parameter sensitivity analysis of large-scale microscopy image analysis workflows with multilevel computation reuse. Concurrency and Computation: Practice and Experience, 32(2). DOI: 10.1002/cpe.5403. Online publication date: 24-Jun-2019.
  • (2018) Accelerating Cross-Matching Operation of Geospatial Datasets using a CPU-GPU Hybrid Platform. 2018 IEEE International Conference on Big Data (Big Data), pages 3402-3411. DOI: 10.1109/BigData.2018.8622600. Online publication date: Dec-2018.
  • (2017) A Spatial Join Algorithm Based on a Non-uniform Grid Technique over GPGPU. Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 1-4. DOI: 10.1145/3139958.3140056. Online publication date: 7-Nov-2017.
  • (2017) Towards GPU-Accelerated Web-GIS for Query-Driven Visual Exploration. Web and Wireless Geographical Information Systems, pages 119-136. DOI: 10.1007/978-3-319-55998-8_8. Online publication date: 22-Mar-2017.
  • (2017) Polygonal Overlay Computation on Cloud, Hadoop, and MPI. Encyclopedia of GIS, pages 1598-1606. DOI: 10.1007/978-3-319-17885-1_1574. Online publication date: 12-May-2017.
  • (2017) Medical Image Dataset Processing over Cloud/MapReduce with Heterogeneous Architectures. Encyclopedia of GIS, pages 1206-1215. DOI: 10.1007/978-3-319-17885-1_1571. Online publication date: 12-May-2017.
