A Survey of Traditional and MapReduce-Based Spatial Query Processing Approaches

Published: 01 September 2017

Abstract

Years of research have produced many spatial indexing methods for processing spatial queries quickly. Parallelizing spatial index construction and query processing has become a popular way to improve efficiency, and the MapReduce framework provides a modern model for parallel processing. MapReduce-based work on spatial queries builds on existing traditional spatial indexing methods to construct spatial indexes in parallel. The majority of the spatial indexes implemented in MapReduce use the R-Tree and its variants. Therefore, traditional spatial indexes based on the R-Tree and its variants are thoroughly surveyed in this paper. The objective is to identify less-explored spatial indexing approaches that have potential for parallelization in MapReduce. The review also provides a detailed survey of two families of MapReduce-based spatial query processing approaches: those based on hierarchically indexed spatial datasets and those based on packed key-value storage of spatial datasets. The two families use different data partitioning strategies to distribute data among cluster nodes and manage the partitioned dataset through different indexes. Finally, a number of parameters are selected for comparing and analyzing the existing approaches in the literature.
