Abstract
Researchers have put sustained effort into developing indexing methods for spatial data that speed up spatial query processing. Parallelizing spatial index construction and query processing has become a popular way to improve efficiency, and the MapReduce framework provides a modern model for parallel processing. MapReduce-based work on spatial queries builds on existing traditional spatial indexes when constructing spatial indexes in parallel, and most of the spatial indexes implemented in MapReduce use the R-Tree or one of its variants. This paper therefore thoroughly surveys traditional spatial indexes based on the R-Tree and its variants, with the objective of identifying less-explored spatial indexing approaches that have potential for parallelism in MapReduce. The review also provides a detailed survey of two families of MapReduce-based spatial query processing approaches: those that use a hierarchical index and those that store the spatial dataset as packed key-value pairs. The two families use different data partitioning strategies to distribute data among cluster nodes and different indexing schemes to manage the partitioned dataset. Finally, a set of parameters is selected to compare and analyze all the existing approaches in the literature.
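The idea of partitioning data across cluster nodes and then indexing each partition can be illustrated with a small single-process simulation of a MapReduce job. This is a minimal sketch under stated assumptions, not the method of any surveyed system: it assumes a uniform grid as the partitioning strategy, replicates rectangles that span several cells, and packs each partition into R-Tree leaves with Sort-Tile-Recursive (STR) bulk loading. The function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`, `range_query`) and parameters (`cell_size`, `leaf_size`) are hypothetical and are not part of Hadoop's API.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Record = Tuple[int, Rect]                 # (record id, minimum bounding rectangle)
Cell = Tuple[int, int]                    # grid cell coordinates (the partition key)
Leaf = Tuple[Rect, List[Record]]          # (leaf MBR, entries packed into the leaf)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def cells_overlapping(r: Rect, cell_size: float) -> Iterator[Cell]:
    for cx in range(math.floor(r[0] / cell_size), math.floor(r[2] / cell_size) + 1):
        for cy in range(math.floor(r[1] / cell_size), math.floor(r[3] / cell_size) + 1):
            yield (cx, cy)


def map_phase(records: List[Record], cell_size: float) -> Iterator[Tuple[Cell, Record]]:
    """Map (hypothetical): emit (cell, record) for every grid cell the MBR overlaps."""
    for rid, rect in records:
        for cell in cells_overlapping(rect, cell_size):
            yield cell, (rid, rect)


def shuffle(pairs: Iterator[Tuple[Cell, Record]]) -> Dict[Cell, List[Record]]:
    """Shuffle: group intermediate pairs by partition key, as the framework would."""
    groups: Dict[Cell, List[Record]] = defaultdict(list)
    for cell, rec in pairs:
        groups[cell].append(rec)
    return groups


def reduce_phase(entries: List[Record], leaf_size: int) -> List[Leaf]:
    """Reduce (hypothetical): pack one partition into R-Tree leaves using STR."""
    n_leaves = math.ceil(len(entries) / leaf_size)
    n_slabs = math.ceil(math.sqrt(n_leaves))
    slab_len = n_slabs * leaf_size
    # xmin + xmax orders entries by x-center; cut the sorted list into vertical slabs.
    by_x = sorted(entries, key=lambda e: e[1][0] + e[1][2])
    leaves: List[Leaf] = []
    for s in range(0, len(by_x), slab_len):
        # Within each slab, order by y-center and pack consecutive runs into leaves.
        slab = sorted(by_x[s:s + slab_len], key=lambda e: e[1][1] + e[1][3])
        for i in range(0, len(slab), leaf_size):
            chunk = slab[i:i + leaf_size]
            leaves.append((union([r for _, r in chunk]), chunk))
    return leaves


def build_index(records: List[Record], cell_size: float, leaf_size: int) -> Dict[Cell, List[Leaf]]:
    groups = shuffle(map_phase(records, cell_size))
    return {cell: reduce_phase(entries, leaf_size) for cell, entries in groups.items()}


def range_query(index: Dict[Cell, List[Leaf]], query: Rect, cell_size: float) -> Set[int]:
    """Visit only partitions overlapping the query, prune leaves by MBR, dedupe replicas."""
    hits: Set[int] = set()
    for cell in cells_overlapping(query, cell_size):
        for leaf_mbr, chunk in index.get(cell, []):
            if intersects(leaf_mbr, query):
                hits.update(rid for rid, r in chunk if intersects(r, query))
    return hits


records = [(1, (1, 1, 2, 2)), (2, (9, 9, 11, 11)), (3, (15, 15, 16, 16)), (4, (4, 12, 5, 13))]
index = build_index(records, cell_size=10.0, leaf_size=2)
print(sorted(range_query(index, (0, 0, 10, 10), cell_size=10.0)))  # [1, 2]
```

A uniform grid is the simplest partitioning choice; skewed data generally calls for adaptive partitions. Because a rectangle that crosses cell boundaries is replicated into every cell it overlaps (record 2 above lands in four cells), query results must be de-duplicated, which the sketch does with a set of record ids.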