DOI: 10.1145/3139958.3139967
research-article

On Spatial Joins in MapReduce

Published: 07 November 2017

ABSTRACT

This paper presents the first attempt at a full-fledged query optimizer for MapReduce-based spatial join algorithms. The optimizer develops its own taxonomy that covers almost all possible ways of performing a spatial join between any two input datasets. The optimizer comes in two flavors: cost-based and rule-based. Given two input datasets, the cost-based query optimizer evaluates the cost of every option in the developed taxonomy and selects the one with the lowest cost. The rule-based query optimizer abstracts the cost models of the cost-based optimizer into a set of simple, easy-to-check heuristic rules, and then applies these rules to select the lowest-cost option. Both query optimizers are deployed and experimentally evaluated inside a widely used open-source MapReduce-based big spatial data system. Exhaustive experiments show that both query optimizers always make the right decision when spatially joining any two datasets of up to 500GB each.
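As a rough illustration of the two optimizer flavors described in the abstract, the Python sketch below enumerates a tiny, hypothetical set of join options (`join_as_is`, `repartition_one`, `repartition_both`), picks the cheapest one using made-up cost models, and compares that choice with a hand-written rule set. The option names, the `Dataset` fields, and the cost formulas are assumptions for illustration only; they are not the paper's taxonomy or cost models.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# Hypothetical input summary; the paper's optimizer uses its own statistics.
@dataclass(frozen=True)
class Dataset:
    name: str
    size_gb: float
    partitioned: bool  # True if the input is already spatially partitioned


INF = float("inf")


# Illustrative cost models in arbitrary units, NOT the paper's formulas.
# Assumption: repartitioning an input costs 3x its size (read it, write the
# repartitioned copy, then read that copy again during the join).
def cost_join_as_is(r: Dataset, s: Dataset) -> float:
    # Join existing partitions directly; only valid if both are partitioned.
    return r.size_gb + s.size_gb if r.partitioned and s.partitioned else INF


def cost_repartition_one(r: Dataset, s: Dataset) -> float:
    # Repartition one input to match the other input's existing partitioning.
    candidates = []
    if s.partitioned:
        candidates.append(3 * r.size_gb + s.size_gb)  # repartition r
    if r.partitioned:
        candidates.append(3 * s.size_gb + r.size_gb)  # repartition s
    return min(candidates, default=INF)


def cost_repartition_both(r: Dataset, s: Dataset) -> float:
    # Repartition both inputs with a fresh, shared partitioning.
    return 3 * (r.size_gb + s.size_gb)


OPTIONS: Dict[str, Callable[[Dataset, Dataset], float]] = {
    "join_as_is": cost_join_as_is,
    "repartition_one": cost_repartition_one,
    "repartition_both": cost_repartition_both,
}


def cost_based_choice(r: Dataset, s: Dataset) -> str:
    # Evaluate every option and keep the cheapest (the first one wins ties).
    return min(OPTIONS, key=lambda name: OPTIONS[name](r, s))


def rule_based_choice(r: Dataset, s: Dataset) -> str:
    # Simple checks distilled from the cost models above; no costs computed.
    if r.partitioned and s.partitioned:
        return "join_as_is"
    if r.partitioned or s.partitioned:
        return "repartition_one"
    return "repartition_both"


if __name__ == "__main__":
    # Compare both flavors on every partitioning combination of two inputs.
    for pr in (True, False):
        for ps in (True, False):
            r, s = Dataset("R", 500.0, pr), Dataset("S", 40.0, ps)
            print(pr, ps, cost_based_choice(r, s), rule_based_choice(r, s))
```

The design point is that the rule-based version never computes a cost; it only inspects input properties, so it is cheap to evaluate, yet for this toy model it agrees with the exhaustive cost-based search on every combination.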

          Published in

            ACM Conferences
            SIGSPATIAL '17: Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
            November 2017
            677 pages
            ISBN: 9781450354905
            DOI: 10.1145/3139958

            Copyright © 2017 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Qualifiers

            • Research
            • Refereed limited

            Acceptance Rates

            SIGSPATIAL '17 Paper Acceptance Rate: 39 of 193 submissions, 20%. Overall Acceptance Rate: 220 of 1,116 submissions, 20%.
