DOI: 10.1145/2525314.2525349
research-article

CG_Hadoop: computational geometry in MapReduce

Published: 05 November 2013

Abstract

Hadoop, which employs the MapReduce programming paradigm, has been widely accepted as the standard framework for analyzing big data in distributed environments. Unfortunately, this rich framework has not been fully exploited for processing large-scale computational geometry operations. This paper introduces CG_Hadoop, a suite of scalable and efficient MapReduce algorithms for several fundamental computational geometry problems, namely polygon union, skyline, convex hull, farthest pair, and closest pair, which are key building blocks for other geometric algorithms. For each computational geometry operation, CG_Hadoop has two versions: one for the Apache Hadoop system and one for SpatialHadoop, a Hadoop-based system that is better suited to spatial operations. The proposed algorithms form the nucleus of a comprehensive MapReduce library of computational geometry operations. Extensive experiments on a cluster of 25 machines, with datasets of up to 128GB, show that CG_Hadoop achieves up to 29x and 260x better performance than traditional algorithms when using the Hadoop and SpatialHadoop systems, respectively.
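To make the general idea concrete, the following is a minimal single-process sketch of how a convex hull can be computed in a partition-then-merge (map, then reduce) style. It is an illustration only and is not taken from CG_Hadoop: the function names (`partition`, `map_phase`, `reduce_phase`, `distributed_convex_hull`) and the round-robin partitioning are assumptions made for this example, and the local hull routine is the standard monotone chain method. It relies on the fact that the convex hull of a point set equals the convex hull of the union of the hulls of its partitions.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Iterable[Point]) -> List[Point]:
    """Monotone chain convex hull; returns vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Pop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def partition(points: Iterable[Point], n_parts: int) -> List[List[Point]]:
    """Hypothetical round-robin split standing in for file blocks or partitions."""
    pts = list(points)
    return [pts[i::n_parts] for i in range(n_parts)]


def map_phase(chunk: List[Point]) -> List[Point]:
    """Each 'mapper' reduces its chunk to a local hull."""
    return convex_hull(chunk)


def reduce_phase(local_hulls: List[List[Point]]) -> List[Point]:
    """A single 'reducer' computes the hull of all local hull vertices."""
    return convex_hull(p for hull in local_hulls for p in hull)


def distributed_convex_hull(points: Iterable[Point], n_parts: int = 4) -> List[Point]:
    local_hulls = [map_phase(chunk) for chunk in partition(points, n_parts)]
    return reduce_phase(local_hulls)


if __name__ == "__main__":
    sample = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 1), (2, 2), (3, 1), (2, 3), (1, 3)]
    print(distributed_convex_hull(sample, n_parts=3))
```

In this sketch the local hulls discard interior points early, so the final merge step only sees the candidate vertices from each partition. A spatially aware partitioning could reduce that candidate set further, but that choice is left open here.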



Published In

SIGSPATIAL'13: Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
November 2013
598 pages
ISBN: 9781450325219
DOI: 10.1145/2525314

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Hadoop
  2. MapReduce
  3. geometric algorithms

Qualifiers

  • Research-article

Conference

SIGSPATIAL'13
Acceptance Rates

Overall Acceptance Rate 257 of 1,238 submissions, 21%
