Skip to main content

On Parallelizing Large Spatial Queries Using Map-Reduce

  • Conference paper
Web and Wireless Geographical Information Systems (W2GIS 2014)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8470))

Abstract

Vector Spatial data types such as lines, polygons or regions etc usually comprises of hundreds of thousands of latitude-longitude pairs to accurately represent the geometry of spatial features such as towns, rivers or villages. This leads to spatial data operations being computationally and memory intensive. A solution to deal with this is to distribute the operations amongst multiple computational nodes. Parallel spatial databases attempt to do this but at very small scales (of the order of 10s of nodes at most). Another approach would be to use distributed approaches such as Map-Reduce since spatial data operations map well to this paradigm. It affords us the advantage of being able to harness commodity hardware operating in a shared nothing mode while at the same time lending robustness to the computation since parts of the computation can be restarted on failure. In this paper, we present HadoopDB - a combination of Hadoop and Postgres spatial to efficiently handle computations on large spatial data sets. In HadoopDB, Hadoop serves as a means of coordinating amongst various computational nodes each of which performs the spatial query on a part of the data set. The Reduce stage helps collate the result data to yield the result of the original query. We present performance results to show that common spatial queries yields a speedup that nearly linear with the number of Hadoop processes deployed.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. In: Proceedings of the 6th Conference on Symposium on Operating Systems Design and Implementation, vol. 6, p. 10. USENIX Association, San Francisco (2004)

    Google Scholar 

  2. Bialecki, A., Cafarella, M., Cutting, D., Malley, O.: Hadoop: a framework for running applications on large clusters built of commodity hardware, Wiki at http://lucene.apache.org/hadoop

  3. Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S.R., Stonebraker, M.A.: A comparison of approaches to large-scale data analysis. In: Proceedings of the 35th SIGMOD International Conference on Management of Data, pp. 165–178. ACM Press, New York (2009)

    Chapter  Google Scholar 

  4. Stonebraker, M., Abadi, D., DeWitt, D.J., Madden, S., Paulson, E., Pavlo, A., Rasin, A.: MapReduce and parallel DBMSs: friends or foes? Commun. ACM 53(1), 64–71 (2010)

    Article  Google Scholar 

  5. Zhang, J., Mamoulis, N., Papadias, D., Tao, Y.: All-nearest-neighbors queries in spatial databases, p. 297 (June 2004)

    Google Scholar 

  6. Zhang, S., Han, J., Liu, Z., Wang, K., Xu, Z.: SJMR: Parallelizing spatial join with MapReduce on clusters. In: Proceedings of CLUSTER, pp. 1–8 (2009)

    Google Scholar 

  7. Dittrich, J.P., Seeger, B.: Data redundancy and duplicate detection in spatial join processing. In: ICDE 2000: Proceedings of the 16th International Conference on Data Engineering, pp. 535–546 (2000)

    Google Scholar 

  8. Brinkhoff, T., Kriegel, H.P., Seeger, B.: Parallel processing of spatial joins using R-trees. In: ICDE 1996: Proceedings of the Twelfth International Conference on Data Engineering, pp. 258–265 (1996)

    Google Scholar 

  9. Patel, J.M., DeWitt, D.J.: Partition based spatial-merge join. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pp. 259–270. ACM, New York (1996)

    Chapter  Google Scholar 

  10. Akdogan, A., Demiryurek, U., Banaei-Kashani, F., Shahabi, C.: Integrated Media Systems Center, University of Southern California, Los Angeles, CA 90089

    Google Scholar 

  11. Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P., Murthy, R.: Hive - a warehousing solution over a map-reduce framework. PVLDB 2(2), 1626–1629 (2009)

    Google Scholar 

  12. Abouzeid, A., Bajda-pawlikowski, K., Abadi, D., Silberschatz, A., Rasin, E.: HadoopDB: An architectural hybrid of MapReduce and DBMS technologies for analytical workloads. In: Proc. VLDB 2009 (2009)

    Google Scholar 

  13. http://en.wikipedia.org/wiki/GeoServer

  14. http://arcdata.esri.com/data/tiger2000/tiger_download.cfm

  15. Leptoukh, G.: NASA remote sensing data in earth sciences: Processing, archiving, distribution, applications at the GES DISC. In: Proc. of the 31st Intl. Symposium of Remote Sensing of Environment (2005)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bellur, U. (2014). On Parallelizing Large Spatial Queries Using Map-Reduce. In: Pfoser, D., Li, KJ. (eds) Web and Wireless Geographical Information Systems. W2GIS 2014. Lecture Notes in Computer Science, vol 8470. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55334-9_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-55334-9_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-55333-2

  • Online ISBN: 978-3-642-55334-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics