On Parallelizing Large Spatial Queries Using Map-Reduce

doi:10.1007/978-3-642-55334-9_1

Umesh Bellur

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8470)

Included in the following conference series:

International Symposium on Web and Wireless Geographical Information Systems

Abstract

Vector spatial data types such as lines, polygons, or regions usually comprise hundreds of thousands of latitude-longitude pairs to accurately represent the geometry of spatial features such as towns, rivers, or villages. As a result, spatial data operations are computationally and memory intensive. One solution is to distribute the operations among multiple computational nodes. Parallel spatial databases attempt to do this, but only at very small scales (on the order of tens of nodes at most). Another approach is to use a distributed framework such as Map-Reduce, since spatial data operations map well to this paradigm. Map-Reduce lets us harness commodity hardware operating in a shared-nothing mode, while also making the computation robust, since parts of it can be restarted on failure. In this paper, we present HadoopDB, a combination of Hadoop and spatially enabled Postgres, to efficiently handle computations on large spatial data sets. In HadoopDB, Hadoop coordinates the computational nodes, each of which runs the spatial query on a part of the data set. The Reduce stage collates the partial results to yield the result of the original query. We present performance results showing that common spatial queries yield a speedup that is nearly linear in the number of Hadoop processes deployed.
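
The split / per-node query / collate flow described above can be illustrated with a minimal sketch, under stated assumptions: the query is a spatial-join aggregate ("how many points fall in each region"), the point set is split round-robin across hypothetical nodes, the small region table is copied to every node, each map task emits partial (region, count) pairs, and the reduce task sums them. In the paper each node runs its share of the query inside a Postgres instance; here that is replaced by a pure-Python ray-casting point-in-polygon test, and every name (point_in_polygon, map_partition, reduce_counts, parallel_count) is hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Dict, List, Tuple

Point = Tuple[float, float]    # (longitude, latitude)
Polygon = List[Point]          # vertex ring; the closing edge back to vertex 0 is implicit


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Ray casting: toggle 'inside' each time a ray from p towards +x crosses an edge.
    Points lying exactly on an edge or vertex are not handled consistently."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def map_partition(points: List[Point], regions: Dict[str, Polygon]) -> List[Tuple[str, int]]:
    """Map task (hypothetical): one node counts its share of the points per region."""
    counts: Dict[str, int] = defaultdict(int)
    for p in points:
        for rid, poly in regions.items():
            if point_in_polygon(p, poly):
                counts[rid] += 1
    return list(counts.items())  # emitted (key, value) pairs


def reduce_counts(partials: List[List[Tuple[str, int]]]) -> Dict[str, int]:
    """Reduce task (hypothetical): collate per-node partial counts into the final answer."""
    totals: Dict[str, int] = defaultdict(int)
    for pairs in partials:
        for rid, c in pairs:
            totals[rid] += c
    return dict(totals)


def parallel_count(points: List[Point], regions: Dict[str, Polygon], n_nodes: int = 4) -> Dict[str, int]:
    """Split points round-robin across n_nodes, run the map tasks concurrently, then reduce.
    The (assumed small) region table is shipped whole to every node."""
    chunks = [points[i::n_nodes] for i in range(n_nodes)]
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(partial(map_partition, regions=regions), chunks))
    return reduce_counts(partials)


if __name__ == "__main__":
    regions = {"A": [(0, 0), (2, 0), (2, 2), (0, 2)],
               "B": [(2, 0), (4, 0), (4, 2), (2, 2)]}
    points = [(0.5, 0.5), (1.5, 1.5), (3.0, 1.0), (5.0, 5.0)]
    print(parallel_count(points, regions, n_nodes=2))  # {'A': 2, 'B': 1}
```

The threads only illustrate the data flow. Because of CPython's global interpreter lock they will not show the near-linear speedup the abstract reports, which comes from independent Hadoop processes on separate machines.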

Author information

Authors and Affiliations

GISE Lab, Department of Computer Science, Indian Institute of Technology Bombay, Powai, Mumbai, 400076, India
Umesh Bellur

Editor information

Editors and Affiliations

George Mason University, 22032, Fairfax, VA, USA
Dieter Pfoser
Department of Computer Science and Engineering, Pusan National University, 609-735, Pusan, South Korea
Ki-Joune Li

About this paper

Cite this paper

Bellur, U. (2014). On Parallelizing Large Spatial Queries Using Map-Reduce. In: Pfoser, D., Li, KJ. (eds) Web and Wireless Geographical Information Systems. W2GIS 2014. Lecture Notes in Computer Science, vol 8470. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55334-9_1

DOI: https://doi.org/10.1007/978-3-642-55334-9_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55333-2
Online ISBN: 978-3-642-55334-9
eBook Packages: Computer Science, Computer Science (R0)