Abstract
Vector Spatial data types such as lines, polygons or regions etc usually comprises of hundreds of thousands of latitude-longitude pairs to accurately represent the geometry of spatial features such as towns, rivers or villages. This leads to spatial data operations being computationally and memory intensive. A solution to deal with this is to distribute the operations amongst multiple computational nodes. Parallel spatial databases attempt to do this but at very small scales (of the order of 10s of nodes at most). Another approach would be to use distributed approaches such as Map-Reduce since spatial data operations map well to this paradigm. It affords us the advantage of being able to harness commodity hardware operating in a shared nothing mode while at the same time lending robustness to the computation since parts of the computation can be restarted on failure. In this paper, we present HadoopDB - a combination of Hadoop and Postgres spatial to efficiently handle computations on large spatial data sets. In HadoopDB, Hadoop serves as a means of coordinating amongst various computational nodes each of which performs the spatial query on a part of the data set. The Reduce stage helps collate the result data to yield the result of the original query. We present performance results to show that common spatial queries yields a speedup that nearly linear with the number of Hadoop processes deployed.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. In: Proceedings of the 6th Conference on Symposium on Operating Systems Design and Implementation, vol. 6, p. 10. USENIX Association, San Francisco (2004)
Bialecki, A., Cafarella, M., Cutting, D., Malley, O.: Hadoop: a framework for running applications on large clusters built of commodity hardware, Wiki at http://lucene.apache.org/hadoop
Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S.R., Stonebraker, M.A.: A comparison of approaches to large-scale data analysis. In: Proceedings of the 35th SIGMOD International Conference on Management of Data, pp. 165–178. ACM Press, New York (2009)
Stonebraker, M., Abadi, D., DeWitt, D.J., Madden, S., Paulson, E., Pavlo, A., Rasin, A.: MapReduce and parallel DBMSs: friends or foes? Commun. ACM 53(1), 64–71 (2010)
Zhang, J., Mamoulis, N., Papadias, D., Tao, Y.: All-nearest-neighbors queries in spatial databases, p. 297 (June 2004)
Zhang, S., Han, J., Liu, Z., Wang, K., Xu, Z.: SJMR: Parallelizing spatial join with MapReduce on clusters. In: Proceedings of CLUSTER, pp. 1–8 (2009)
Dittrich, J.P., Seeger, B.: Data redundancy and duplicate detection in spatial join processing. In: ICDE 2000: Proceedings of the 16th International Conference on Data Engineering, pp. 535–546 (2000)
Brinkhoff, T., Kriegel, H.P., Seeger, B.: Parallel processing of spatial joins using R-trees. In: ICDE 1996: Proceedings of the Twelfth International Conference on Data Engineering, pp. 258–265 (1996)
Patel, J.M., DeWitt, D.J.: Partition based spatial-merge join. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pp. 259–270. ACM, New York (1996)
Akdogan, A., Demiryurek, U., Banaei-Kashani, F., Shahabi, C.: Integrated Media Systems Center, University of Southern California, Los Angeles, CA 90089
Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P., Murthy, R.: Hive - a warehousing solution over a map-reduce framework. PVLDB 2(2), 1626–1629 (2009)
Abouzeid, A., Bajda-pawlikowski, K., Abadi, D., Silberschatz, A., Rasin, E.: HadoopDB: An architectural hybrid of MapReduce and DBMS technologies for analytical workloads. In: Proc. VLDB 2009 (2009)
Leptoukh, G.: NASA remote sensing data in earth sciences: Processing, archiving, distribution, applications at the GES DISC. In: Proc. of the 31st Intl. Symposium of Remote Sensing of Environment (2005)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bellur, U. (2014). On Parallelizing Large Spatial Queries Using Map-Reduce. In: Pfoser, D., Li, KJ. (eds) Web and Wireless Geographical Information Systems. W2GIS 2014. Lecture Notes in Computer Science, vol 8470. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55334-9_1
Download citation
DOI: https://doi.org/10.1007/978-3-642-55334-9_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55333-2
Online ISBN: 978-3-642-55334-9
eBook Packages: Computer ScienceComputer Science (R0)