Abstract
The amount of information in spatial databases is growing as more data is made available. Spatial databases mainly store two types of data: raster data (satellite and aerial digital images) and vector data (points, lines, polygons). The complexity and nature of spatial databases make them ideal for applying parallel processing. MapReduce is an emerging massively parallel computing model proposed by Google. In this work, we present our experiences in applying the MapReduce model to solve two important spatial problems: (a) bulk construction of R-Trees and (b) aerial image quality computation, which involve vector and raster data, respectively. We present our results on the scalability of MapReduce and on the effect of parallelism on the quality of the results. Our algorithms were executed on a Google&IBM cluster, which became available to us through an NSF-supported program. The cluster supports the Hadoop framework, an open-source implementation of MapReduce. Our results confirm the excellent scalability of the MapReduce framework in processing parallelizable problems.
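To make the map and reduce roles concrete for vector data, the following is a minimal single-process sketch and not the algorithm evaluated in this work. It assumes a hypothetical partitioning of points into equal-width x-ranges (`map_point`) and a reducer that computes one minimum bounding rectangle per partition (`reduce_partition`), loosely resembling the leaf level of a bulk-built R-Tree. All function names and the partitioning rule are illustrative assumptions; a real job would run on a framework such as Hadoop.

```python
from collections import defaultdict


def map_point(point, num_partitions, x_min, x_max):
    """Map step: emit (partition_id, point).

    Assumption: partitions are equal-width ranges along the x axis.
    """
    width = (x_max - x_min) / num_partitions
    if width == 0:
        # All points share the same x coordinate; use a single partition.
        return 0, point
    pid = min(int((point[0] - x_min) / width), num_partitions - 1)
    return pid, point


def reduce_partition(points):
    """Reduce step: minimum bounding rectangle (x1, y1, x2, y2) of a partition."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def run_mapreduce(points, num_partitions):
    """Simulate map, shuffle (group by key), and reduce in one process."""
    x_min = min(p[0] for p in points)
    x_max = max(p[0] for p in points)
    groups = defaultdict(list)
    for p in points:
        pid, value = map_point(p, num_partitions, x_min, x_max)
        groups[pid].append(value)  # shuffle: collect values by key
    return {pid: reduce_partition(vals) for pid, vals in sorted(groups.items())}
```

Because each partition is reduced independently, the reduce calls could run on separate machines, which is the property that makes this class of problem a natural fit for MapReduce.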
This research was supported in part by NSF grants IIS-0837716, CNS-0821345, HRD-0833093, EIA-0220562, IIS-0811922, IIP-0829576 and IIS-0534530, and equipment support by Google and IBM.
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cary, A., Sun, Z., Hristidis, V., Rishe, N. (2009). Experiences on Processing Spatial Data with MapReduce. In: Winslett, M. (eds) Scientific and Statistical Database Management. SSDBM 2009. Lecture Notes in Computer Science, vol 5566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02279-1_24
DOI: https://doi.org/10.1007/978-3-642-02279-1_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02278-4
Online ISBN: 978-3-642-02279-1
eBook Packages: Computer Science, Computer Science (R0)