Experiences on Processing Spatial Data with MapReduce

  • Conference paper
Scientific and Statistical Database Management (SSDBM 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5566)

Abstract

The amount of information in spatial databases is growing as more data is made available. Spatial databases mainly store two types of data: raster data (satellite and aerial digital images) and vector data (points, lines, and polygons). The complexity and nature of spatial databases make them ideal candidates for parallel processing. MapReduce is an emerging massively parallel computing model proposed by Google. In this work, we present our experiences in applying the MapReduce model to two important spatial problems: (a) bulk construction of R-Trees and (b) aerial image quality computation, which involve vector and raster data, respectively. We present our results on the scalability of MapReduce and on the effect of parallelism on the quality of the results. Our algorithms were executed on a Google&IBM cluster, which became available to us through an NSF-supported program. The cluster supports the Hadoop framework, an open-source implementation of MapReduce. Our results confirm the excellent scalability of the MapReduce framework in processing parallelizable problems.
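
As a rough illustration of the MapReduce model mentioned above, the following minimal Python sketch runs the map, shuffle, and reduce phases in a single process over vector data (2-D points). It is not the authors' algorithm: the uniform grid partitioning, the per-partition bounding-rectangle computation, and all names (`map_phase`, `shuffle`, `reduce_phase`, `run`, `cell_size`) are assumptions chosen only to show how spatial records can be keyed, grouped, and processed independently.

```python
from collections import defaultdict


def map_phase(points, cell_size):
    """Map: emit (grid cell, point) pairs so nearby points share a key.

    The uniform grid is an illustrative assumption, not the paper's
    partitioning scheme.
    """
    for x, y in points:
        key = (int(x // cell_size), int(y // cell_size))
        yield key, (x, y)


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: compute the minimum bounding rectangle of one partition."""
    xs = [x for x, _ in values]
    ys = [y for _, y in values]
    return key, (min(xs), min(ys), max(xs), max(ys))


def run(points, cell_size=10.0):
    """Run the three phases sequentially; each reduce call is independent."""
    groups = shuffle(map_phase(points, cell_size))
    return dict(reduce_phase(k, v) for k, v in groups.items())
```

Because every `reduce_phase` call touches only its own partition, a framework such as Hadoop could run the calls on separate machines, which is the property that makes this style of problem scale.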

This research was supported in part by NSF grants IIS-0837716, CNS-0821345, HRD-0833093, EIA-0220562, IIS-0811922, IIP-0829576 and IIS-0534530, and by equipment support from Google and IBM.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cary, A., Sun, Z., Hristidis, V., Rishe, N. (2009). Experiences on Processing Spatial Data with MapReduce. In: Winslett, M. (eds) Scientific and Statistical Database Management. SSDBM 2009. Lecture Notes in Computer Science, vol 5566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02279-1_24

  • DOI: https://doi.org/10.1007/978-3-642-02279-1_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02278-4

  • Online ISBN: 978-3-642-02279-1

  • eBook Packages: Computer Science, Computer Science (R0)
