DOI: 10.1145/2070770.2070773
research-article

A parallel input-output system for resolving spatial data challenges: an agent-based model case study

Published: 01 November 2011

ABSTRACT

With recent advances in data collection technologies such as remote sensing and global positioning systems, the amount of spatial data being produced has been increasing at a staggering rate. Simultaneously, computing is shifting from single-core to multi-core processors. To effectively utilize the computational power afforded by this new generation of processors for data-intensive geospatial applications, parallel computing techniques need to be employed. Parallel computing, however, raises new challenges associated with handling the input and output of spatial data in parallel. This paper describes a Parallel Input/Output System (PIOS) that addresses the challenges of handling large amounts of diverse spatial data. PIOS is based on a hierarchical structure that uses a scalable file partitioning strategy and combines data and metadata to enable efficient handling of terabyte-scale data sets in parallel. A spatially explicit agent-based model is developed as a case study. Computational experiments were conducted on a supercomputer supported by the National Science Foundation. PIOS achieved a tenfold speedup in parallel input/output time and was demonstrated to scale efficiently to over one thousand processing cores and to handle multiple terabytes of data.
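The abstract does not spell out how PIOS partitions files, so the following is only a minimal sketch of one common approach, not the authors' method: a row-major raster file is split into contiguous row bands, one per process, and each band is described by a small metadata record giving its byte offset and length. The names `Partition` and `partition_rows`, and the flat row-major layout itself, are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Metadata for one rank's share of a row-major raster file (hypothetical layout)."""
    rank: int
    first_row: int
    num_rows: int
    byte_offset: int
    byte_length: int


def partition_rows(nrows: int, ncols: int, cell_bytes: int, nprocs: int) -> list[Partition]:
    """Split nrows into nprocs contiguous bands whose sizes differ by at most one row."""
    if nprocs <= 0:
        raise ValueError("nprocs must be positive")
    base, extra = divmod(nrows, nprocs)
    row_bytes = ncols * cell_bytes
    parts = []
    first = 0
    for rank in range(nprocs):
        # The first `extra` ranks each take one additional row.
        count = base + (1 if rank < extra else 0)
        parts.append(Partition(rank, first, count, first * row_bytes, count * row_bytes))
        first += count
    return parts


if __name__ == "__main__":
    # 10 rows x 4 columns of 8-byte cells, shared among 3 processes.
    for p in partition_rows(10, 4, 8, 3):
        print(p)
```

With records like these, each process could seek to its own `byte_offset` and read `byte_length` bytes independently of the others, which is the general idea behind reading one shared file in parallel.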

Published in

HPDGIS '11: Proceedings of the ACM SIGSPATIAL Second International Workshop on High Performance and Distributed Geographic Information Systems
November 2011
53 pages
ISBN: 9781450310406
DOI: 10.1145/2070770

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
