DOI: 10.1145/2335755.2335805
research-article

The data supercell

Published: 16 July 2012

ABSTRACT

The Data SuperCell (DSC) is a new, disk-based data archive deployed and in production at the Pittsburgh Supercomputing Center (PSC). It is designed to meet the archival demands of large-scale data processing economically. DSC combines PSC's SLASH2 layered file system technology with commodity hardware and open software to provide superior functionality, flexibility, manageability, reliability, performance and cost-effectiveness. Below, we describe the DSC functionality goals; the SLASH2 architecture, its capabilities and its suitability for archival applications; ZFS as the underlying file system; and the DSC architecture, structure and capabilities. We then discuss our experience with DSC, present some performance measurements and outline plans for further development.


    • Published in

      XSEDE '12: Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond
      July 2012
      423 pages
      ISBN: 9781450316026
      DOI: 10.1145/2335755

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall acceptance rate: 129 of 190 submissions, 68%
