DOI: 10.1145/3219104.3219125

VC3: A Virtual Cluster Service for Community Computation

Published: 22 July 2018

ABSTRACT

A traditional HPC facility provides a large amount of computing power but a fixed environment designed to satisfy local needs. This makes it very challenging for users to deploy complex applications that span multiple sites and require specific application software, scheduling middleware, or sharing policies. The DOE-funded VC3 project aims to address these challenges by making it possible for researchers to easily aggregate and share resources, install custom software environments, and deploy clustering frameworks across multiple HPC facilities through the concept of "virtual clusters". This paper presents the design, implementation, and initial experience with our prototype self-service VC3 platform, which automates the deployment of cluster frameworks across diverse computing facilities. To create a virtual cluster, the VC3 platform materializes a custom head node in a secure private cloud, deploys the selected scheduling middleware, and then allocates resources from the remote facilities, where the desired software and clustering framework are installed in user space. As scheduled nodes at the individual facilities become available, the research team simply sees a private cluster they can access directly or share with collaborators, such as a science gateway community. We discuss how this service can be used by research collaborations requiring shared resources, specific middleware frameworks, and complex applications and workflows in the areas of astrophysics, bioinformatics, and high energy physics.
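The abstract describes the architecture only at a high level. As a rough, hypothetical illustration of the user-space worker pattern it refers to, the sketch below shows how a batch job submitted at a remote facility might bootstrap an HTCondor worker that reports back to a virtual cluster head node. The head-node address, tarball URL, and directory layout are placeholder assumptions, and the script is not the VC3 implementation itself.

#!/usr/bin/env python3
"""Illustrative sketch (not the VC3 implementation): render and submit a
SLURM batch job that bootstraps a user-space HTCondor worker which joins a
virtual cluster head node. Head-node address, tarball URL, and paths are
hypothetical placeholders."""

import subprocess
import textwrap

HEAD_NODE = "vc3-headnode.example.org"                 # hypothetical head node in the private cloud
CONDOR_TARBALL = "https://example.org/condor.tar.gz"   # hypothetical user-space HTCondor tarball


def render_glidein_script(walltime: str = "04:00:00") -> str:
    """Return a SLURM script that unpacks HTCondor in user space and starts
    a worker (MASTER + STARTD) pointed at the virtual cluster's head node."""
    return textwrap.dedent(f"""\
        #!/bin/bash
        #SBATCH --job-name=vc3-glidein
        #SBATCH --nodes=1
        #SBATCH --time={walltime}

        workdir=$(mktemp -d)
        cd "$workdir"

        # Fetch and unpack HTCondor into user space (no root required).
        curl -sL {CONDOR_TARBALL} | tar xzf - --strip-components=1
        mkdir -p log spool execute

        # Minimal, illustrative HTCondor config: join the head node's pool as a worker.
        cat > condor_config <<EOF
        CONDOR_HOST = {HEAD_NODE}
        DAEMON_LIST = MASTER, STARTD
        LOCAL_DIR = $workdir
        EOF
        export CONDOR_CONFIG="$workdir/condor_config"

        # Run in the foreground so the worker lives for the duration of the batch allocation.
        ./sbin/condor_master -f
        """)


def submit_glidein() -> None:
    """Submit the rendered script to the local SLURM scheduler via sbatch (read from stdin)."""
    subprocess.run(["sbatch"], input=render_glidein_script(), text=True, check=True)


if __name__ == "__main__":
    submit_glidein()

In this pattern the batch allocation bounds the worker's lifetime, so the virtual cluster seen by the research team grows and shrinks as the individual facility schedulers start and end these jobs.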

    Published in

      PEARC '18: Proceedings of the Practice and Experience on Advanced Research Computing
      July 2018
      652 pages
      ISBN: 978-1-4503-6446-1
      DOI: 10.1145/3219104

      Copyright © 2018 ACM

      © 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      PEARC '18 paper acceptance rate: 79 of 123 submissions, 64%. Overall acceptance rate: 133 of 202 submissions, 66%.
