ABSTRACT
A traditional HPC facility provides a large amount of computing power but has a fixed environment designed to satisfy local needs. This makes it challenging for users to deploy complex applications that span multiple sites and require specific application software, scheduling middleware, or sharing policies. The DOE-funded VC3 project addresses these challenges by making it possible for researchers to easily aggregate and share resources, install custom software environments, and deploy clustering frameworks across multiple HPC facilities through the concept of "virtual clusters". This paper presents the design, implementation, and initial experience with our prototype self-service VC3 platform, which automates the deployment of cluster frameworks across diverse computing facilities. To create a virtual cluster, the VC3 platform materializes a custom head node in a secure private cloud, configures the user's chosen scheduling middleware, and then allocates resources from the remote facilities, where the desired software and clustering framework are installed in user space. As scheduled nodes at the individual facilities become available, the research team simply sees a private cluster that it can access directly or share with collaborators, such as a science gateway community. We discuss how this service can be used by research collaborations that require shared resources, specific middleware frameworks, and complex applications and workflows in the areas of astrophysics, bioinformatics, and high energy physics.
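To make the workflow described above concrete, here is a minimal, hypothetical sketch of how a virtual-cluster request might be expressed. The `VirtualClusterSpec` class, its field names, and the facility identifiers are illustrative assumptions for this sketch, not VC3's actual interface.

```python
# Hypothetical sketch only: the class, field names, and facility
# identifiers below are illustrative assumptions, not VC3's actual API.
from dataclasses import dataclass, field
from typing import List


@dataclass
class VirtualClusterSpec:
    """Describes the virtual cluster a research team asks the platform to materialize."""
    name: str                                              # user-chosen cluster name
    middleware: str                                        # scheduling middleware, e.g. "htcondor"
    environment: List[str] = field(default_factory=list)   # software installed in user space
    allocations: List[str] = field(default_factory=list)   # remote-facility allocations to aggregate
    shared_with: List[str] = field(default_factory=list)   # collaborators or gateway communities


if __name__ == "__main__":
    # Assemble a request resembling the workflow the abstract describes:
    # pick the middleware, name a user-space software environment, and
    # list the facility allocations the virtual cluster should draw on.
    spec = VirtualClusterSpec(
        name="spt3g-analysis",
        middleware="htcondor",
        environment=["python3", "cvmfs-client"],
        allocations=["nersc-cori", "uchicago-rcc"],
        shared_with=["spt-collaboration"],
    )
    print(spec)
```

Under these assumptions, such a specification would be submitted to the VC3 service, which would then materialize the head node in a private cloud and begin drawing worker nodes from the listed allocations, as the abstract describes.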