ABSTRACT
NSF-funded computing centers have primarily focused on delivering high-performance computing resources to academic researchers with the most computationally demanding applications. But now that computational science is so pervasive, there is a need for infrastructure that serves more researchers and disciplines than just those at the peak of the HPC pyramid. Here we describe SDSC's Comet system, which is scheduled for production in January 2015 and was designed to address the needs of a much larger and more diverse science community -- the "long tail of science". Comet will have a peak performance of 2 petaflop/s, delivered mostly by Intel's next-generation Xeon processors. It will also include large-memory and GPU-accelerated nodes, node-local flash memory, 7 PB of Performance Storage, and 6 PB of Durable Storage. These features, together with the availability of high-performance virtualization, will enable users to run complex, heterogeneous workloads on a single integrated resource.
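To make "heterogeneous workloads on a single integrated resource" concrete, the sketch below shows how a long-tail user might programmatically compose and submit a batch job targeting a GPU node through a Slurm-managed scheduler. This is a minimal illustration, not Comet's actual configuration: the partition name, GPU count, and application binary are hypothetical placeholders, and only standard Slurm sbatch directives are used.

import subprocess
import tempfile

# Minimal sketch: compose and submit a batch job targeting a GPU node on a
# Slurm-managed cluster. The partition name "gpu" and the application binary
# are hypothetical placeholders, not Comet's actual configuration.
JOB_SCRIPT = """#!/bin/bash
#SBATCH --job-name=md-run
#SBATCH --partition=gpu          # hypothetical GPU partition name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1             # request one GPU on the node
#SBATCH --time=01:00:00

# Placeholder application; a long-tail user might run a molecular dynamics
# code such as AMBER or NAMD here.
srun ./my_gpu_application
"""

def submit(script_text: str) -> str:
    """Write the batch script to a temporary file and hand it to sbatch."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script_text)
        path = f.name
    result = subprocess.run(["sbatch", path], capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()   # e.g. "Submitted batch job 12345"

if __name__ == "__main__":
    print(submit(JOB_SCRIPT))

Retargeting the same workload at a large-memory or standard compute node would change only the #SBATCH directives, which is the sense in which a single integrated resource can absorb a heterogeneous job mix.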