Abstract
The RSC PetaStream architecture is a massively parallel computer design based on Intel® Xeon® Phi manycore co-processors. Each RSC PetaStream module contains eight Intel Xeon Phi co-processors with a PCI Express fabric and an InfiniBand interconnect for intermodule communication. This paper concentrates on the performance of a single RSC PetaStream module, evaluated with low-level (point-to-point MPI), library-level (linear algebra, MAGMA), and application-level (classical molecular dynamics with the GROMACS and LAMMPS codes) tests. A dual-socket system with top-bin Intel Xeon E5-2690 CPUs was used for comparison. This early evaluation demonstrates that, in general, each Xeon Phi co-processor of RSC PetaStream delivers approximately the same performance as a dual-socket Intel Xeon E5 system at only half the energy-to-solution. The fine-grained parallelism of the Intel Xeon Phi cores takes advantage of higher message exchange rates at the MPI level for communication between threads placed on different Xeon Phi chips.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Semin, A., Druzhinin, E., Mironov, V., Shmelev, A., Moskovsky, A. (2014). The Performance Characterization of the RSC PetaStream Module. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2014. Lecture Notes in Computer Science, vol 8488. Springer, Cham. https://doi.org/10.1007/978-3-319-07518-1_27
DOI: https://doi.org/10.1007/978-3-319-07518-1_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07517-4
Online ISBN: 978-3-319-07518-1
eBook Packages: Computer Science, Computer Science (R0)