The Performance Characterization of the RSC PetaStream Module

Semin, Andrey; Druzhinin, Egor; Mironov, Vladimir; Shmelev, Alexey; Moskovsky, Alexander

doi:10.1007/978-3-319-07518-1_27


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8488)

Abstract

The RSC PetaStream architecture is a massively parallel computer design based on Intel® Xeon® Phi manycore co-processors. Each RSC PetaStream module contains eight Intel Xeon Phi co-processors with a PCI Express fabric and an InfiniBand interconnect for inter-module communication. This paper concentrates on the performance of a single RSC PetaStream module, evaluated with low-level tests (point-to-point MPI), library tests (linear algebra with MAGMA) and application-level tests (classical molecular dynamics with the GROMACS and LAMMPS codes). A dual-socket system with top-bin Intel Xeon E5-2690 CPUs was used for comparison. This early evaluation demonstrates that, in general, each Xeon Phi co-processor of RSC PetaStream delivers approximately the same performance as the dual-socket Intel Xeon E5 system at only half the energy-to-solution. The fine-grained parallelism of the Intel Xeon Phi cores benefits from higher MPI-level message exchange rates when communicating threads are placed on different Xeon Phi chips.
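The energy-to-solution comparison in the abstract can be illustrated with a minimal sketch, assuming energy-to-solution is taken as average power multiplied by wall-clock runtime. The function names and all numeric values below are hypothetical placeholders chosen for illustration; they are not measurements or methods from the paper.

```python
def energy_to_solution(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy in joules for one run, assuming E = average power * runtime."""
    if avg_power_watts < 0 or runtime_seconds < 0:
        raise ValueError("power and runtime must be non-negative")
    return avg_power_watts * runtime_seconds


def energy_ratio(power_a: float, time_a: float,
                 power_b: float, time_b: float) -> float:
    """Energy-to-solution of system A relative to system B."""
    return energy_to_solution(power_a, time_a) / energy_to_solution(power_b, time_b)


# Hypothetical placeholder values (NOT taken from the paper):
# system A and system B finish in the same time, A draws half the power.
ratio = energy_ratio(power_a=200.0, time_a=100.0,
                     power_b=400.0, time_b=100.0)
print(f"energy-to-solution ratio A/B: {ratio:.2f}")
```

With equal runtimes, the ratio reduces to the ratio of average power draw, which is one way a system can reach similar performance at half the energy-to-solution.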



Author information

Authors and Affiliations

Intel Corporation, Munich, Germany
Andrey Semin
RSC Group, Moscow, Russian Federation
Egor Druzhinin, Vladimir Mironov, Alexey Shmelev & Alexander Moskovsky


Editor information

Editors and Affiliations

MIN Faculty, Department of Informatics, Scientific Computing, University of Hamburg, Bundesstraße 45a, 20146, Hamburg, Germany
Julian Martin Kunkel
Deutsches Klimarechenzentrum, Bundesstraße 45a, 20146, Hamburg, Germany
Thomas Ludwig
University of Mannheim, Germany and Prometeus GmbH, Fliederstraße 2, 74915, Waibstadt, Germany
Hans Werner Meuer


About this paper

Cite this paper

Semin, A., Druzhinin, E., Mironov, V., Shmelev, A., Moskovsky, A. (2014). The Performance Characterization of the RSC PetaStream Module. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2014. Lecture Notes in Computer Science, vol 8488. Springer, Cham. https://doi.org/10.1007/978-3-319-07518-1_27


DOI: https://doi.org/10.1007/978-3-319-07518-1_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07517-4
Online ISBN: 978-3-319-07518-1
eBook Packages: Computer Science, Computer Science (R0)
