DOI: 10.1145/2503210.2503281

Supercomputing with commodity CPUs: are mobile SoCs ready for HPC?

Published: 17 November 2013

ABSTRACT

In the late 1990s, powerful economic forces led to the adoption of commodity desktop processors in high-performance computing. This transformation has been so effective that the June 2013 TOP500 list is still dominated by x86.

In 2013, the largest commodity market in computing is not PCs or servers, but mobile computing: smartphones and tablets, most of which are built with ARM-based SoCs. This suggests that, once mobile SoCs deliver sufficient performance, they can help reduce the cost of HPC.

This paper examines that suggestion in detail. We analyze the trend in mobile SoC performance, comparing it with the similar trend followed by desktop processors in the 1990s. We also present our experience evaluating the performance and energy efficiency of mobile SoCs, deploying a cluster, and assessing the network and the scalability of production applications. In summary, we give a first answer as to whether mobile SoCs are ready for HPC.
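As background for the efficiency evaluation mentioned above, the sketch below illustrates one common way such comparisons are quantified: turning an HPL (LINPACK) run and power-meter samples into a GFLOPS/W figure. This is an illustrative example only; the function names and every number are placeholders, not measurements or methodology from the paper.

# Minimal sketch (illustrative only, not from the paper): deriving an
# energy-efficiency figure of merit (GFLOPS/W) from an HPL-style run and
# a series of power-meter readings. All names and numbers are placeholders.

def hpl_gflops(n: int, time_s: float) -> float:
    """HPL performance: (2/3 * N^3 + 2 * N^2) floating-point operations / runtime, in GFLOPS."""
    flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2
    return flops / time_s / 1e9

def average_power_w(samples_w):
    """Mean of power-meter samples (e.g., one reading per second), in watts."""
    return sum(samples_w) / len(samples_w)

if __name__ == "__main__":
    gflops = hpl_gflops(n=20_000, time_s=350.0)          # placeholder problem size and runtime
    power = average_power_w([48.2, 49.1, 47.8, 48.5])    # placeholder meter readings
    print(f"{gflops:.1f} GFLOPS at {power:.1f} W -> {gflops / power:.2f} GFLOPS/W")

GFLOPS per watt is the usual figure of merit for comparing HPC energy efficiency across systems.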


  • Published in

    SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
    November 2013
    1123 pages
    ISBN: 9781450323789
    DOI: 10.1145/2503210
    • General Chair: William Gropp
    • Program Chair: Satoshi Matsuoka

    Copyright © 2013 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 17 November 2013

    Qualifiers

    • research-article

    Acceptance Rates

    SC '13 Paper Acceptance Rate: 91 of 449 submissions, 20%
    Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%
