ABSTRACT
In the late 1990s, powerful economic forces led to the adoption of commodity desktop processors in high-performance computing. This transformation has been so effective that the June 2013 TOP500 list is still dominated by x86.
In 2013, the largest commodity market in computing is not PCs or servers, but mobile computing, comprising smart-phones and tablets, most of which are built with ARM-based SoCs. This leads to the suggestion that once mobile SoCs deliver sufficient performance, mobile SoCs can help reduce the cost of HPC.
This paper addresses this question in detail. We analyze the trend in mobile SoC performance, comparing it with the similar trend in the 1990s. We also present our experience evaluating performance and efficiency of mobile SoCs, deploying a cluster and evaluating the network and scalability of production applications. In summary, we give a first answer as to whether mobile SoCs are ready for HPC.
- N. R. Adiga, G. Almási, G. S. Almasi, Y. Aridor, R. Barik, D. Beece, R. Bellofatto, G. Bhanot, et al. An overview of the BlueGene/L supercomputer. In ACM/IEEE 2002 Conference on Supercomputing. IEEE Computer Society, 2002. Google ScholarDigital Library
- S. Alam, R. Barrett, M. Bast, M. R. Fahey, J. Kuehn, C. McCurdy, J. Rogers, P. Roth, R. Sankaran, J. S. Vetter, P. Worley, and W. Yu. Early evaluation of IBM BlueGene/P. In Proceedings of the 2008 ACM/IEEE conference on Supercomputing, SC '08, pages 23:1--23:12, Piscataway, NJ, USA, 2008. IEEE Press. Google ScholarDigital Library
- Applied Micro. APM "X-Gene" Launch Press Briefing. http://www.apm.com/global/x-gene/docs/X-GeneOverview.pdf, 2012.Google Scholar
- J. Balart, A. Duran, M. Gonzàlez, X. Martorell, E. Ayguadé, and J. Labarta. Nanos Mercurium: a research compiler for OpenMP. In Proceedings of the European Workshop on OpenMP, volume 8, 2004.Google Scholar
- H. Berendsen, D. van der Spoel, and R. van Drunen. Gromacs: A message-passing parallel molecular dynamics implementation. Computer Physics Communications, 91(1):43--56, 1995.Google ScholarCross Ref
- E. Blem, J. Menon, and K. Sankaralingam. Power Struggles: Revisiting the RISC vs. CISC Debate on Contemporary ARM and x86 Architectures. In 19th IEEE International Symposium on High Performance Computer Architecture (HPCA 2013), 2013. Google ScholarDigital Library
- Calxeda. Calxeda EnergyCore ECX-1000 Series. http://www.calxeda.com/wp-content/uploads/2012/06/ECX1000-Product-Brief-612.pdf, 2012.Google Scholar
- DEISA project. DEISA 2; Distributed European Infrastructure for Supercomputing Applications; Maintenance of the DEISA Benchmark Suite in the Second Year. Technical report.Google Scholar
- J. Dongarra, P. Luszczek, and A. Petitet. The LINPACK Benchmark: past, present and future. Concurrency and Computation: Practice and Experience, 15(9):803--820, 2003.Google ScholarCross Ref
- A. Duran, E. Ayguade, R. M. Badia, J. Labarta, L. Martinell, X. Martorell, and J. Planas. OmpSs: a proposal for programming heterogeneous multi-core architectures. Parallel Processing Letters, 21(02):173--193, 2011.Google ScholarCross Ref
- M. Folk, A. Cheng, and K. Yates. HDF5: A file format and I/O library for high performance computing applications. In Proceedings of Supercomputing, volume 99, 1999.Google Scholar
- M. Frigo and S. G. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2):216--231, 2005. Special issue on "Program Generation, Optimization, and Platform Adaptation".Google ScholarCross Ref
- D. Göddeke, D. Komatitsch, M. Geveler, D. Ribbrock, N. Rajovic, N. Puzovic, and A. Ramirez. Energy efficiency vs. performance of the numerical solution of PDEs: an application study on a low-power ARM-based cluster. Journal of Computational Physics, 237:132--150, Mar. 2013. Google ScholarDigital Library
- B. Goglin. High-Performance Message Passing over generic Ethernet Hardware with Open-MX. Elsevier Journal of Parallel Computing (PARCO), 37(2):85--100, Feb. 2011. Google ScholarDigital Library
- R. Grisenthwaite. ARMv8 Technology Preview. http://www.arm.com/files/downloads/ARMv8_Architecture.pdf, 2011.Google Scholar
- R. Haring, M. Ohmacht, T. Fox, M. Gschwind, D. Satterfield, K. Sugavanam, P. Coteus, P. Heidelberger, M. Blumrich, R. Wisniewski, a. gara, G. Chiu, P. Boyle, N. Chist, and C. Kim. The IBM Blue Gene/Q Compute Chip. Micro, IEEE, 32(2):48--60, march-april 2012. Google ScholarDigital Library
- J.-H. Huang. GTC 2013 Keynote, March 2013.Google Scholar
- IBM Systems and Technology. IBM System Blue Gene/Q Data Sheet, November 2011.Google Scholar
- IHS iSuppli News Flash. Low-End Google Nexus 7 Carries $152 BOM, Teardown Reveals. http://www.isuppli.com/Teardowns/News/pages/Low-End-Google-Nexus-7-Carries-\$157-BOM-Teardown-Reveals.aspx.Google Scholar
- Intel. Intel ATOM S1260. http://ark.intel.com/products/71267/Intel-Atom-Processor-S1260-1MB-Cache-2_00-GHz.Google Scholar
- Intel. Intel MPI Benchmarks 3.2.4. http://software.intel.com/en-us/articles/intel-mpi-benchmarks.Google Scholar
- Intel. Intel Xeon Processor E5-2670. http://ark.intel.com/products/64595/Intel-Xeon-Processor-E5-2670-20M-Cache-2_60-GHz-8_00-GTs-Intel-QPI.Google Scholar
- Khronos OpenCL Working Group. The OpenCL Specification, version 1.0.29. http://khronos.org/registry/cl/specs/opencl-1.0.29.pdf, 2008.Google Scholar
- D. Komatitsch and J. Tromp. Introduction to the spectral element method for three-dimensional seismic wave propagation. Geophysical Journal International, 139(3):806--822, 1999.Google ScholarCross Ref
- W. Lang, J. Patel, and S. Shankar. Wimpy node clusters: What about non-wimpy workloads? In Proceedings of the Sixth International Workshop on Data Management on New Hardware, pages 47--55. ACM, 2010. Google ScholarDigital Library
- S. Li, K. Lim, P. Faraboschi, J. Chang, P. Ranganathan, and N. P. Jouppi. System-level integrated server architectures for scale-out datacenters. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, pages 260--271. ACM, 2011. Google ScholarDigital Library
- T. Mattson and G. Henry. An Overview of the Intel TFLOPS Supercomputer. Intel Technology Journal, 2(1), 1998.Google Scholar
- J. D. McCalpin. Memory bandwidth and machine balance in current high performance computers. IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, pages 19--25, Dec. 1995.Google Scholar
- H. Nakashima, H. Nakamura, M. Sato, T. Boku, S. Matsuoka, D. Takahashi, and Y. Hotta. Megaproto: 1 TFlops/10kW rack is feasible even with only commodity technology. In Proceedings of the ACM/IEEE SC 2005 Conference on Supercomputing. IEEE, 2005. Google ScholarDigital Library
- NVIDIA. CUDA Programming Guide 2.2, 2009.Google Scholar
- NVIDIA. Bringing High-End Graphics to Handheld Devices, 2011.Google Scholar
- V. Pillet, J. Labarta, T. Cortes, and S. Girona. Paraver: A tool to visualize and analyze parallel code. WoTUG-18, pages 17--31, 1995.Google Scholar
- N. Rajovic, A. Rico, N. Puzovic, C. Adeniyi-Jones, and A. Ramirez. Tibidabo: Making the case for an ARM-based HPC system. Future Generation Computer Systems, 2013. DOI: http://dx.doi.org/10.1016/j.future.2013.07.013.Google Scholar
- N. Rajovic, A. Rico, J. Vipond, I. Gelado, N. Puzovic, and A. Ramirez. Experiences With Mobile Processors for Energy Efficient HPC. In Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 464--468, 2013. Google ScholarDigital Library
- N. Rajovic, L. Vilanova, C. Villavieja, N. Puzovic, and A. Ramirez. The low power architecture approach towards exascale computing. Journal of Computational Science, 2013. DOI: http://dx.doi.org/10.1016/j.jocs.2013.01.002. Google ScholarDigital Library
- K. P. Saravanan, P. M. Carpenter, and A. Ramirez. Power/Performance evaluation of Energy Efficient Ethernet (EEE) for High Performance Computing. In IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, TX, April 2013.Google ScholarCross Ref
- B. Schroeder, E. Pinheiro, and W.-D. Weber. DRAM errors in the wild: a large-scale field study. In Proceedings of the Eleventh International Joint Conference on Measurement and Modeling of Computer Systems, pages 193--204. ACM, 2009. Google ScholarDigital Library
- B. Subramaniam and W.-c. Feng. The Green Index: A Metric for Evaluating System-Wide Energy Efficiency in HPC Systems. In 8th IEEE Workshop on High-Performance, Power-Aware Computing (HPPAC), Shanghai, China, May 2012. Google ScholarDigital Library
- Texas Instruments. AM5K2E04/02 Multicore ARM KeyStone II System-on-Chip (SoC) Data Manual, November 2012.Google Scholar
- R. Teyssier. Cosmological hydrodynamics with adaptive mesh refinement. Astronomy and Astrophysics, 385(1):337--364, 2002.Google ScholarCross Ref
- TOP500. Top500®supercomputer cites. http://www.top500.org/.Google Scholar
- J. Turley. Cortex-A15 "Eagle" flies the coop. Microprocessor Report, 24(11):1--11, November 2010.Google Scholar
- V. Vasudevan, D. Andersen, M. Kaminsky, L. Tan, J. Franklin, and I. Moraru. Energy-efficient cluster computing with fawn: Workloads and implications. In Proceedings of the 1st International Conference on Energy-Efficient Computing and Networking, pages 195--204. ACM, 2010. Google ScholarDigital Library
- M. Warren, E. Weigle, and W. Feng. High-density computing: A 240-processor beowulf in one cubic meter. In Supercomputing, ACM/IEEE 2002 Conference, pages 61--61. IEEE, 2002. Google ScholarDigital Library
- R. Whaley and J. Dongarra. Automatically tuned linear algebra software. In Proceedings of the 1998 ACM/IEEE conference on Supercomputing (CDROM), pages 1--27. IEEE Computer Society, 1998. Google ScholarDigital Library
- Yokogawa. WT210/WT230 Digital Power Meters. http://tmi.yokogawa.com/products/digital-power-analyzers/digital-power-analyzers/wt210wt230-digital-power-meters/.Google Scholar
Recommendations
DEISA--Distributed European Infrastructure for Supercomputing Applications
The paper presents an overview of the current research and achievements of the DEISA project, with a focus on the general concept of the infrastructure, the operational model, application projects and science communities, the DEISA Extreme Computing ...
Hungarian Supercomputing Grid
ICCS '02: Proceedings of the International Conference on Computational Science-Part IIThe main objective of the paper is to describe the main goals and activities within the newly formed Hungarian Supercomputing Grid (H-SuperGrid) which will be used as a high-performance and highthroughput Grid. In order to achieve these two features ...
A european perspective on supercomputing
ICS '09: Proceedings of the 23rd international conference on SupercomputingMassive computing systems will be needed to maintain competitiveness in all areas of science, engineering and business, to provide both management efficiency and computing capability. From a systems management perspective, massive installations offer an ...
Comments