DOI: 10.1145/2503210.2503281

Supercomputing with commodity CPUs: are mobile SoCs ready for HPC?

Published: 17 November 2013

ABSTRACT

In the late 1990s, powerful economic forces led to the adoption of commodity desktop processors in high-performance computing. This transformation has been so effective that the June 2013 TOP500 list is still dominated by x86.

In 2013, the largest commodity market in computing is not PCs or servers, but mobile computing: smartphones and tablets, most of which are built with ARM-based SoCs. This suggests that, once mobile SoCs deliver sufficient performance, they can help reduce the cost of HPC.

This paper examines that suggestion in detail. We analyze the trend in mobile SoC performance, comparing it with the similar trend followed by desktop processors in the 1990s. We also present our experience evaluating the performance and energy efficiency of mobile SoCs, deploying a cluster, and assessing the network and the scalability of production applications. In summary, we give a first answer as to whether mobile SoCs are ready for HPC.
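As background for the efficiency evaluation mentioned above, the sketch below illustrates one common way such comparisons are quantified: turning an HPL (LINPACK) run and power-meter samples into a GFLOPS/W figure. This is an illustrative example only; the function names and every number are placeholders, not measurements or methodology from the paper.

# Minimal sketch (illustrative only, not from the paper): deriving an
# energy-efficiency figure of merit (GFLOPS/W) from an HPL-style run and
# a series of power-meter readings. All names and numbers are placeholders.

def hpl_gflops(n: int, time_s: float) -> float:
    """HPL performance: (2/3 * N^3 + 2 * N^2) floating-point operations / runtime, in GFLOPS."""
    flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2
    return flops / time_s / 1e9

def average_power_w(samples_w):
    """Mean of power-meter samples (e.g., one reading per second), in watts."""
    return sum(samples_w) / len(samples_w)

if __name__ == "__main__":
    gflops = hpl_gflops(n=20_000, time_s=350.0)          # placeholder problem size and runtime
    power = average_power_w([48.2, 49.1, 47.8, 48.5])    # placeholder meter readings
    print(f"{gflops:.1f} GFLOPS at {power:.1f} W -> {gflops / power:.2f} GFLOPS/W")

GFLOPS per watt is the usual figure of merit for comparing HPC energy efficiency across systems.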


  • Published in

    SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
    November 2013
    1123 pages
    ISBN: 9781450323789
    DOI: 10.1145/2503210
    • General Chair: William Gropp
    • Program Chair: Satoshi Matsuoka

    Copyright © 2013 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 17 November 2013

    Qualifiers

    • research-article

    Acceptance Rates

    SC '13 Paper Acceptance Rate: 91 of 449 submissions, 20%
    Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%
