DOI: 10.1145/2535753.2535757
research-article

Analysis of computing and energy performance of multicore, NUMA, and manycore platforms for an irregular application

Published: 17 November 2013

Abstract

The exponential growth in processor performance seems to have reached a turning point. Nowadays, energy efficiency is as important as performance and has become critical to the development of scalable systems. These strict energy constraints paved the way for the development of multi- and manycore processors. Research on the performance and energy efficiency of numerical kernels on multicores is common, but studies in the context of manycores are sparse. Unlike these works, in this paper we analyze a well-known irregular NP-complete problem, the Traveling-Salesman Problem (TSP). This study investigates two aspects of the TSP on multicore, NUMA, and manycore processors. First, we concentrate on the nontrivial task of adapting this application to a manycore, specifically the novel MPPA-256 manycore processor. Then, we analyze its performance and energy consumption on different platforms that comprise general-purpose and low-power multicores, a NUMA machine, and the MPPA-256 manycore. Our results show that applications able to fully use the resources of a manycore can achieve better performance and may consume 9.8 and 13 times less energy than on low-power and general-purpose multicore processors, respectively.
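
To illustrate why the TSP counts as an irregular application, the following is a minimal sequential branch-and-bound sketch in Python. It is not the authors' implementation, which the abstract does not describe; the function name `tsp_branch_and_bound`, the distance-matrix input, and the nearest-city-first ordering are assumptions made for illustration only. The amount of work in each subtree depends on how early a good tour is found and how much gets pruned, which is what makes the workload hard to balance across many cores.

```python
import math


def tsp_branch_and_bound(dist):
    """Return (best_cost, best_tour) for a square matrix of non-negative distances.

    Illustrative sketch only: a plain depth-first branch and bound starting
    from city 0, not the parallel algorithm evaluated in the paper.
    """
    n = len(dist)
    best_cost = math.inf
    best_tour = None

    def search(path, visited, cost):
        nonlocal best_cost, best_tour
        # Prune: with non-negative distances, a partial tour that already
        # costs at least as much as the best complete tour cannot improve it.
        if cost >= best_cost:
            return
        if len(path) == n:
            total = cost + dist[path[-1]][path[0]]  # close the cycle
            if total < best_cost:
                best_cost, best_tour = total, path[:]
            return
        last = path[-1]
        # Visit nearer cities first so good tours appear early and prune more.
        for city in sorted(range(n), key=lambda c: dist[last][c]):
            if city not in visited:
                visited.add(city)
                path.append(city)
                search(path, visited, cost + dist[last][city])
                path.pop()
                visited.remove(city)

    search([0], {0}, 0)
    return best_cost, best_tour


if __name__ == "__main__":
    # Hypothetical 4-city instance, used only to exercise the sketch.
    example = [
        [0, 10, 15, 20],
        [10, 0, 35, 25],
        [15, 35, 0, 30],
        [20, 25, 30, 0],
    ]
    print(tsp_branch_and_bound(example))
```

In a parallel version, the subtrees under the first few levels would typically be handed to different workers that share the best known cost; that distribution step is the part that has to be adapted to each platform and is omitted here.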

Published In

IA3 '13: Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms
November 2013
92 pages
ISBN:9781450325035
DOI:10.1145/2535753

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. NUMA
  2. TSP
  3. energy
  4. manycore
  5. multicore
  6. performance

Qualifiers

  • Research-article

Conference

SC13

Acceptance Rates

IA3 '13 Paper Acceptance Rate 6 of 21 submissions, 29%;
Overall Acceptance Rate 18 of 67 submissions, 27%

Cited By

  • (2018) Unleashing the performance of ccNUMA multiprocessor architectures in heterogeneous stencil computations. The Journal of Supercomputing. DOI: 10.1007/s11227-018-2460-0. Online publication date: 23-Jun-2018
  • (2017) Accelerating Graph Community Detection with Approximate Updates via an Energy-Efficient NoC. Proceedings of the 54th Annual Design Automation Conference 2017, pages 1-6. DOI: 10.1145/3061639.3062194. Online publication date: 18-Jun-2017
  • (2017) Islands-of-Cores Approach for Harnessing SMP/NUMA Architectures in Heterogeneous Stencil Computations. Parallel Computing Technologies, pages 351-364. DOI: 10.1007/978-3-319-62932-2_34. Online publication date: 29-Jul-2017
  • (2016) High-Performance and Energy-Efficient Network-on-Chip Architectures for Graph Analytics. ACM Transactions on Embedded Computing Systems, 15:4, pages 1-26. DOI: 10.1145/2961027. Online publication date: 1-Sep-2016
  • (2016) Virtualization Guided Tsunami and Storm Surge Simulations for Low Power Architectures. Simulation and Modeling Methodologies, Technologies and Applications, pages 99-114. DOI: 10.1007/978-3-319-31295-8_7. Online publication date: 28-May-2016
  • (2016) CAP Bench: a benchmark suite for performance and energy evaluation of low-power many-core processors. Concurrency and Computation: Practice and Experience, 29:4. DOI: 10.1002/cpe.3892. Online publication date: 17-Jun-2016
  • (2014) Energy Efficient Seismic Wave Propagation Simulation on a Low-Power Manycore Processor. Proceedings of the 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing, pages 57-64. DOI: 10.1109/SBAC-PAD.2014.28. Online publication date: 22-Oct-2014
  • (2014) Evaluating performance and power efficiency of scientific applications on multi-threaded systems. Proceedings of the 2nd International Workshop on Energy Efficient Supercomputing, pages 11-20. DOI: 10.1109/E2SC.2014.15. Online publication date: 16-Nov-2014
