Preliminary analysis of feasible benchmark problems for the hybrid PRAM/NUMA REPLICA architecture

Authors:
Jari-Matti Mäkelä, University of Turku, Finland
Ville Leppänen, University of Turku, Finland
Martti Forsell, VTT, Oulu, Finland

CompSysTech '12: Proceedings of the 13th International Conference on Computer Systems and Technologies
June 2012
Pages 37–44
https://doi.org/10.1145/2383276.2383283

Published: 22 June 2012

ABSTRACT

We study benchmarking on modern chip multi-processors (CMPs) and outline a set of programs for measuring their architectural performance properties, focusing on the REPLICA architecture, which employs a hybrid of the PRAM and NUMA computational models. We analyse the parallel data processing and storage mechanisms of mainstream and research CMPs, and how benchmarks utilize them, in order to identify the strong and weak points of REPLICA and to develop the benchmarks further so that they demonstrate its scalability and performance.

Index Terms
1. General and reference
  1. Cross-computing tools and techniques
2. Theory of computation
  1. Models of computation
    1. Concurrency

Published in
CompSysTech '12: Proceedings of the 13th International Conference on Computer Systems and Technologies
June 2012
440 pages
ISBN: 9781450311939
DOI: 10.1145/2383276
Editors:
Boris Rachev, Technical University of Varna, Varna, Bulgaria
Angel Smrikarov, University of Ruse, Ruse, Bulgaria
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
benchmarking
multi-core
parallel computing
processor architecture
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 241 of 492 submissions, 49%