ABSTRACT
We study benchmarking on modern chip multiprocessors (CMPs) and outline a set of programs for measuring architectural performance properties, focusing on the REPLICA architecture, which employs a hybrid of the PRAM and NUMA computational models. We analyse the parallel data processing and storage mechanisms of mainstream and research CMPs and how benchmarks utilize them, in order to identify the strong and weak points of REPLICA and to develop the benchmarks further to demonstrate its scalability and performance.
- Preliminary analysis of feasible benchmark problems for the hybrid PRAM/NUMA REPLICA architecture