Abstract
Overcoming the memory wall [15] may be achieved by increasing the bandwidth and reducing the latency of the processor-to-memory connection, for example by implementing cellular architectures such as the IBM Cyclops. Such massively parallel architectures have sophisticated memory models. In this paper we use DIMES (the Delaware Iterative Multiprocessor Emulation System), developed by CAPSL at the University of Delaware, as a hardware evaluation tool for cellular architectures. We contend that the ideal approach to parallelism from the programmer's perspective remains an open question: parallelism may be expressed at the language level, as in UPC or HPF; obtained through compiler techniques such as trace scheduling; or provided at the library level, as with OpenMP or POSIX threads. To investigate this question, we use a threaded Mandelbrot-set generator with a work-stealing algorithm to evaluate the DIMES cthread programming model for writing a simple multi-threaded program.
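As an illustration of the kind of program the abstract describes, the following is a minimal sketch of a threaded Mandelbrot-set generator with work stealing. It is written in Python with the standard `threading` module and does not use the DIMES cthread API, which this page does not describe. All names (`escape_count`, `render_row`, `render_work_stealing`), the viewing window, the iteration limit, and the policy of dealing rows round-robin and stealing from the back of a victim's queue are assumptions made for the example, not the authors' implementation.

```python
import threading
from collections import deque

MAX_ITER = 50  # assumed iteration cap, chosen only for this sketch


def escape_count(c, max_iter=MAX_ITER):
    """Number of iterations of z -> z*z + c before |z| exceeds 2 (capped)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render_row(y, width, height, max_iter=MAX_ITER):
    """Escape counts for one image row over the window [-2, 1] x [-1, 1]."""
    im = -1.0 + 2.0 * y / max(height - 1, 1)
    return [
        escape_count(complex(-2.0 + 3.0 * x / max(width - 1, 1), im), max_iter)
        for x in range(width)
    ]


def render_sequential(width, height):
    """Reference result computed without threads."""
    return [render_row(y, width, height) for y in range(height)]


def render_work_stealing(width, height, n_workers=4):
    """Render rows with worker threads that steal rows from each other."""
    # Each worker owns a queue of row indices, dealt out round-robin.
    queues = [deque(range(w, height, n_workers)) for w in range(n_workers)]
    locks = [threading.Lock() for _ in range(n_workers)]
    image = [None] * height

    def take(me):
        # Prefer local work, taken from the front of the worker's own queue.
        with locks[me]:
            if queues[me]:
                return queues[me].popleft()
        # Otherwise try to steal from the back of another worker's queue.
        for k in range(1, n_workers):
            victim = (me + k) % n_workers
            with locks[victim]:
                if queues[victim]:
                    return queues[victim].pop()
        # No new work is ever created, so empty queues stay empty.
        return None

    def worker(me):
        while True:
            y = take(me)
            if y is None:
                return
            image[y] = render_row(y, width, height)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return image
```

Rows near the set cost more iterations than rows far from it, so a static partition leaves some workers idle; stealing lets an idle worker take rows from a busier one. Because each row index is removed from a queue under a lock, every row is computed exactly once, and `render_work_stealing(w, h)` should return the same image as `render_sequential(w, h)`.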
References
Almási, G., Cascaval, C., Castaños, J.G., Denneau, M., Lieber, D., Moreira, J.E., Warren, H.S.: Dissecting Cyclops: Detailed Analysis of a Multithreaded Architecture. ACM SIGARCH Computer Architecture News 31 (March 2003)
Cascaval, C., Castaños, J.G., Ceze, L., Denneau, M., Gupta, M., Lieber, D., Moreira, J.E., Strauss, K., Warren, H.S.: Evaluation of a Multithreaded Architecture for Cellular Computing. In: 8th International Symposium on High-Performance Computer Architecture (HPCA) (2002)
Cavalheiro, G.G.H., Doreille, M., Galilée, F., Gautier, T., Roch, J.-L.: Scheduling Parallel Programs on Non-Uniform Memory Architectures. In: HPCA Conference – Workshop on Parallel Computing for Irregular Applications WPCIA1, Orlando, USA (January 1999)
del Cuvillo, J.B., Zhu, W., Hu, Z., Gao, G.R.: FAST: A Functionally Accurate Simulation Toolset for the Cyclops-64 Cellular Architecture. In: Workshop on Modeling, Benchmarking and Simulation (MoBS), held in conjunction with the 32nd Annual International Symposium on Computer Architecture (ISCA 2005), Madison, Wisconsin, June 4 (2005)
del Cuvillo, J.B., Zhu, W., Hu, Z., Gao, G.R.: TiNy Threads: a Thread Virtual Machine for the Cyclops64 Cellular Architecture. In: Fifth Workshop on Massively Parallel Processing (WMPP), held in conjunction with the 19th International Parallel and Distributed Processing Symposium, Denver, Colorado, April 3 - 8 (2005)
Duller, A., Towner, D., Panesar, G., Gray, A., Robbins, W.: picoArray technology: the tool’s story. In: Proceedings of the Design, Automation and Test in Europe Conference and Exhibition. IEEE, Los Alamitos (2005)
Gao, G.R., Sarkar, V.: Location Consistency - a New Memory Model and Cache Consistency Protocol. IEEE Transactions on Computers 49(8) (August 2000)
Gao, G.R., Theobald, K.B., Govindarajan, R., Leung, C., Hu, Z., Wu, H., Lu, J., del Cuvillo, J., Jacquet, A., Janot, V., Sterling, T.L.: Programming Models and System Software for Future High-End Computing Systems: Work-in-Progress. In: International Parallel and Distributed Processing Symposium (IPDPS 2003), Nice, France, April 22 - 26 (2003)
El-Ghazawi, T.A., Carlson, W.W., Draper, J.M.: UPC Language Specifications V1.1.1 (October 2003)
Kakulavarapu, P., Morrone, C.J., Theobald, K., Amaral, J.N., Gao, G.R.: A Comparative Performance Study of Fine-Grain Multi-threading on Distributed Memory Machines. In: 19th IEEE International Performance, Computing and Communication Conference-IPCCC 2000, Phoenix, Arizona, USA, February 20-22 (2000)
McGuiness, J.M.: A DIMES Demonstration Application: Mandelbrot-Set Generation Using a Work-Stealing Algorithm. CAPSL Technical Note 11, Department of Electrical and Computer Engineering, University of Delaware, Newark, Delaware (June 2003), ftp://ftp.capsl.udel.edu/pub/doc/notes
Mandelbrot, B.B.: The Fractal Geometry of Nature. W.H.Freeman & Co., New York (1982)
Rodenas, D., Martorell, X., Ayguade, E., Labarta, J., Almasi, G., Cascaval, C., Castanos, J., Moreira, J.: Optimizing NANOS OpenMP for the IBM Cyclops Multithreaded Architecture. In: 19th IEEE International Parallel and Distributed Processing Symposium, vol. 1, p. 110 (2005)
Sakane, H., Yakay, L., Karna, V., Leung, C., Gao, G.R.: DIMES: An Iterative Emulation Platform for Multiprocessor-System-on-Chip Designs. In: IEEE International Conference on Field-Programmable Technology, Tokyo, Japan, December 15-17 (2003)
Wulf, W., McKee, S.: Hitting the memory wall: Implications of the obvious. Computer Architecture News 23(1), 20–24 (1995)
Zhang, Y., Zhu, W., Chen, F., Hu, Z., Gao, G.R.: Sequential Consistency Revisited: The Sufficient Conditions and Method to Reason Consistency Model of a Multiprocessor-on-a chip Architecture. In: The IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN2005), Innsbruck, Austria, February 15 - 17 (2005)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
McGuiness, J.M., Egan, C., Christianson, B., Gao, G. (2006). The Challenges of Efficient Code-Generation for Massively Parallel Architectures. In: Jesshope, C., Egan, C. (eds) Advances in Computer Systems Architecture. ACSAC 2006. Lecture Notes in Computer Science, vol 4186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11859802_38
DOI: https://doi.org/10.1007/11859802_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40056-1
Online ISBN: 978-3-540-40058-5
eBook Packages: Computer Science, Computer Science (R0)