Abstract
Much research has focused on reducing and/or tolerating remote memory access latencies on distributed-memory parallel computers. Caching remote data is intended to reduce average access latency by handling as many remote memory accesses as possible using local copies of the data in the cache. Data-flow and multithreaded approaches help programs tolerate the latency of remote memory accesses by allowing processors to do other work while remote operations take place. The thread migration technique described here is a multithreaded architecture where threads migrate to remote processors that contain data they need. By exploiting access locality, the threads often use several data items from that processor before migrating to other processors for more data. Because the threads migrate in search of data, the approach is called Nomadic Threads. A prototype runtime system has been implemented on the CM5 and is portable to other distributed memory parallel computers.
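The locality argument above can be illustrated with a small counting sketch. This is not the authors' runtime system: the block data distribution, the function names (`owner`, `count_remote_fetches`, `count_migrations`), and the simple one-event-per-access cost model are all assumptions made for illustration. It compares a stationary thread, which issues one remote fetch for every element held by another processor, with a migrating thread, which moves to the owning processor and then uses neighboring elements locally.

```python
def owner(index, block_size):
    """Hypothetical block distribution: processor that holds element `index`."""
    return index // block_size


def count_remote_fetches(indices, block_size, home=0):
    """Stationary thread on `home`: each off-processor element costs one remote fetch."""
    return sum(1 for i in indices if owner(i, block_size) != home)


def count_migrations(indices, block_size, home=0):
    """Migrating thread: move to the owning processor only when the next element is not local."""
    current = home
    migrations = 0
    for i in indices:
        target = owner(i, block_size)
        if target != current:
            migrations += 1
            current = target
    return migrations


# Sequential sweep over 16 elements spread across 4 processors (4 elements each).
sweep = list(range(16))
print(count_remote_fetches(sweep, block_size=4))  # 12
print(count_migrations(sweep, block_size=4))      # 3
```

Under these assumptions the sweep needs 12 remote fetches but only 3 migrations, because each migration is followed by several local accesses. An access pattern with poor locality, such as alternating between two processors on every access, would remove that advantage, and the sketch ignores the fact that a migration likely costs more than a single fetch.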
Jenks, S., Gaudiot, JL. Exploiting locality and tolerating remote memory access latency using thread migration. Int J Parallel Prog 25, 281–304 (1997). https://doi.org/10.1007/BF02699904