Exploiting locality and tolerating remote memory access latency using thread migration

Jenks, Stephen; Gaudiot, Jean-Luc

doi:10.1007/BF02699904

Published: August 1997

Volume 25, pages 281–304, (1997)

International Journal of Parallel Programming

Abstract

Much research has focused on reducing or tolerating remote memory access latencies on distributed-memory parallel computers. Caching remote data is intended to reduce average access latency by serving as many remote memory accesses as possible from local copies of the data in the cache. Data-flow and multithreaded approaches help programs tolerate the latency of remote memory accesses by allowing processors to do other work while remote operations take place. The thread migration technique described here is a multithreaded architecture in which threads migrate to the remote processors that hold the data they need. By exploiting access locality, a thread often uses several data items on one processor before migrating to another processor for more data. Because the threads migrate in search of data, the approach is called Nomadic Threads. A prototype runtime system has been implemented on the CM-5 and is portable to other distributed-memory parallel computers.
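
The migration idea in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' runtime system: the names `Node`, `distribute`, `nomadic_sum`, and `remote_fetch_sum` are hypothetical, the block data layout is assumed, and a "migration" is modeled simply as changing which node the thread is running on. It compares a thread that moves to the owner of each datum against a thread that stays home and fetches remotely.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One simulated processor holding its private slice of a global array."""
    node_id: int
    local: dict = field(default_factory=dict)  # global index -> value


def distribute(values, num_nodes):
    """Block-distribute `values` across `num_nodes` nodes (assumed layout)."""
    block = max(1, -(-len(values) // num_nodes))  # ceiling division
    nodes = [Node(i) for i in range(num_nodes)]
    for idx, value in enumerate(values):
        nodes[idx // block].local[idx] = value
    return nodes, block


def nomadic_sum(nodes, block, indices, start=0):
    """Sum the values at `indices`, moving the thread to each datum's owner.

    Returns (total, migrations). Every data access is local to the node
    the thread currently occupies; only the small thread state moves.
    """
    current = start
    total = 0
    migrations = 0
    for idx in indices:
        owner = idx // block
        if owner != current:
            current = owner  # migrate: carry the thread state to the owner
            migrations += 1
        total += nodes[current].local[idx]
    return total, migrations


def remote_fetch_sum(nodes, block, indices, home=0):
    """Baseline: the thread stays on `home` and fetches non-local data.

    Returns (total, remote_accesses).
    """
    total = 0
    remote_accesses = 0
    for idx in indices:
        owner = idx // block
        if owner != home:
            remote_accesses += 1
        total += nodes[owner].local[idx]
    return total, remote_accesses
```

With 16 values spread over 4 nodes and a sequential scan of all indices, `nomadic_sum` performs 3 migrations while `remote_fetch_sum` performs 12 remote accesses. The gap reflects the access locality the abstract describes: one migration is amortized over several local accesses. The sketch does not model the relative cost of a migration versus a remote fetch.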

Author information

Authors and Affiliations

Department of Electrical Engineering-Systems, University of Southern California, Los Angeles, California 90089
Stephen Jenks & Jean-Luc Gaudiot

About this article

Cite this article

Jenks, S., Gaudiot, JL. Exploiting locality and tolerating remote memory access latency using thread migration. Int J Parallel Prog 25, 281–304 (1997). https://doi.org/10.1007/BF02699904

Issue Date: August 1997
DOI: https://doi.org/10.1007/BF02699904
