
Exploiting locality and tolerating remote memory access latency using thread migration

International Journal of Parallel Programming

Abstract

Much research has focused on reducing or tolerating remote memory access latencies on distributed-memory parallel computers. Caching remote data is intended to reduce average access latency by satisfying as many remote memory accesses as possible from local copies of the data in the cache. Data-flow and multithreaded approaches help programs tolerate the latency of remote memory accesses by letting processors do other work while remote operations are in progress. The thread migration technique described here is a multithreaded architecture in which threads migrate to the remote processors that hold the data they need. By exploiting access locality, a thread often uses several data items on a processor before migrating to other processors for more data. Because the threads migrate in search of data, the approach is called Nomadic Threads. A prototype runtime system has been implemented on the CM5 and is portable to other distributed-memory parallel computers.
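The contrast between fetching remote data and moving the thread to the data can be illustrated with a toy message-count model. This is a hypothetical sketch, not the Nomadic Threads runtime: it assumes a block distribution of addresses across processors (the illustrative `owner` function), charges each remote fetch two messages (request and reply), charges each migration one message carrying the thread's state, and ignores message size, network contention, and any return trip to the home processor.

```python
"""Toy message-count model: remote fetches vs. thread migration.

Hypothetical sketch only. It ignores message sizes, contention, the cost of
packing thread state, and the trip back to the home processor.
"""
from typing import List


def owner(addr: int, block_size: int, num_procs: int) -> int:
    """Processor owning addr under an assumed block distribution."""
    return (addr // block_size) % num_procs


def remote_fetch_messages(trace: List[int], home: int,
                          block_size: int, num_procs: int) -> int:
    """Thread stays on `home`; each remote access costs a request and a reply."""
    messages = 0
    for addr in trace:
        if owner(addr, block_size, num_procs) != home:
            messages += 2  # request + reply
    return messages


def migration_messages(trace: List[int], home: int,
                       block_size: int, num_procs: int) -> int:
    """Thread moves to the data; each change of owner costs one message."""
    messages = 0
    location = home
    for addr in trace:
        target = owner(addr, block_size, num_procs)
        if target != location:
            messages += 1  # ship the thread's state to the owning processor
            location = target
    return messages


if __name__ == "__main__":
    BLOCK, PROCS, HOME = 4, 4, 0
    # A trace with locality: runs of consecutive addresses on the same processor.
    trace = list(range(4, 12)) + list(range(20, 24))
    print("remote fetch:", remote_fetch_messages(trace, HOME, BLOCK, PROCS))
    print("migration:   ", migration_messages(trace, HOME, BLOCK, PROCS))
```

On this trace every access is remote from processor 0, so remote fetching costs 24 messages while migration costs 3, one per change of owner. A trace that alternates between the home processor and a single remote processor gives both policies the same count in this model, which is why the benefit of migration depends on access locality.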




Cite this article

Jenks, S., Gaudiot, JL. Exploiting locality and tolerating remote memory access latency using thread migration. Int J Parallel Prog 25, 281–304 (1997). https://doi.org/10.1007/BF02699904


  • DOI: https://doi.org/10.1007/BF02699904
