
Dynamic Instruction Scheduling in a Trace-based Multi-threaded Architecture

International Journal of Parallel Programming

Abstract

Simulation results are presented in which the hardware-implemented, trace-based dynamic instruction scheduler of our single-process DTSVLIW architecture schedules instructions from several processes into multiple streams of VLIW instructions for execution by a wide-issue, simultaneous multi-threading (SMT) execution engine. Scheduling involves single-instruction execution of each process, with the executed instructions dynamically scheduled into blocks of VLIW instructions that are cached for subsequent SMT execution. SMT provides a mechanism to reduce the impact of the horizontal and vertical waste, and of the variable memory latencies, seen in the DTSVLIW. Preliminary experiments explore this extended model. Results show PE utilization of up to 87% on a 4-thread, 1-scalar, 8-PE design, with speed-ups of up to 6.3 times that of a single processor. Notably, only a single scalar process needs to be scheduled at any time, and main memory fetches are 1–4% of those of a single processor.
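The terms used in the abstract can be made concrete with a small sketch. The Python below is an illustrative toy model, not the DTSVLIW hardware scheduler: the names `Instr`, `schedule_trace` and `waste_breakdown` are hypothetical. It assumes that every register is written at most once in the trace (so only true data dependences matter), that every instruction has unit latency, and that memory dependences and branches can be ignored. It packs an executed instruction trace into VLIW words greedily, then reports PE utilization together with horizontal waste (empty slots in a word that issues at least one instruction) and vertical waste (slots lost in cycles where nothing issues, for example during a memory stall).

```python
from typing import Dict, List, NamedTuple, Tuple


class Instr(NamedTuple):
    name: str
    dest: str               # register written ("" if none); assumed written once
    srcs: Tuple[str, ...]   # registers read


def schedule_trace(trace: List[Instr], width: int) -> List[List[str]]:
    """Greedily place each traced instruction in the earliest VLIW word that
    follows all of its producers and still has a free slot (toy model)."""
    words: List[List[str]] = []
    ready: Dict[str, int] = {}  # register -> first word index that may read it
    for ins in trace:
        w = max((ready.get(r, 0) for r in ins.srcs), default=0)
        while w < len(words) and len(words[w]) >= width:
            w += 1                      # word is full, try the next one
        while len(words) <= w:
            words.append([])            # open a new word at the end of the block
        words[w].append(ins.name)
        if ins.dest:
            ready[ins.dest] = w + 1     # unit latency: usable in the next word
    return words


def waste_breakdown(words: List[List[str]], width: int) -> Dict[str, float]:
    """Split issue slots into used, horizontally wasted and vertically wasted."""
    total = len(words) * width
    used = sum(len(w) for w in words)
    vertical = sum(width for w in words if not w)   # fully idle cycles
    horizontal = total - used - vertical            # gaps in non-idle cycles
    return {
        "utilization": used / total,
        "horizontal_waste": horizontal / total,
        "vertical_waste": vertical / total,
    }


# Hypothetical five-instruction trace on a 2-PE machine.
trace = [
    Instr("i1", "r1", ()),
    Instr("i2", "r2", ()),
    Instr("i3", "r3", ("r1", "r2")),
    Instr("i4", "r4", ("r3",)),
    Instr("i5", "r5", ()),
]
words = schedule_trace(trace, width=2)
# Model one stall cycle (e.g. a cache miss) as an empty word before the last one.
stalled = words[:2] + [[]] + words[2:]
stats = waste_breakdown(stalled, width=2)
```

In this toy run the block has three words and the stalled execution takes four cycles, so one of the eight issue slots is lost horizontally and two are lost vertically. An SMT engine of the kind described in the abstract targets exactly these empty slots by issuing VLIW instructions from other threads into them.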



Author information

Corresponding author

Correspondence to Peter A. Rounce.


About this article

Cite this article

Rounce, P., De Souza, A. Dynamic Instruction Scheduling in a Trace-based Multi-threaded Architecture. Int J Parallel Prog 36, 184–205 (2008). https://doi.org/10.1007/s10766-007-0062-1


  • DOI: https://doi.org/10.1007/s10766-007-0062-1
