ABSTRACT
A coarse-grain multithreaded processor can effectively hide long memory latencies by quickly switching to an alternate task when the active task issues a memory request, improving overall throughput. However, dynamic switching cannot be safely exploited to improve throughput in hard-real-time embedded systems. The schedulability of a task-set (a guarantee that all tasks meet their deadlines) must be determined a priori using offline schedulability tests, so any overlap between computation and memory must be accounted for statically. We develop a novel analytical framework that bounds the overlap between the computation of a pipeline-resident task and the ongoing memory transfers of other tasks. From it we derive a simple closed-form schedulability test that depends only on the aggregate computation (C) and memory (M) components of tasks. In particular, the technique does not need to know where memory transfers occur within or among tasks, and it avoids searching all task permutations for a specific feasible schedule. To the best of our knowledge, this is the first work to provide the necessary formalism for safely and tractably exploiting coarse-grain multithreaded processors to tolerate memory latency in hard-real-time systems, exceeding the schedulability limits of classic real-time theory for uniprocessors. Our techniques make it possible to capitalize on higher-frequency embedded processors despite the widening processor-memory speed gap. Experiments with task-sets from the C-lab benchmarks show improved schedulability, measured as the ability to schedule previously infeasible task-sets or to reduce utilization for already feasible task-sets. We also demonstrate a proof of concept by deploying our method in a cycle-level simulator of an ARM11-like embedded microprocessor augmented with multiple register contexts, the same hardware multithreading support available in Ubicom's IP3023 embedded microprocessor.
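For orientation, the "classic real-time theory for uniprocessors" that the abstract refers to can be sketched as the standard utilization tests, in which every cycle of memory stall is charged to the task as if it were computation. The sketch below is not the closed-form test derived in the paper, which the abstract does not state; it only illustrates the baseline that the paper aims to exceed. The names `Task`, `c`, `m`, and `period` are hypothetical, and the sketch assumes periodic tasks with deadlines equal to periods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical task model (an assumption, not taken from the paper).
    c: float       # aggregate worst-case computation time per job
    m: float       # aggregate worst-case memory time per job
    period: float  # period; deadline assumed equal to the period


def utilization(tasks):
    """Utilization when no computation/memory overlap is credited."""
    return sum((t.c + t.m) / t.period for t in tasks)


def edf_schedulable(tasks):
    """Classic EDF test on a uniprocessor: total utilization at most 1."""
    return utilization(tasks) <= 1.0


def rm_schedulable_sufficient(tasks):
    """Liu and Layland's sufficient (not necessary) rate-monotonic bound."""
    n = len(tasks)
    if n == 0:
        return True
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)


feasible = [Task(2, 1, 10), Task(3, 2, 20), Task(4, 4, 40)]
infeasible = [Task(4, 3, 10), Task(5, 3, 20)]

print(edf_schedulable(feasible))    # passes the baseline test
print(edf_schedulable(infeasible))  # fails once memory time is charged in full
```

In the second task-set the computation alone would fit, but charging the memory time in full pushes the utilization above 1. Task-sets of this kind are the ones a test that safely credits overlap could admit.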
Index Terms
- Safely exploiting multithreaded processors to tolerate memory latency in real-time systems