Safely exploiting multithreaded processors to tolerate memory latency in real-time systems

Authors:
Ali El-Haj-Mahmoud, North Carolina State University, Raleigh, NC
Eric Rotenberg, North Carolina State University, Raleigh, NC

CASES '04: Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems, September 2004, Pages 2–13. https://doi.org/10.1145/1023833.1023837

Published: 22 September 2004

ABSTRACT

A coarse-grain multithreaded processor can effectively hide long memory latencies by quickly switching to an alternate task when the active task issues a memory request, improving overall throughput. However, dynamic switching cannot be safely exploited to improve throughput in hard-real-time embedded systems. The schedulability of a task-set (guaranteeing that all tasks meet their deadlines) must be determined a priori using offline schedulability tests, so any computation/memory overlap must be statically accounted for. We develop a novel analytical framework that bounds the overlap between the computation of a pipeline-resident task and ongoing memory transfers of other tasks. From it we derive a simple closed-form schedulability test that depends only on the aggregate computation (C) and memory (M) components of tasks. In particular, the technique does not need to know where memory transfers occur within and among tasks, and it avoids searching all task permutations for a specific feasible schedule. To the best of our knowledge, this is the first work to provide the necessary formalism for safely and tractably exploiting coarse-grain multithreaded processors to tolerate memory latency in hard-real-time systems, exceeding the schedulability limits of classic real-time theory for uniprocessors. Our techniques make it possible to capitalize on higher-frequency embedded processors, despite the widening processor-memory speed gap. Experiments with task-sets from the C-lab benchmarks show improved schedulability, measured as the ability to schedule previously infeasible task-sets or to reduce utilization for already feasible task-sets. We also demonstrate a proof of concept by deploying our method in a cycle-level simulator of an ARM11-like embedded microprocessor augmented with multiple register contexts, the same hardware multithreading support available in Ubicom's IP3023 embedded microprocessor.
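
To make the "schedulability limits of classic real-time theory for uniprocessors" concrete, the following is a minimal sketch of the classic utilization-based tests, written with each task's worst-case execution time split into a computation part C and a memory part M. It is not the closed-form test derived in the paper, which the abstract does not state. The `Task` type, the function names, and the example numbers are hypothetical, and the sketch assumes independent, preemptive periodic tasks with deadlines equal to periods and no computation/memory overlap.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Task:
    """Hypothetical periodic task; the deadline is assumed equal to the period."""
    c: float       # worst-case computation time (C)
    m: float       # worst-case memory stall time (M)
    period: float  # period, in the same time unit as c and m


def utilization(tasks: Sequence[Task]) -> float:
    """Total utilization when memory time is never overlapped: sum of (C + M) / T."""
    return sum((t.c + t.m) / t.period for t in tasks)


def edf_schedulable(tasks: Sequence[Task]) -> bool:
    """Classic EDF test on a uniprocessor: schedulable iff utilization <= 1."""
    return utilization(tasks) <= 1.0


def rm_schedulable(tasks: Sequence[Task]) -> bool:
    """Liu and Layland rate-monotonic bound: U <= n * (2^(1/n) - 1).

    This is a sufficient condition only; a task-set that fails it
    may still be schedulable under rate-monotonic priorities.
    """
    n = len(tasks)
    if n == 0:
        return True
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)


# Illustrative task-sets (made-up numbers, not taken from the paper).
light = [Task(c=2, m=1, period=10), Task(c=3, m=2, period=20), Task(c=4, m=4, period=40)]
heavy = [Task(c=6, m=3, period=10), Task(c=2, m=2, period=20)]
```

Under these assumptions every cycle of M counts against the processor, so a set like `heavy` fails the test even though part of its demand is memory stall time. The abstract's claim is that a safe bound on how much of that stall time overlaps with other tasks' computation lets such sets be admitted.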

Index Terms

Computer systems organization

Published in
CASES '04: Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
September 2004
324 pages
ISBN: 1581138903
DOI: 10.1145/1023833
General Chairs: Mary Jane Irwin (Pennsylvania State University), Wei Zhao (Texas Instruments)
Program Chairs: Luciano Lavagno (Politecnico di Torino/Cadence Labs), Scott Mahlke (University of Michigan, Ann Arbor, MI)
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
memory latency
multithreading
real-time systems
schedulability test
worst-case execution time

Acceptance Rates
Overall Acceptance Rate: 52 of 230 submissions, 23%