ABSTRACT
This paper proposes an efficient method for analyzing the worst-case interruption delay (WCID) of a workload running on a modern microprocessor, using a cycle-accurate simulator (CAS). The method is highly accurate because it simulates every possible case, inserting an interruption just before the retirement of each instruction executed in the workload. It is also reasonably efficient: for a workload with N executed instructions it takes O(N log N) time, instead of the O(N^2) time of a straightforward iterative simulation of interrupted executions. The key idea behind this efficiency is that a pair of executions with different interruption points has a set of durations in which the two behave identically, so the simulation of one of them may be omitted for those durations. We implemented the method by modifying the SimpleScalar tool set and show that it finds the WCID of workloads with five million executed instructions in a reasonable time, less than 30 minutes, where the straightforward method would take 200-300 days. We also show that a parallelized version of our method achieves a good speedup, about 7-fold on an 8-node PC cluster.
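To give a feel for the gap between the two complexity classes at the workload size quoted in the abstract, the following is a minimal back-of-the-envelope sketch, not the authors' implementation. The function names, the base-2 logarithm, and the assumption that the straightforward method costs roughly one full re-simulation per interruption point are illustrative choices and are not taken from the paper. Constant factors are ignored, so the asymptotic ratio should not be expected to match the measured times.

```python
import math


def naive_steps(n):
    # Assumption: about one full re-simulation (n instructions) for each
    # of the n interruption points, giving O(N^2) growth.
    return n * n


def proposed_steps(n):
    # O(N log N) growth; the base-2 logarithm is an arbitrary choice made
    # only for illustration.
    return n * math.log2(n)


n = 5_000_000  # workload size quoted in the abstract

# Ratio of the two growth functions, ignoring all constant factors.
ratio = naive_steps(n) / proposed_steps(n)

# Lower bound on the reported speedup: at least 200 days for the
# straightforward method versus at most 30 minutes for the proposed one.
reported_speedup_low = 200 * 24 * 60 / 30

print(f"asymptotic step ratio: {ratio:,.0f}")
print(f"reported speedup, at least: {reported_speedup_low:,.0f}")
```

The asymptotic ratio comes out well above the reported speedup, which is consistent with the proposed method doing more work per simulated instruction than a plain simulation run. That reading is an inference from the numbers in the abstract, not a claim made by the paper.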
Index Terms
- An accurate and efficient simulation-based analysis for worst case interruption delay