ABSTRACT
Recently, software performance estimation based on source code instrumentation has shown promising results in the literature: compared with cycle-accurate simulation, it achieves significant speedup without compromising accuracy. However, much work remains to be done to make this technique flexible and accurate enough to estimate the performance of software running on complex processors. To the best of our knowledge, we are the first to propose ways to tackle microarchitecture-related issues in the source code instrumentation approach. We perform static instruction scheduling for superscalar architectures at instrumentation time, and we combine instrumented code with microarchitecture simulators to model runtime interactions between software and microarchitecture. We have developed a new framework, SciSim, that provides a common infrastructure for the proposed approach. It is designed to be easily extensible and retargetable to different instruction set architectures and processors. Using SciSim, SystemC modules can be generated automatically to integrate software into system-level simulation. We present the applicability of SciSim to system-level design exploration of multiprocessor systems. Finally, we present experiments with standard benchmarks that validate the speed and accuracy of SciSim.
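The idea of static instruction scheduling at instrumentation time can be illustrated with a minimal sketch. The code targets a hypothetical in-order superscalar pipeline that issues up to `issue_width` instructions per cycle. It list-schedules one basic block of target instructions and emits the cycle annotation that an instrumenter could insert into the matching source-level basic block.

Everything here is an illustrative assumption and not SciSim's actual implementation or API. That includes the generic RISC-like mnemonics, the latencies, the names `Instr`, `schedule_block`, `block_cost` and `annotate`, and the `cycle_count` counter. The model also deliberately ignores WAW/WAR hazards, structural hazards and overlap between consecutive blocks.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """One target instruction of a basic block (hypothetical, simplified model)."""
    op: str
    dst: Optional[str]        # destination register, or None
    srcs: Tuple[str, ...]     # source registers
    latency: int              # cycles until a dependent instruction can use the result


def schedule_block(block: List[Instr], issue_width: int = 2) -> List[int]:
    """Statically schedule a basic block on an in-order pipeline that issues up to
    `issue_width` instructions per cycle; returns each instruction's issue cycle.
    Assumption: live-in registers are ready at cycle 0 and only true (RAW)
    dependences stall issue."""
    ready = {}                # register -> cycle at which its value becomes available
    issue_cycles = []
    cycle, slots_used = 0, 0
    for ins in block:
        # In-order issue: not before the current cycle and not before operands are ready.
        earliest = max([cycle] + [ready.get(r, 0) for r in ins.srcs])
        if earliest > cycle:
            cycle, slots_used = earliest, 0
        if slots_used == issue_width:      # all issue slots of this cycle are taken
            cycle, slots_used = cycle + 1, 0
        issue_cycles.append(cycle)
        slots_used += 1
        if ins.dst is not None:
            ready[ins.dst] = cycle + ins.latency
    return issue_cycles


def block_cost(block: List[Instr], issue_width: int = 2) -> int:
    """Cycles needed to issue the whole block (simplification: no overlap between blocks)."""
    issue = schedule_block(block, issue_width)
    return issue[-1] + 1 if issue else 0


def annotate(block: List[Instr], issue_width: int = 2) -> str:
    """Source-level statement an instrumenter could insert into the matching source block."""
    return f"cycle_count += {block_cost(block, issue_width)};"


# Hypothetical loop body of `s += a[i]` compiled for a generic RISC-like ISA.
block = [
    Instr("lw",   "r9",  ("r3",),        2),  # load a[i]
    Instr("addi", "r3",  ("r3",),        1),  # advance the pointer
    Instr("add",  "r10", ("r10", "r9"),  1),  # s += a[i]; waits for the load
    Instr("bne",  None,  ("r3", "r11"),  1),  # loop-closing branch against the end pointer
]

if __name__ == "__main__":
    print(schedule_block(block, issue_width=2))   # [0, 0, 2, 2]
    print(annotate(block, issue_width=2))         # cycle_count += 3;
    print(annotate(block, issue_width=1))         # cycle_count += 4;
```

With `issue_width=1` the same block costs 4 cycles instead of 3. Capturing this kind of difference is why the scheduling is done per target microarchitecture rather than by summing fixed per-instruction costs. Runtime effects such as cache misses cannot be resolved statically this way. That is where the combination with microarchitecture simulators comes in.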
Index Terms
- SciSim: a software performance estimation framework using source code instrumentation