ABSTRACT
The early performance evaluation of complex platforms and software stacks requires fast and sufficiently accurate workload representations. In the literature, two different approaches have been proposed: host-based simulation with abstract performance annotations, which enables fast, functional simulation with limited architectural accuracy, and abstract workload models (or traffic generators), which capture platform resource usage patterns in more detail.
In this work, we present an approach for automatic workload extraction from functional application code that combines the benefits of both approaches. First, the algorithmic behaviour of the embedded software is characterised statically in terms of both target processor usage and target memory access patterns, resulting in an abstracted, control-flow-aware workload model. Second, this model can be used on the target architecture itself as well as within a host-based simulation environment. We demonstrate the effectiveness of our approach by running our performance model on a virtual platform with and without a target Instruction Set Simulator (ISS) and comparing the simulation traces with those obtained by executing the unaltered target processor binary.
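To make the idea of a control-flow-aware workload model more concrete, the following is a minimal illustrative sketch, not the implementation described in this work. It assumes that each basic block has already been annotated statically with an estimated cycle count and a list of memory accesses; the names `BlockWorkload`, `replay`, and `make_loop_policy`, as well as all cycle counts and addresses, are hypothetical. Replaying the blocks along a control-flow path yields a cycle estimate and a memory access trace without executing the original computation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Access = Tuple[str, int]  # (kind, address), kind is "R" or "W"


@dataclass
class BlockWorkload:
    """Static annotation of one basic block (hypothetical layout)."""
    name: str
    cycles: int                 # assumed target processor cycles for the block
    accesses: List[Access]      # assumed memory accesses issued by the block
    successors: List[str] = field(default_factory=list)


def replay(blocks: Dict[str, BlockWorkload],
           entry: str,
           choose_next: Callable[[BlockWorkload], str],
           max_steps: int = 1000) -> Tuple[int, List[Access]]:
    """Walk the annotated control flow and accumulate cycles and accesses."""
    total_cycles = 0
    trace: List[Access] = []
    current: Optional[str] = entry
    steps = 0
    while current is not None and steps < max_steps:
        block = blocks[current]
        total_cycles += block.cycles
        trace.extend(block.accesses)
        # A block without successors ends the replay.
        current = choose_next(block) if block.successors else None
        steps += 1
    return total_cycles, trace


def make_loop_policy(iterations: int) -> Callable[[BlockWorkload], str]:
    """Branch policy: execute the block named 'loop' a fixed number of times."""
    remaining = {"loop": iterations}

    def choose_next(block: BlockWorkload) -> str:
        if block.name == "loop":
            remaining["loop"] -= 1
            return "loop" if remaining["loop"] > 0 else "exit"
        return block.successors[0]

    return choose_next


# Hypothetical three-block model: entry -> loop (self edge) -> exit.
BLOCKS = {
    "entry": BlockWorkload("entry", 2, [], ["loop"]),
    "loop": BlockWorkload("loop", 5, [("R", 0x1000), ("W", 0x2000)],
                          ["loop", "exit"]),
    "exit": BlockWorkload("exit", 1, []),
}

if __name__ == "__main__":
    cycles, trace = replay(BLOCKS, "entry", make_loop_policy(3))
    print(cycles, len(trace))
```

In this sketch the branch policy stands in for whatever control-flow information the model retains, and the fixed addresses stand in for real access patterns. The same annotated blocks could in principle drive either a host-based simulation or a traffic generator on the target, which is the property the abstract highlights.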
- A workload extraction framework for software performance model generation