ABSTRACT
Flexible, accurate performance predictions offer numerous benefits, such as gaining insight into and optimizing applications and architectures. However, developing and evaluating such predictions has been a major research challenge, owing to the complexity of modern architectures. To address this challenge, we have designed and implemented a prototype system, named COMPASS, for automated performance model generation and prediction. COMPASS generates a structured performance model from the target application's source code using automated static analysis and then evaluates this model using various performance prediction techniques. As we demonstrate on several applications, the resulting predictions can serve a variety of purposes, such as design space exploration, identifying performance tradeoffs for applications, and understanding the sensitivity of important parameters. COMPASS can generate these predictions for several types of applications, from traditional, sequential CPU applications to GPU-based, heterogeneous, parallel applications. Our empirical evaluation demonstrates a maximum overhead of 4%, the flexibility to generate models for 9 applications, speed, ease of model creation, and very low relative errors across a diverse set of architectures.
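To make the workflow concrete, the following is a minimal, hypothetical sketch (not COMPASS's actual implementation) of how a structured performance model generated by static analysis might be evaluated: the analyzer counts a kernel's floating-point operations and bytes moved, and a roofline-style evaluator combines those counts with machine parameters to predict runtime. All names, counts, and machine parameters here are illustrative assumptions.

```python
def predict_runtime_s(flops, bytes_moved, peak_flops, peak_bw):
    """Roofline-style estimate: runtime is bounded by whichever of
    compute time or memory-transfer time dominates."""
    compute_time = flops / peak_flops      # seconds if compute-bound
    memory_time = bytes_moved / peak_bw    # seconds if memory-bound
    return max(compute_time, memory_time)

# Illustrative model of a vector add over n doubles:
# 1 flop and 24 bytes (2 loads + 1 store) per element.
n = 10_000_000
t = predict_runtime_s(flops=n,
                      bytes_moved=24 * n,
                      peak_flops=1e12,   # hypothetical 1 TFLOP/s device
                      peak_bw=2e11)      # hypothetical 200 GB/s memory
print(f"predicted runtime: {t * 1e3:.2f} ms")  # memory-bound here
```

A real model generator would emit many such kernel clauses with symbolic parameters (problem size, tile size), which is what enables design space exploration and parameter sensitivity studies of the kind the abstract describes.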
COMPASS: A Framework for Automated Performance Modeling and Prediction