ABSTRACT
When optimizations are applied, a number of decisions are made using fixed strategies, such as always applying an optimization whenever it is applicable, applying optimizations in a fixed order, and assuming a fixed configuration for parameters such as tile size and loop unrolling factor. While it is widely recognized that these fixed strategies may not produce the highest-quality code, especially for embedded systems, there are no general and automatic strategies that do otherwise. In this paper, we present a framework that enables these decisions to be made by predicting the impact of an optimization, taking into account resources and code context. The framework consists of optimization models, code models, and resource models, which are integrated to predict the impact of applying optimizations. Because data cache performance is important to embedded codes, we present an instance of the framework for cache performance. Since most opportunities for cache improvement come from loop optimizations, we describe code, optimization, and cache models tailored to predicting the impact of loop optimizations on data locality. Experimentally, we demonstrate the need to apply optimizations selectively and show the performance benefit of using our framework to predict when to apply an optimization. We also show that the framework can be used to choose the most beneficial optimization when several optimizations can be applied to a loop nest. Finally, we show that the framework can be used to combine optimizations on a loop nest.
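To make the idea of prediction-driven selection concrete, the following is a minimal sketch, not the models described in the paper. It assumes a deliberately simplified setting: a two-deep loop nest, a cache model reduced to a single parameter (elements per cache line), and a code model that records only the stride of each array reference with respect to each loop. Capacity and conflict misses, and reuse carried by the outer loop, are ignored. All names (`Cache`, `Ref`, `predicted_misses`, `should_interchange`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cache:
    # Hypothetical resource model: array elements that fit in one cache line.
    line_elems: int


@dataclass(frozen=True)
class Ref:
    # Hypothetical code model: stride (in elements) of one array reference
    # with respect to each loop, indexed by loop number.
    strides: tuple


def predicted_misses(refs, trip_counts, order, cache):
    """Estimate cache lines fetched when the loops run in `order`
    (outermost first). Only the innermost loop's stride is modeled."""
    inner = order[-1]
    outer_iters = 1
    for loop in order[:-1]:
        outer_iters *= trip_counts[loop]

    total = 0
    for ref in refs:
        stride = abs(ref.strides[inner])
        n = trip_counts[inner]
        if stride == 0:
            inner_misses = 1  # invariant in the inner loop: one line
        elif stride < cache.line_elems:
            # Consecutive iterations share lines: ceil(n * stride / line).
            inner_misses = (n * stride + cache.line_elems - 1) // cache.line_elems
        else:
            inner_misses = n  # every iteration touches a new line
        total += inner_misses * outer_iters
    return total


def should_interchange(refs, trip_counts, cache):
    """Hypothetical optimization model for loop interchange: apply it
    only if the predicted miss count drops."""
    before = predicted_misses(refs, trip_counts, (0, 1), cache)
    after = predicted_misses(refs, trip_counts, (1, 0), cache)
    return after < before, before, after


if __name__ == "__main__":
    # a[i][j] in a row-major 100x100 array, with loop 0 = j (outer) and
    # loop 1 = i (inner): stride 1 with respect to j, 100 with respect to i.
    column_walk = [Ref(strides=(1, 100))]
    print(should_interchange(column_walk, (100, 100), Cache(line_elems=8)))
```

Under these assumptions, the column-wise traversal is predicted to benefit from interchange, while the same nest with the strides swapped is not, which is the kind of selective decision the abstract argues a fixed "always apply" strategy cannot make.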
Predicting the impact of optimizations for embedded systems. In Proceedings of the 2003 ACM SIGPLAN Conference on Language, Compiler, and Tool Support for Embedded Systems (San Diego, CA).