ABSTRACT
Generating, rather than hand-implementing, variable design platforms is becoming increasingly popular in System-on-Chip development. This shift raises the challenge of rapidly adapting compiler optimization to each newly generated platform. In this paper, we evaluate the impact of 104 compiler flags on memory usage and core execution time against the standard optimization levels. Each flag influences these costs differently, and its effect is difficult to predict. We therefore apply cost estimation based on unsupervised Machine Learning, in the form of k-means clustering, to predict the impact of each flag on the generated core. The key strengths of the approach are its low data requirements, its adaptability to new cores, and its ease of use. It helps the designer understand the impact of flags on related applications by showing which combinations optimize the most. As a result, we obtain improvements beyond the -O3 optimization of 20.93% in software size, 3.10% in performance, and 1.75% in their trade-off.
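The clustering step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the flag names, the per-flag impact measurements, and the choice of k = 3 are all hypothetical placeholders, where a real setup would measure each flag's relative change in code size and execution time against -O3 on the target core.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical measurements: for each compiler flag, the relative change
# (in %) versus -O3 in code size and core execution time. Real values
# would come from building and running a benchmark once per flag.
rng = np.random.default_rng(0)
flag_names = [f"-fflag-{i}" for i in range(12)]   # placeholder flag names
impact = rng.normal(0.0, 5.0, size=(12, 2))       # columns: [size %, time %]

# Group flags with similar cost impact; k is chosen arbitrarily here
# (in practice it could be selected with the elbow method).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(impact)

for cluster_id in range(3):
    members = [n for n, lab in zip(flag_names, km.labels_) if lab == cluster_id]
    centroid = km.cluster_centers_[cluster_id]
    print(f"cluster {cluster_id}: centroid (size%, time%) = "
          f"{centroid.round(2)} -> {members}")
```

A cluster whose centroid sits in the negative size/negative time quadrant would contain the flags most worth enabling together, which is the kind of guidance the approach aims to give the designer.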