ABSTRACT
Reducing the energy needed to carry out computational tasks is key to almost any computing application. In this paper, we focus on iterative applications that have an explicit computational deadline for each iteration. Our objective is to meet these deadlines while minimizing energy. We leverage the vast configuration space offered by heterogeneous multicore platforms, which typically expose three dimensions of energy-saving configurability: voltage/frequency level, thread count, and core type (e.g., ARM big.LITTLE). We note that when the most energy-efficient configuration that meets the deadline is chosen, an iteration will typically finish before the deadline, so execution-time slack builds up across iterations. Our proposed slack management policy, SaC (Slack as a Currency), proactively explores the configuration space to select configurations that can save substantial amounts of energy. To avoid the overhead of an exhaustive search of the configuration space, our proposal also comprises a low-overhead, online method that assesses each point in the configuration space by linearly interpolating between the endpoints of each configuration-space dimension. Overall, we show that the proposed slack management policy and linear-interpolation assessment method can yield 62% energy savings on top of race-to-idle without missing any deadlines.
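The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Estimate`, `interpolate`, `pick_config`, and `update_slack` are hypothetical, the numbers in the usage note are invented for illustration, and the selection rule (lowest estimated energy that fits within the deadline plus accumulated slack) is an assumed simplification of SaC. The sketch covers a single configuration dimension, estimating intermediate levels by linear interpolation between measurements at its two endpoints.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Estimate:
    time_s: float    # estimated execution time of one iteration
    energy_j: float  # estimated energy of one iteration


def interpolate(lo: Estimate, hi: Estimate, level: int, n_levels: int) -> Estimate:
    """Estimate an intermediate level from the two endpoint measurements."""
    frac = level / (n_levels - 1)  # 0.0 at the low endpoint, 1.0 at the high one
    return Estimate(
        time_s=lo.time_s + (hi.time_s - lo.time_s) * frac,
        energy_j=lo.energy_j + (hi.energy_j - lo.energy_j) * frac,
    )


def pick_config(estimates: Dict[int, Estimate], deadline_s: float, slack_s: float) -> int:
    """Return the lowest-energy level whose estimated time fits deadline + slack."""
    budget = deadline_s + slack_s
    feasible = {k: e for k, e in estimates.items() if e.time_s <= budget}
    if not feasible:
        # Assumed fallback: nothing fits, so run the fastest level available.
        return min(estimates, key=lambda k: estimates[k].time_s)
    return min(feasible, key=lambda k: feasible[k].energy_j)


def update_slack(slack_s: float, deadline_s: float, actual_time_s: float) -> float:
    """Finishing early earns slack; overrunning the deadline spends it."""
    return slack_s + deadline_s - actual_time_s
```

With hypothetical endpoints of (1.2 s, 3.0 J) at the lowest of five frequency levels and (0.6 s, 6.0 J) at the highest, a 1.0 s deadline with no slack selects level 2, whereas 0.1 s of accumulated slack makes the slower and cheaper level 1 feasible. In this sketch, slack is spent on exactly this kind of trade: accumulated time is exchanged for a lower-energy configuration.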
Index Terms
- SaC: Exploiting Execution-Time Slack to Save Energy in Heterogeneous Multicore Systems