DOI: 10.1145/3337821.3337865
research-article

SaC: Exploiting Execution-Time Slack to Save Energy in Heterogeneous Multicore Systems

Published: 05 August 2019

ABSTRACT

Reducing the energy needed to carry out computational tasks is key to almost any computing application. This paper focuses on iterative applications that have an explicit computational deadline per iteration. Our objective is to meet these deadlines while minimizing energy. We leverage the vast configuration space offered by heterogeneous multicore platforms, which typically expose three dimensions of energy-saving configurability: voltage/frequency level, thread count, and core type (e.g., ARM big.LITTLE). We note that when the most energy-efficient configuration that meets the computational deadline is chosen, an iteration will typically finish before the deadline, so execution-time slack builds up across iterations. Our proposed slack management policy, SaC (Slack as a Currency), proactively explores the configuration space to select configurations that can save substantial amounts of energy. To avoid the overhead of an exhaustive search of the configuration space, our proposal also comprises a low-overhead, online method that assesses each point in the configuration space by linearly interpolating between the endpoints in each configuration-space dimension. Overall, we show that the proposed slack management policy and linear-interpolation configuration assessment method can yield 62% energy savings on top of race-to-idle without missing any deadlines.
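The two ideas in the abstract, endpoint interpolation and spending accumulated slack, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers a single configuration dimension (frequency level), assumes that both time and energy vary linearly between the two measured endpoints, and uses hypothetical names (`Sample`, `interpolate`, `pick_level`, `update_slack`) and made-up numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Time (s) and energy (J) of one iteration at one setting."""
    time: float
    energy: float


def interpolate(levels, low, high):
    """Estimate a Sample for every level in a sorted list by linear
    interpolation between measurements at the first and last level.
    Linearity in both time and energy is an assumption of this sketch."""
    lo, hi = levels[0], levels[-1]
    estimates = {}
    for level in levels:
        w = (level - lo) / (hi - lo)  # 0 at the low endpoint, 1 at the high one
        estimates[level] = Sample(
            time=low.time + w * (high.time - low.time),
            energy=low.energy + w * (high.energy - low.energy),
        )
    return estimates


def pick_level(estimates, deadline, slack):
    """Return the lowest-energy level whose estimated time fits within
    the deadline plus the slack banked from earlier iterations."""
    budget = deadline + slack
    feasible = [lvl for lvl, s in estimates.items() if s.time <= budget]
    if not feasible:
        # Nothing fits: fall back to the level with the shortest estimated time.
        return min(estimates, key=lambda lvl: estimates[lvl].time)
    return min(feasible, key=lambda lvl: estimates[lvl].energy)


def update_slack(slack, deadline, actual_time):
    """Bank the time left over when an iteration finishes early;
    finishing late spends previously banked slack."""
    return slack + (deadline - actual_time)


# Hypothetical frequency levels (MHz) and endpoint measurements.
levels = [600, 1000, 1400]
estimates = interpolate(levels, low=Sample(10.0, 4.0), high=Sample(5.0, 8.0))

no_slack_choice = pick_level(estimates, deadline=8.0, slack=0.0)
with_slack_choice = pick_level(estimates, deadline=8.0, slack=2.0)
```

With these numbers, only the 1000 MHz and 1400 MHz levels fit an 8 s deadline when no slack is available, so the sketch picks 1000 MHz. Once 2 s of slack has been banked, the 600 MHz level also fits the budget and is chosen because its estimated energy is lowest. A fuller version would repeat the interpolation for thread count and core type.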

Published in

ICPP '19: Proceedings of the 48th International Conference on Parallel Processing
August 2019
1107 pages
ISBN: 9781450362955
DOI: 10.1145/3337821

Copyright © 2019 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article, Research, Refereed limited

Overall Acceptance Rate: 91 of 313 submissions, 29%
