Energy-Awareness and Performance Management with Parallel Dataflow Applications

Holmbacka, Simon; Nogues, Erwan; Pelcat, Maxime; Lafond, Sébastien; Menard, Daniel; Lilius, Johan

doi:10.1007/s11265-015-1059-4

Published: 06 November 2015

Volume 87, pages 33–48, (2017)

Journal of Signal Processing Systems

Abstract

Applications have traditionally been executed as fast as possible (Race-to-Idle) and mapped to as many cores as possible (Fair scheduling) to minimize energy consumption. On modern hardware, this method has become inefficient because of the power characteristics of the platforms. Instead, applications should use an optimal combination of clock frequency and number of cores to balance dynamic and static power. Such approaches have been difficult to achieve because resource allocation is based only on CPU utilization: resources are allocated to prevent over-utilization rather than to meet software performance requirements. By adjusting the clock frequency directly according to software requirements and activating CPU cores according to the application parallelism, significant energy can be saved by lowering the average power dissipation. To enforce these recommendations, this paper provides means of expressing performance and parallelism in applications, allowing tighter integration with the power management to balance execution speed and mapping on multi-core systems. An interface between the applications and the hardware resources is provided in combination with a novel power management runtime system called Bricktop. A signal processing case study demonstrates real-world energy savings of up to 50% without performance degradation.
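The trade-off between dynamic and static power described in the abstract can be illustrated with a small toy model. The sketch below is not the paper's model or the Bricktop implementation: the function names, the cubic dynamic-power term, and every coefficient are assumptions chosen only for illustration. It uses Amdahl's law for the speed-up, assumes all active cores draw power for the whole busy interval, and searches for the frequency and core count that minimize energy per period while still meeting the deadline.

```python
from itertools import product


def amdahl_speedup(parallel_fraction, cores):
    """Speed-up predicted by Amdahl's law for a given number of cores."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def energy_per_period(freq_ghz, cores, work_gcycles, parallel_fraction,
                      period_s, static_w_per_core=0.3, dyn_coeff=0.4,
                      idle_w=0.1):
    """Energy in joules for one period, or None if the deadline is missed.

    All power coefficients are illustrative assumptions, not measurements.
    """
    busy_s = work_gcycles / (freq_ghz * amdahl_speedup(parallel_fraction, cores))
    if busy_s > period_s:
        return None  # the performance requirement is not met
    # Static power grows with the core count, dynamic power with f^3.
    active_w = cores * (static_w_per_core + dyn_coeff * freq_ghz ** 3)
    return active_w * busy_s + idle_w * (period_s - busy_s)


def best_setting(freqs_ghz, max_cores, **workload):
    """Return (energy, frequency, cores) with the lowest feasible energy."""
    candidates = []
    for freq, cores in product(freqs_ghz, range(1, max_cores + 1)):
        energy = energy_per_period(freq, cores, **workload)
        if energy is not None:
            candidates.append((energy, freq, cores))
    return min(candidates)


if __name__ == "__main__":
    workload = dict(work_gcycles=0.4, parallel_fraction=0.9, period_s=1.0)
    race_to_idle = energy_per_period(1.6, 4, **workload)
    print("race-to-idle energy (J):", race_to_idle)
    print("best (energy, GHz, cores):",
          best_setting([0.4, 0.8, 1.2, 1.6], 4, **workload))
```

Under these assumed coefficients, running at the highest frequency on all cores and then idling costs more energy than a slower setting that still finishes within the period, which is the effect the abstract attributes to modern platforms.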

Notes

Note that the performance model can be exchanged or manipulated if the user considers Amdahl’s law not accurate enough or if it does not represent the real system behavior in other ways.
E and Q are normalized to the range in which q and c operate.
The upthreshold in Linux is usually set based on best practice for the system in question. Typical settings are around 80–95 % of full workload (100 %).
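The threshold behaviour mentioned in the last note can be sketched as follows. This is a simplified illustration of a utilization-driven frequency decision, not the Linux kernel's code: the function name, the frequency table, and the default of 80% are assumptions. Above the threshold the frequency jumps to the maximum; below it, the lowest available frequency that covers a load-proportional target is selected.

```python
def next_frequency(load_pct, freqs_khz, up_threshold=80):
    """Pick the next clock frequency from the CPU load alone.

    load_pct     -- CPU utilization over the last sampling window (0-100)
    freqs_khz    -- available frequencies (hypothetical table)
    up_threshold -- load above which the maximum frequency is selected
    """
    lowest, highest = min(freqs_khz), max(freqs_khz)
    if load_pct > up_threshold:
        return highest
    # Scale the target linearly with the load between the two extremes.
    target = lowest + (highest - lowest) * load_pct / 100.0
    # Choose the lowest available frequency that still meets the target.
    return min(f for f in freqs_khz if f >= target)


if __name__ == "__main__":
    table = [400_000, 800_000, 1_200_000, 1_600_000]
    for load in (0, 50, 90):
        print(load, next_frequency(load, table))
```

Such a policy reacts only to utilization: it has no notion of whether the application is actually meeting its performance requirement, which is the limitation the abstract points out.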

Author information

Authors and Affiliations

Turku Centre for Computer Science, Joukahaisenkatu 3–5, 20520, Turku, Finland
Simon Holmbacka
IETR Image Group, INSA de Rennes, Rennes, France
Erwan Nogues, Maxime Pelcat & Daniel Menard
Faculty of Science and Engineering, Åbo Akademi University, Turku, Finland
Sébastien Lafond & Johan Lilius

Corresponding author

Correspondence to Simon Holmbacka.

About this article

Cite this article

Holmbacka, S., Nogues, E., Pelcat, M. et al. Energy-Awareness and Performance Management with Parallel Dataflow Applications. J Sign Process Syst 87, 33–48 (2017). https://doi.org/10.1007/s11265-015-1059-4

Received: 28 February 2015
Revised: 10 August 2015
Accepted: 05 October 2015
Published: 06 November 2015
Issue Date: April 2017
DOI: https://doi.org/10.1007/s11265-015-1059-4
