Energy-Awareness and Performance Management with Parallel Dataflow Applications

Journal of Signal Processing Systems

Abstract

Applications have traditionally been executed as fast as possible (Race-to-Idle) and mapped to as many cores as possible (Fair scheduling) to minimize energy consumption. On modern hardware this method has become inefficient because of the power characteristics of the platforms. Instead, applications should use an optimal combination of clock frequency and number of cores to balance dynamic and static power. Such approaches have been difficult to achieve because resource allocation is based only on CPU utilization: resources are allocated to prevent over-utilization rather than to follow software performance requirements. By adjusting the clock frequency directly according to software requirements and activating CPU cores according to the application parallelism, significant energy can be saved by lowering the average power dissipation. To enforce these recommendations, this paper provides a means of expressing performance and parallelism in applications for tighter integration with the power management, in order to balance execution speed and mapping on multi-core systems. An interface between the applications and the hardware resources is provided in combination with a novel power management runtime system called Bricktop. A signal processing case study demonstrates real-world energy savings of up to 50 % without performance degradation.
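
The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the paper's power model or the Bricktop runtime. It assumes, purely for illustration, that dynamic power per core grows with the cube of the clock frequency, that each active core adds a constant static power, that execution time follows Amdahl's law, and that all allocated cores stay powered during the serial part. The function names and the constants `k_dyn`, `p_static` and `p_idle` are hypothetical.

```python
from itertools import product


def exec_time(work, freq, cores, parallel_fraction):
    """Amdahl-style execution time (s) for `work` cycles at `freq` Hz."""
    serial = 1.0 - parallel_fraction
    return (work / freq) * (serial + parallel_fraction / cores)


def energy(work, freq, cores, parallel_fraction, period,
           k_dyn=1e-27, p_static=0.3, p_idle=0.05):
    """Energy (J) over one period, or None if the deadline is missed.

    Assumed toy model: per-core dynamic power is k_dyn * freq**3 and
    per-core static power is p_static while the application runs;
    the platform draws p_idle for the rest of the period.
    """
    t = exec_time(work, freq, cores, parallel_fraction)
    if t > period:
        return None
    p_active = cores * (k_dyn * freq ** 3 + p_static)
    return p_active * t + p_idle * (period - t)


def best_config(work, parallel_fraction, period, freqs, max_cores):
    """Return (energy, freq, cores) with minimal energy that meets the period."""
    candidates = []
    for f, n in product(freqs, range(1, max_cores + 1)):
        e = energy(work, f, n, parallel_fraction, period)
        if e is not None:
            candidates.append((e, f, n))
    return min(candidates) if candidates else None


if __name__ == "__main__":
    freqs = [0.4e9, 0.8e9, 1.2e9, 1.6e9]  # hypothetical frequency steps
    work, par, period = 1e9, 0.9, 1.1     # hypothetical workload
    race_to_idle = energy(work, max(freqs), 4, par, period)
    print("race-to-idle on 4 cores:", race_to_idle)
    print("best (energy, freq, cores):",
          best_config(work, par, period, freqs, 4))
```

Under these assumed constants the exhaustive search prefers a lower frequency and fewer cores than the Race-to-Idle configuration while still meeting the period. The numbers depend entirely on the assumed model and say nothing about the savings measured in the paper.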

Notes

  1. Note that the performance model can be exchanged or manipulated if the user considers Amdahl’s law not accurate enough or if it does not represent the real system behavior in other ways.

  2. E and Q are normalized to the range in which q and c operate.

  3. The upthreshold in Linux is usually set based on best practice for the system in question. Typical settings are around 80–95 % of full workload (100 %).
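
As a rough illustration of the utilization-driven allocation that Note 3 refers to, the following sketch picks a clock frequency from CPU load alone. It is a simplified approximation written for this page, not the Linux kernel's governor code and not the method proposed in the paper. The function name `next_frequency` and the default `up_threshold` of 90 (chosen from the 80–95 % range mentioned in the note) are assumptions.

```python
def next_frequency(load_percent, freqs, up_threshold=90):
    """Choose the next clock frequency from CPU load only.

    Simplified threshold policy (an assumption, not kernel code):
    above `up_threshold` jump straight to the highest frequency,
    otherwise scale linearly with load and round up to the nearest
    available frequency step.
    """
    if not 0 <= load_percent <= 100:
        raise ValueError("load_percent must be within 0..100")
    steps = sorted(freqs)
    lowest, highest = steps[0], steps[-1]
    if load_percent > up_threshold:
        return highest
    target = lowest + (highest - lowest) * load_percent / 100
    # target never exceeds `highest`, so a matching step always exists
    return next(f for f in steps if f >= target)


if __name__ == "__main__":
    steps_mhz = [400, 800, 1200, 1600]  # hypothetical frequency steps
    for load in (0, 10, 50, 95):
        print(load, "% ->", next_frequency(load, steps_mhz), "MHz")
```

A policy of this kind only reacts to how busy the cores are. It has no notion of whether the application is meeting its own performance target, which is the gap the abstract points at.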

Author information

Corresponding author

Correspondence to Simon Holmbacka.

About this article

Cite this article

Holmbacka, S., Nogues, E., Pelcat, M. et al. Energy-Awareness and Performance Management with Parallel Dataflow Applications. J Sign Process Syst 87, 33–48 (2017). https://doi.org/10.1007/s11265-015-1059-4

  • DOI: https://doi.org/10.1007/s11265-015-1059-4
