Skip to main content

Advertisement

Log in

Energy-efficient execution of dense linear algebra algorithms on multi-core processors

  • Published:
Cluster Computing Aims and scope Submit manuscript

Abstract

This paper addresses the efficient exploitation of task-level parallelism, present in many dense linear algebra operations, from the point of view of both computational performance and energy consumption. The strategies considered here, referred to as the Slack Reduction Algorithm (SRA) and the Race-to-Idle Algorithm (RIA), adjust the operation frequency of the cores during the execution of a collection of tasks (in which many dense linear algebra algorithms can be decomposed) with very different approaches to save energy. The procedures are evaluated using an energy-aware simulator, which is in charge of scheduling/mapping the execution of these tasks to the cores, leveraging dynamic frequency voltage scaling featured by current technology. Experiments with this tool and the practical integration of the RIA strategy into a runtime show the energy gains for two versions of the QR factorization.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Algorithm 1
Fig. 2
Fig. 3
Fig. 4
Algorithm 2
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

References

  1. Borkar, S., Chien, A.: The future of microprocessors. Commun. ACM 54, 67–77 (2011)

    Article  Google Scholar 

  2. Esmaeilzadeh, H., Blem, E., Amant, R.St., Sankaralingam, K., Burger, D.: Dark silicon and the end of multicore scaling. In: Proceeding of the 38th Annual International Symposium on Computer Architecture, ISCA’11, New York, NY, USA, pp. 365–376. ACM Press, New York (2011)

    Chapter  Google Scholar 

  3. Dongarra, J., Beckman, P., Moore, T., Aerts, P., Aloisio, G., Andre, J.C., Barkai, D., Berthou, J.Y., Boku, T., Braunschweig, B., et al.: The international exascale software project roadmap. Int. J. High Perform. Comput. Appl. 25(1), 3 (2011)

    Article  Google Scholar 

  4. Duranton, M., et al.: The HiPEAC vision (2010). Available from http://www.hipeac.net/roadmap

  5. Feng, W.-c., Feng, X., Ce, R.: Green supercomputing comes of age. IT Prof. 10(1), 17–23 (2008)

    Article  Google Scholar 

  6. Hsu, C., Feng, W.: A feasibility analysis of power awareness in commodity-based high-performance clusters. In: Cluster 2005 (2005)

    Google Scholar 

  7. Albers, S.: Energy-efficient algorithms. Commun. ACM 53, 86–96 (2010)

    Article  Google Scholar 

  8. Cilk project home page (2012). http://supertech.csail.mit.edu/cilk/

  9. SMP superscalar project home page (2012). http://www.bsc.es/plantillaG.php?cat_id=385

  10. StarPU project home page (2012). http://runtime.bordeaux.inria.fr/StarPU/

  11. Van Zee, F.G.: libflame: The Complete Reference (2009). www.lulu.com

  12. Anderson, E., Bai, Z., Bischof, C., Blackford, L.S., Demmel, J., Dongarra, J.J., Du Croz, J., Hammarling, S., Greenbaum, A., McKenney, A., Sorensen, D.: LAPACK Users’ Guide, 3rd edn. SIAM, Philadelphia (1999)

    Book  Google Scholar 

  13. PLASMA project home page (2012). http://icl.cs.utk.edu/plasma/

  14. Alonso, P., Dolz, M.F., Mayo, R., Quintana-Ortí, E.S.: Improving power efficiency on multi-core processors via slack control. In: Proceedings of the 2011 International Conference on High Performance Computing & Simulation (HPCS 2011). IEE Catalog Number CFP1178H-CDR, pp. 463–470 (2011)

    Chapter  Google Scholar 

  15. Alonso, P., Dolz, M.F., Igual, F., Mayo, R., Quintana-Ortí, E.S.: DVFS-control techniques for dense linear algebra operations on multi-core processors. Comput. Sci. Res. Dev., 1–10 (2011). doi:10.1007/s00450-011-0188-7

  16. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore (1996)

    MATH  Google Scholar 

  17. Gunter, B.C., van de Geijn, R.A.: Parallel out-of-core computation and updating the QR factorization. ACM Trans. Math. Softw. 31(1), 60–78 (2005)

    Article  MATH  Google Scholar 

  18. Etinski, M., Corbalán, J., Labarta, J., Valero, M.: Utilization driven power-aware parallel job scheduling. Comput. Sci. Res. Dev. 25(3–4), 207–216 (2010)

    Article  Google Scholar 

  19. Yao, F., Demers, A., Shenker, S.: A scheduling model for reduced cpu energy. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, FOCS’95, Washington, DC, USA, p. 374. IEEE Computer Society, Los Alamitos (1995)

    Google Scholar 

  20. Manzak, A., Chakrabarti, C.: Variable voltage task scheduling for minimizing energy or minimizing power. In: Proceedings on IEEE International Conference of the Acoustics, Speech, and Signal Processing, 2000, Washington, DC, USA, vol. 06, pp. 3239–3242. IEEE Computer Society, Los Alamitos (2000)

    Google Scholar 

  21. Gruian, F., Kuchcinski, K.: Lenes: task scheduling for low-energy systems using variable supply voltage processors. In: Proceedings of the 2001 Asia and South Pacific Design Automation Conference, ASP-DAC’01, New York, NY, USA, pp. 449–455. ACM Press, New York (2001)

    Chapter  Google Scholar 

  22. Martin, S.M., Flautner, K., Mudge, T., Blaauw, D.: Combined dynamic voltage scaling and adaptive body biasing for lower power microprocessors under dynamic workloads. In: Proceedings of the 2002 IEEE/ACM International Conference on Computer-aided Design, ICCAD’02, New York, NY, USA, pp. 721–725. ACM Press, New York (2002)

    Google Scholar 

  23. Zhang, Y., Hu, X.S., Chen, D.Z.: Task scheduling and voltage selection for energy minimization. In: Proceedings of the 39th Annual Design Automation Conference, DAC’02, New York, NY, USA, pp. 183–188. ACM Press, New York (2002)

    Google Scholar 

  24. Robert, Y., Parashar, M., Badrinath, R., Prasanna, V.K.: High performance computing—HiPC 2006. In: Proceedings of 13th International Conference, Bangalore, India, December 18–21, 2006. Lecture Notes in Computer Science, vol. 4297. Springer, Berlin (2006)

    Google Scholar 

  25. Lee, Y.C., Zomaya, A.Y.: Minimizing energy consumption for precedence-constrained applications using dynamic voltage scaling. In: Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid-Volume 00, pp. 92–99. IEEE Computer Society, Los Alamitos (2009)

    Chapter  Google Scholar 

  26. Kimura, H., Sato, M., Hotta, Y., Boku, T., Takahashi, D.: Empirical study on reducing energy of parallel programs using slack reclamation by DVFS in a power-scalable high performance cluster. In: IEEE International Conference on Cluster Computing, 2006, pp. 1–10. IEEE Press, New York (2007)

    Google Scholar 

  27. Shekar, V., Izadi, B.: Energy aware scheduling for DAG structured applications on heterogeneous and DVS enabled processors. In: International Conference on Green Computing, pp. 495–502. IEEE Press, New York (2010)

    Chapter  Google Scholar 

  28. King, D., Ahmad, I., Sheikh, H.F.: Stretch and compress based re-scheduling techniques for minimizing the execution times of DAGs on multi-core processors under energy constraints. In: International Conference on Green Computing, pp. 49–60. IEEE Press, New York (2010)

    Chapter  Google Scholar 

  29. Palli, K.: Scheduling dags for minimum finish time and power consumption on heterogeneous processors. Master’s thesis, Albers University, Albers, AL (2005)

  30. Shaffer, L.R., Ritter, J.B., Meyer, W.L.: The Critical-Path Method. McGraw-Hill, New York (1965)

    Google Scholar 

  31. Li, R., Huang, H.C.: List scheduling for jobs with arbitrary release times and similar lengths. J. Sched. 10(6), 365–373 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  32. Mtibaa, A., Ouni, B., Abid, M.: An efficient list scheduling algorithm for time placement problem. Comput. Electr. Eng. 33(4), 285–298 (2007)

    Article  MATH  Google Scholar 

  33. Quintana-Ortí, G., Quintana-Ortí, E.S., van de Geijn, R.A., Van Zee, F.G., Chan, E.: Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Trans. Math. Softw. 36(3), 14:1–14:26 (2009)

    Article  Google Scholar 

Download references

Acknowledgements

This work was supported by project CICYT TIN2011-23283 and FEDER.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Manuel F. Dolz.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Alonso, P., Dolz, M.F., Mayo, R. et al. Energy-efficient execution of dense linear algebra algorithms on multi-core processors. Cluster Comput 16, 497–509 (2013). https://doi.org/10.1007/s10586-012-0215-x

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10586-012-0215-x

Keywords

Navigation