Energy-efficient execution of dense linear algebra algorithms on multi-core processors

Alonso, Pedro; Dolz, Manuel F.; Mayo, Rafael; Quintana-Ortí, Enrique S.

doi:10.1007/s10586-012-0215-x

Published: 12 May 2012

Cluster Computing, volume 16, pages 497–509 (2013)

Abstract

This paper addresses the efficient exploitation of the task-level parallelism present in many dense linear algebra operations, from the point of view of both computational performance and energy consumption. The two strategies considered here, the Slack Reduction Algorithm (SRA) and the Race-to-Idle Algorithm (RIA), take very different approaches to saving energy: each adjusts the operating frequency of the cores during the execution of the collection of tasks into which many dense linear algebra algorithms can be decomposed. Both procedures are evaluated using an energy-aware simulator, which schedules and maps these tasks onto the cores and leverages the dynamic voltage and frequency scaling (DVFS) available in current processors. Experiments with this tool, together with the practical integration of the RIA strategy into a runtime, show the energy gains for two versions of the QR factorization.
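
As a rough illustration of why frequency-scaling policies with opposite goals can both save energy, the Python sketch below compares two toy policies for a single task that has slack before its successor can start. It is not the paper's SRA or RIA. The power model (a constant active term plus a dynamic term cubic in frequency), the names `PowerModel`, `race_to_idle` and `stretch_to_fill`, and all numeric values are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerModel:
    """Hypothetical per-core power model (an assumption, not from the paper)."""
    p_static_w: float  # power drawn whenever the core is executing a task
    p_idle_w: float    # power drawn while the core waits in an idle state
    c_dyn: float       # dynamic coefficient: P_dyn = c_dyn * f**3 (f in GHz)


def active_energy(model, gcycles, freq_ghz):
    """Return (energy in J, time in s) to run `gcycles` giga-cycles at `freq_ghz`."""
    time_s = gcycles / freq_ghz
    power_w = model.p_static_w + model.c_dyn * freq_ghz ** 3
    return power_w * time_s, time_s


def race_to_idle(model, gcycles, window_s, freqs):
    """Run at the highest frequency, then idle for the rest of the window."""
    freq = max(freqs)
    energy_j, time_s = active_energy(model, gcycles, freq)
    if time_s > window_s:
        raise ValueError("task does not fit in the window at any frequency")
    return energy_j + model.p_idle_w * (window_s - time_s)


def stretch_to_fill(model, gcycles, window_s, freqs):
    """Run at the lowest frequency that still finishes inside the window."""
    feasible = [f for f in freqs if gcycles / f <= window_s]
    if not feasible:
        raise ValueError("task does not fit in the window at any frequency")
    energy_j, time_s = active_energy(model, gcycles, min(feasible))
    return energy_j + model.p_idle_w * (window_s - time_s)


if __name__ == "__main__":
    freqs = [1.0, 2.0]  # available frequencies in GHz (assumed)
    for name, model in [
        ("high static power", PowerModel(10.0, 2.0, 1.0)),
        ("low static power", PowerModel(2.0, 1.0, 1.0)),
    ]:
        race = race_to_idle(model, 2.0, 2.0, freqs)
        stretch = stretch_to_fill(model, 2.0, 2.0, freqs)
        print(f"{name}: race-to-idle {race} J, stretch-to-fill {stretch} J")
```

Under these assumed numbers, the policy that wins depends on how large the static power is relative to the dynamic power. This is one plausible reason to evaluate both kinds of strategy rather than fix one in advance.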

Acknowledgements

This work was supported by project CICYT TIN2011-23283 and FEDER.

Author information

Authors and Affiliations

Dep. de Sistemas Informáticos y Computación, Universitat Politècnica de València, 46022, Valencia, Spain
Pedro Alonso
Dep. de Ingeniería y Ciencia de los Computadores, Universitat Jaume I, 12071, Castellón, Spain
Manuel F. Dolz, Rafael Mayo & Enrique S. Quintana-Ortí

Corresponding author

Correspondence to Manuel F. Dolz.

About this article

Cite this article

Alonso, P., Dolz, M.F., Mayo, R. et al. Energy-efficient execution of dense linear algebra algorithms on multi-core processors. Cluster Comput 16, 497–509 (2013). https://doi.org/10.1007/s10586-012-0215-x

Received: 31 October 2011
Accepted: 02 May 2012
Published: 12 May 2012
Issue Date: September 2013
DOI: https://doi.org/10.1007/s10586-012-0215-x
