Reducing energy consumption of parallel sparse matrix applications through integrated link/CPU voltage scaling

Son, Seung Woo; Malkowski, Konrad; Chen, Guilin; Kandemir, Mahmut; Raghavan, Padma

doi:10.1007/s11227-007-0113-9

Published: 03 April 2007

Volume 41, pages 179–213 (2007)

The Journal of Supercomputing


Abstract

Power consumption is quickly becoming a first-class optimization metric for many high-performance parallel computing platforms. Voltage scaling is one technique that many prior proposals employ, and past research has applied it to different components such as networks, CPUs, and memories. In contrast to most existing voltage scaling efforts, which target a single component (CPU, network, or memory), this paper proposes and experimentally evaluates a voltage/frequency scaling algorithm that considers CPUs and communication links in a mesh network at the same time. More specifically, it scales the voltages/frequencies of the CPUs in the nodes and of the communication links among them in a coordinated fashion (instead of one after another) such that energy savings are maximized without impacting execution time. Our experiments with several tree-based sparse matrix computations reveal that the proposed integrated voltage scaling approach is very effective in practice and brings 13% and 17% energy savings over the pure CPU and pure communication link voltage scaling schemes, respectively. The results also show that our savings are consistent across different network sizes and different sets of voltage/frequency levels.
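
The idea of scaling CPUs and links together, rather than one after another, can be illustrated with a small sketch. This is not the algorithm from the paper, which the abstract does not spell out. It is a brute-force search under stated assumptions: the operating points in `CPU_LEVELS` and `LINK_LEVELS`, the capacitance constants, the workload numbers, and the function names are all hypothetical, and dynamic energy is assumed to scale with the square of the supply voltage per cycle or per bit. The deadline stands for the time by which a node must finish so that overall execution time is unaffected.

```python
from itertools import product

# Hypothetical operating points, not taken from the paper.
# CPU: (frequency in cycles/s, voltage in V); link: (bandwidth in bits/s, voltage in V)
CPU_LEVELS = [(1.0e9, 1.5), (0.8e9, 1.3), (0.6e9, 1.1), (0.4e9, 0.9)]
LINK_LEVELS = [(1.0e9, 2.5), (0.75e9, 2.0), (0.5e9, 1.5), (0.25e9, 1.0)]


def task_time(cycles, bits, cpu, link):
    """Compute time plus communication time at the chosen levels."""
    return cycles / cpu[0] + bits / link[0]


def task_energy(cycles, bits, cpu, link, c_cpu=1e-9, c_link=1e-9):
    """Assumed model: dynamic energy proportional to V^2 per cycle and per bit."""
    return c_cpu * cpu[1] ** 2 * cycles + c_link * link[1] ** 2 * bits


def best_setting(cycles, bits, deadline, cpu_levels, link_levels):
    """Return (energy, cpu_level, link_level) with minimum energy that still
    meets the deadline, or None if no combination is fast enough."""
    feasible = [
        (task_energy(cycles, bits, c, l), c, l)
        for c, l in product(cpu_levels, link_levels)
        if task_time(cycles, bits, c, l) <= deadline
    ]
    return min(feasible, key=lambda t: t[0]) if feasible else None


# Hypothetical workload: 4e8 cycles of computation, 2e8 bits of traffic,
# and a 1.0 s deadline (0.6 s at the highest levels, so 0.4 s of slack).
cycles, bits, deadline = 4e8, 2e8, 1.0

# CPU-only scaling keeps the link at its highest level; link-only does the reverse.
cpu_only = best_setting(cycles, bits, deadline, CPU_LEVELS, LINK_LEVELS[:1])
link_only = best_setting(cycles, bits, deadline, CPU_LEVELS[:1], LINK_LEVELS)
# Integrated scaling searches both dimensions at once.
joint = best_setting(cycles, bits, deadline, CPU_LEVELS, LINK_LEVELS)

for name, result in [("cpu-only", cpu_only), ("link-only", link_only), ("joint", joint)]:
    print(name, result)
```

Because the joint search space contains every setting available to the two single-component schemes, its result can never use more energy than either of them under this model. How much it saves depends entirely on the assumed levels and workload.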



Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, 16802, USA
Seung Woo Son, Konrad Malkowski, Guilin Chen, Mahmut Kandemir & Padma Raghavan


Corresponding author

Correspondence to Seung Woo Son.


About this article

Cite this article

Son, S.W., Malkowski, K., Chen, G. et al. Reducing energy consumption of parallel sparse matrix applications through integrated link/CPU voltage scaling. J Supercomput 41, 179–213 (2007). https://doi.org/10.1007/s11227-007-0113-9


Received: 18 February 2006
Accepted: 16 January 2007
Published: 03 April 2007
Issue Date: September 2007
DOI: https://doi.org/10.1007/s11227-007-0113-9
