Energy cost evaluation of parallel algorithms for multiprocessor systems

Published in: Cluster Computing

Abstract

With the continuous development of hardware and software, Graphics Processing Units (GPUs) have come into use for general-purpose computation. They have emerged as computational accelerators that dramatically reduce application execution time compared with CPUs. To achieve high computing performance, a GPU typically includes hundreds of computing units. This high density of computing resources on a chip leads to high power consumption, which has therefore become one of the most important problems in the development of GPUs. This paper analyzes the energy consumption of parallel algorithms executed on GPUs and provides a method to evaluate the energy scalability of parallel algorithms. The parallel prefix sum is then analyzed to illustrate the method for energy conservation, and the energy scalability is experimentally evaluated using Sparse Matrix-Vector Multiplication (SpMV). The results show that the number of thread blocks, the choice of memory, and the task scheduling are the key factors in balancing the performance and the energy consumption of GPUs.
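As background for the two workloads the abstract names, the following is a minimal sequential sketch (in Python, not the paper's CUDA implementation) of the patterns involved: a work-efficient Blelloch exclusive prefix sum, whose up-sweep/down-sweep phases are what a GPU implementation maps onto thread blocks, and a sparse matrix-vector multiply over the compressed sparse row (CSR) layout commonly used in GPU SpMV kernels. Function and variable names here are illustrative, not taken from the paper.

```python
def blelloch_scan(a):
    """Work-efficient exclusive prefix sum (Blelloch scan).

    Assumes len(a) is a power of two; simulated sequentially here,
    but each inner loop corresponds to one parallel step on a GPU.
    """
    n = len(a)
    x = list(a)
    # Up-sweep (reduce) phase: build partial sums up a binary tree.
    d = 1
    while d < n:
        for i in range(0, n, 2 * d):
            x[i + 2 * d - 1] += x[i + d - 1]
        d *= 2
    # Down-sweep phase: clear the root, then push prefixes back down.
    x[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(0, n, 2 * d):
            t = x[i + d - 1]
            x[i + d - 1] = x[i + 2 * d - 1]
            x[i + 2 * d - 1] += t
        d //= 2
    return x


def spmv_csr(row_ptr, col_idx, vals, x):
    """Sparse matrix-vector multiply y = A*x, with A stored in CSR form.

    row_ptr[r]..row_ptr[r+1] delimits the nonzeros of row r; on a GPU
    each row (or group of rows) would typically be assigned to a thread
    or a thread block.
    """
    y = []
    for r in range(len(row_ptr) - 1):
        s = 0.0
        for j in range(row_ptr[r], row_ptr[r + 1]):
            s += vals[j] * x[col_idx[j]]
        y.append(s)
    return y
```

For example, `blelloch_scan([1, 2, 3, 4])` yields the exclusive prefix sums `[0, 1, 3, 6]`. The energy trade-offs the paper studies (block count, memory choice, scheduling) concern how loops like these are partitioned across the GPU's computing units.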



Author information

Corresponding author

Correspondence to Naixue Xiong.


About this article

Cite this article

Wang, Z., Xu, X., Xiong, N. et al. Energy cost evaluation of parallel algorithms for multiprocessor systems. Cluster Comput 16, 77–90 (2013). https://doi.org/10.1007/s10586-011-0188-1

