Abstract
This paper examines four strategies for implementing the parallel conjugate gradient (CG) method, each with its own data distribution, and analyzes how they affect communication and overall performance. First, typical 1D and 2D distributions of the matrix involved in the CG computations are considered. Then, a new 2D version of the CG method with asymmetric workload is proposed, based on leaving some threads idle during part of the computation in order to reduce communication. All four strategies are independent of the sparse storage scheme and are implemented using Unified Parallel C (UPC), a Partitioned Global Address Space (PGAS) language. The strategies are evaluated on two different platforms using a set of matrices with distinct sparsity patterns, showing that the asymmetric proposal outperforms the other three in all cases except for one matrix on one platform.

Notes
For practical purposes, the algorithm is often used with preconditioners, but preconditioning is outside the scope of this paper.
Acknowledgments
This work was funded by the Ministry of Economy and Competitiveness of Spain and FEDER funds of the EU (Project TIN2013-42148-P), by the Galician Government (Consolidation Program of Competitive Reference Groups GRC2013/055) and by the U.S. Department of Energy (Contract No. DE-AC03-76SF00098).
Cite this article
González-Domínguez, J., Marques, O.A., Martín, M.J. et al. A 2D algorithm with asymmetric workload for the UPC conjugate gradient method. J Supercomput 70, 816–829 (2014). https://doi.org/10.1007/s11227-014-1300-0