poster

Revisiting linpack algorithm on large-scale CPU-GPU heterogeneous systems

Authors:

Ke Meng,

Guangming Tan

PPoPP '20: Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Pages 411 - 412

https://doi.org/10.1145/3332466.3374530

Published: 19 February 2020

Abstract

As the gap widens between GPU computing capability and that of other components (CPU, PCIe bus, and communication network), it is increasingly challenging to design high-performance parallel algorithms for large CPU-GPU heterogeneous systems. There are two main reasons. First, simply offloading the kernel library to the GPU incurs large-volume data transfers over the low-speed PCIe bus. Second, communication overhead across the network severely affects scalability. To address these issues, we advocate a paradigm shift to CPU-centric, fine-grained pipelining algorithm design. Taking the Linpack benchmark as a case study, we show the effectiveness of the new design paradigm. Our optimized Linpack program achieves 63.79 PFlops on 16384 GPUs. Its floating-point efficiency outperforms the NVIDIA proprietary counterparts by 5% on average.
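
To make the pipelining argument concrete, the following is a minimal toy timing model, not the authors' algorithm. It assumes a two-stage pipeline in which one chunk is transferred over PCIe while the previous chunk is computed. The function names and the stage times used in the usage note are hypothetical; only the 63.79 PFlops and 16384 GPU figures come from the abstract.

```python
def serial_time(total_transfer, total_compute):
    """Coarse-grained offload: transfer everything, then compute (no overlap)."""
    return total_transfer + total_compute


def pipelined_time(total_transfer, total_compute, chunks):
    """Two-stage pipeline: chunk i+1 is transferred while chunk i is computed.

    Assumes the work splits evenly into `chunks` pieces (an idealization).
    """
    t = total_transfer / chunks  # per-chunk transfer time
    c = total_compute / chunks   # per-chunk compute time
    # The first transfer and the last compute cannot be overlapped;
    # in between, the slower stage sets the pace.
    return t + (chunks - 1) * max(t, c) + c


def per_gpu_tflops(total_pflops, gpus):
    """Average throughput per GPU in TFlops (1 PFlops = 1000 TFlops)."""
    return total_pflops * 1000.0 / gpus
```

With hypothetical totals of 4 time units of transfer and 8 of compute, `serial_time(4.0, 8.0)` gives 12.0, while `pipelined_time(4.0, 8.0, 8)` gives 8.5: finer chunks hide most of the transfer behind compute. For the reported result, `per_gpu_tflops(63.79, 16384)` works out to roughly 3.89 TFlops per GPU.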

Cited By

Sun Q, Ma W, Sun J, Li H (2022). Evolving the HPL benchmark towards multi-GPGPU clusters. CCF Transactions on High Performance Computing 5:1 (84-96). Online publication date: 26-Oct-2022. https://doi.org/10.1007/s42514-022-00128-6

Huang Y, Zhu Z, Li X (2020). A novel approach for radar network’s detection power analysis based on GPU. 2020 Eighth International Conference on Advanced Cloud and Big Data (CBD) (31-36). Online publication date: Dec-2020. https://doi.org/10.1109/CBD51900.2020.00015

Index Terms

Revisiting linpack algorithm on large-scale CPU-GPU heterogeneous systems
1. Theory of computation
  1. Design and analysis of algorithms
    1. Parallel algorithms

Recommendations

A peta-scalable CPU-GPU algorithm for global atmospheric simulations
PPoPP '13: Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming

Developing highly scalable algorithms for global atmospheric modeling is becoming increasingly important as scientists inquire to understand behaviors of the global atmosphere at extreme scales. Nowadays, heterogeneous architecture based on both ...
Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing
CLUSTER '10: Proceedings of the 2010 IEEE International Conference on Cluster Computing

In this paper, we describe our experiment developing an implementation of the Linpack benchmark for TianHe-1, a petascale CPU/GPU supercomputer system, the largest GPU-accelerated system ever attempted before. An adaptive optimization framework is ...

Published In

PPoPP '20: Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 2020

454 pages

ISBN:9781450368186

DOI:10.1145/3332466

General Chair: Rajiv Gupta (UC Riverside)
Program Chair: Xipeng Shen (NCSU)

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster

Funding Sources

Chinese Academy of Sciences
National Natural Science Foundation of China
The National Key Research and Development Program of China

Conference

PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 22 - 26, 2020

San Diego, California

Acceptance Rates

PPoPP '20 Paper Acceptance Rate 28 of 121 submissions, 23%;

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Bibliometrics

Article Metrics

Total Citations: 2
Total Downloads: 318
Downloads (Last 12 months): 27
Downloads (Last 6 weeks): 1

Reflects downloads up to 14 Feb 2025
