ABSTRACT
When executing a kernel function on a general-purpose graphics processing unit (GPGPU), it is critical to select an appropriate configuration setting for optimal performance. Configuration settings affect the allocation and utilization of GPGPU resources during the execution of a kernel function [1]. However, testing all possible configuration settings to find an optimal one is time-consuming and costly. To address this challenge, we propose a prediction mechanism that can suggest a configuration setting allowing the kernel function to complete its operation with minimal execution time. We start by filtering candidate configurations according to the amount of data, the mandatory parameters, and the optional parameters, and then calculate the occupancy of three critical resources on the GPGPU: warps, registers, and shared memory. We eliminate configuration settings whose average resource occupancy is lower than a user-defined threshold. The remaining configuration settings tend to yield better execution performance, so we use them to execute the kernel functions and record the resulting execution times. Finally, we use these configuration settings and their corresponding execution times as training data to build a prediction model with the logistic regression (LR) algorithm. At runtime, given the amount of data to be processed, the prediction model recommends a configuration setting with better performance. Our experiments confirm that the proposed mechanism improves kernel function execution performance more effectively than competing mechanisms. Note that the proposed mechanism can also be applied to other kernel functions.
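The occupancy-filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the device limits (warps, registers, and shared memory per streaming multiprocessor) are assumed example values for illustration, and the averaging of the three per-resource occupancies into a single score is one plausible reading of "average resource occupancy".

```python
# Hedged sketch of occupancy-based configuration filtering.
# The per-SM limits below are ASSUMED example values; on real hardware
# they would come from the device properties (e.g. cudaGetDeviceProperties).
MAX_WARPS_PER_SM = 64
MAX_REGS_PER_SM = 65536
MAX_SMEM_PER_SM = 65536  # bytes
WARP_SIZE = 32

def occupancy(block_size, regs_per_thread, smem_per_block):
    """Average occupancy (0..1) over the three limiting resources:
    warps, registers, and shared memory."""
    warps_per_block = -(-block_size // WARP_SIZE)  # ceiling division
    # How many blocks per SM each resource alone would allow.
    by_warps = MAX_WARPS_PER_SM // warps_per_block
    by_regs = MAX_REGS_PER_SM // (regs_per_thread * block_size)
    by_smem = MAX_SMEM_PER_SM // smem_per_block if smem_per_block else by_warps
    blocks = min(by_warps, by_regs, by_smem)
    # Fraction of each resource actually used by the resident blocks.
    warp_occ = blocks * warps_per_block / MAX_WARPS_PER_SM
    reg_occ = blocks * regs_per_thread * block_size / MAX_REGS_PER_SM
    smem_occ = blocks * smem_per_block / MAX_SMEM_PER_SM
    return (warp_occ + reg_occ + smem_occ) / 3

def filter_configs(configs, threshold):
    """Keep only configurations whose average occupancy meets the
    user-defined threshold; only these are timed and used as LR training data."""
    return [c for c in configs if occupancy(*c) >= threshold]
```

For example, with a threshold of 0.8, a 256-thread block using 32 registers per thread and 4 KB of shared memory passes the filter, while a 1024-thread block using 64 registers per thread and 48 KB of shared memory is eliminated, so only the former would be executed and fed to the logistic regression model.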
REFERENCES
- NVIDIA. 2021. CUDA Toolkit Documentation v11.3.0. https://docs.nvidia.com/cuda/index.html.
- Miroslav Kubat. 2017. An Introduction to Machine Learning. Springer, 43--62.
- Thanasekhar Balaiah and Ranjani Parthasarathi. 2020. Autotuning of configuration for program execution in GPUs. Concurrency and Computation: Practice and Experience 32, 9 (2020), e5635.
- Yalin Baştanlar and Mustafa Özuysal. 2014. Introduction to machine learning. miRNomics: MicroRNA Biology and Computational Analysis (2014), 105--128.
- Ben van Werkhoven. 2019. Kernel Tuner: A search-optimizing GPU code auto-tuner. Future Generation Computer Systems 90 (2019), 347--358.
Index Terms
- Improve the Performance of Parallel Reduction on General-Purpose Graphics Processor Units Using Prediction Models