Research article
DOI: 10.1145/2675744.2675767

Execution profile driven speedup estimation for porting sequential code to GPU

Published: 09 October 2014

Abstract

Parallelizing an existing sequential application to achieve a good speedup on a data-parallel infrastructure is a difficult and time-consuming effort. An important first step is to assess whether the application, in its current form, can be parallelized to obtain the desired speedup. In this paper, we propose a method that analyzes existing sequential source code containing data-parallel loops and gives a reasonably accurate prediction of the speedup achievable from the algorithm. The proposed method performs static and dynamic analysis of the sequential source code to determine the time spent in its various portions, including the data-parallel ones. It then uses a set of novel invariants to identify the bottlenecks that would arise if the program were ported to a GPGPU platform, and predicts the degree of parallelization the GPU must deliver to achieve the desired end-to-end speedup. Our approach does not require creating GPU code skeletons for the data-parallel portions of the sequential code, thereby reducing the performance-prediction effort. We observed reasonably accurate speedup predictions when testing our approach on several well-known Rodinia benchmark applications, a popular matrix multiplication program, and a fast Walsh transform program.
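The estimation idea described in the abstract, reduced to its simplest form, resembles Amdahl's law extended with host-device transfer costs: only the time spent in data-parallel loops shrinks on the GPU, while the serial remainder and the data transfers bound the end-to-end speedup. A minimal sketch of that relationship (illustrative only; the function name, parameters, and formula are assumptions for exposition, not the paper's actual invariants):

```python
def estimated_speedup(total_time, parallel_time, transfer_time, gpu_speedup):
    """Amdahl-style end-to-end speedup estimate for a GPU port.

    total_time    -- measured runtime of the sequential program (seconds)
    parallel_time -- portion of total_time spent in data-parallel loops
    transfer_time -- estimated host<->device data-transfer overhead
    gpu_speedup   -- factor by which the GPU accelerates the parallel loops
    """
    serial_time = total_time - parallel_time
    # After porting: serial part unchanged, parallel part accelerated,
    # plus the new cost of moving data to and from the device.
    ported_time = serial_time + transfer_time + parallel_time / gpu_speedup
    return total_time / ported_time

# Example: 80% of a 10 s run is in data-parallel loops, transfers cost
# 0.5 s, and the kernels run 50x faster on the GPU.
print(round(estimated_speedup(10.0, 8.0, 0.5, 50.0), 2))  # → 3.76
```

The example illustrates why profiling the serial and transfer bottlenecks matters: even a 50x kernel speedup yields well under 4x end to end once the non-parallel portions are accounted for.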

References

[1] M. Baskaran, U. Bondhugula, et al. A compiler framework for optimization of affine loop nests for GPGPUs. In Proc. of ACM Intl. Conf. on Supercomputing (ICS), 2008.
[2] C. Boyd. Data Parallel Computing. Microsoft, 2008.
[3] M. Boyer, J. Meng, and K. Kumaran. Improving GPU performance prediction with data transfer modeling. In Intl. Workshop on Accelerators and Hybrid Exascale Systems (ASHES), 2013.
[4] S. Che, M. Boyer, et al. Rodinia: A benchmark suite for heterogeneous computing. In Proc. of IEEE Intl. Symp. on Workload Characterization (IISWC), 2009.
[5] A. I. El-Nashar. To parallelize or not to parallelize, speed up issue. International Journal of Distributed and Parallel Systems, 2, 2011.
[6] P. N. Glaskowsky. NVIDIA Fermi: The first complete GPU computing architecture. Technical report, 2009.
[7] Intel Corp. Intel Advisor XE, 2013.
[8] D. Jeon, S. Garcia, et al. Parkour: Parallel speedup estimates for serial programs. In USENIX Workshop on Hot Topics in Parallelism (HotPar), 2011.
[9] M. Kim, P. Kumar, et al. Predicting potential speedup of serial code via lightweight profiling and emulations with memory performance model. In Proc. of IEEE Intl. Symp. on Parallel and Distributed Processing (IPDPS), 2012.
[10] K. Kothapalli, R. Mukherjee, et al. A performance prediction model for the CUDA GPGPU platform. In Proc. of IEEE High Performance Computing (HiPC), 2009.
[11] C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In Proc. of IEEE Intl. Symp. on Code Generation and Optimization (CGO), 2004.
[12] J. Meng, V. A. Morozov, et al. GROPHECY: GPU performance projection from CPU code skeletons. In Proc. of ACM Intl. Conf. for High Performance Computing, Networking, Storage and Analysis (SC), 2011.
[13] J. Nickolls and W. J. Dally. The GPU computing era. IEEE Micro, 30(2), 2010.
[14] K. Zaiki. Apparatus for detecting possibility of parallel processing and method thereof and a program translation apparatus utilized therein, 2013.

Cited By

  • (2022) Fast selection of compiler optimizations using performance prediction with graph neural networks. Concurrency and Computation: Practice and Experience, 35(17). DOI: 10.1002/cpe.6869. Online publication date: 16-Mar-2022.

Published In

COMPUTE '14: Proceedings of the 7th ACM India Computing Conference
October 2014
175 pages
ISBN:9781605588148
DOI:10.1145/2675744

Sponsors

  • Google India
  • Persistent Systems

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. GPU
  2. SIMT
  3. code analysis
  4. data transfer
  5. instrumentation
  6. speedup estimation
  7. warp


Conference

Compute '14: ACM India Compute Conference
October 9-11, 2014
Nagpur, India

Acceptance Rates

COMPUTE '14 Paper Acceptance Rate: 21 of 110 submissions, 19%
Overall Acceptance Rate: 114 of 622 submissions, 18%
