Research article
DOI: 10.1145/2675744.2675767

Execution profile driven speedup estimation for porting sequential code to GPU

Published: 09 October 2014

Abstract

Parallelizing an existing sequential application to achieve a good speedup on a data-parallel infrastructure is a difficult and time-consuming effort. An important first step is to assess whether the application, in its current form, can be parallelized to obtain the desired speedup. In this paper, we propose a method that analyzes existing sequential source code containing data-parallel loops and gives a reasonably accurate prediction of the speedup achievable from the algorithm. The proposed method performs static and dynamic analysis of the sequential source code to determine the time spent in its various portions, including the data-parallel ones. It then uses a set of novel invariants to identify the bottlenecks that would arise if the program were ported to a GPGPU platform, and predicts the degree of parallelization the GPU must deliver to achieve the desired end-to-end speedup. Our approach does not require creating GPU code skeletons for the data-parallel portions of the sequential code, thereby reducing the performance-prediction effort. We observed reasonably accurate speedup predictions when testing our approach on several well-known Rodinia benchmark applications, a popular matrix multiplication program, and a fast Walsh transform program.
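The estimation idea described in the abstract, reduced to its simplest form, resembles Amdahl's law extended with host-device transfer costs: only the time spent in data-parallel loops shrinks on the GPU, while the serial remainder and the data transfers bound the end-to-end speedup. A minimal sketch of that relationship (illustrative only; the function name, parameters, and formula are assumptions for exposition, not the paper's actual invariants):

```python
def estimated_speedup(total_time, parallel_time, transfer_time, gpu_speedup):
    """Amdahl-style end-to-end speedup estimate for a GPU port.

    total_time    -- measured runtime of the sequential program (seconds)
    parallel_time -- portion of total_time spent in data-parallel loops
    transfer_time -- estimated host<->device data-transfer overhead
    gpu_speedup   -- factor by which the GPU accelerates the parallel loops
    """
    serial_time = total_time - parallel_time
    # After porting: serial part unchanged, parallel part accelerated,
    # plus the new cost of moving data to and from the device.
    ported_time = serial_time + transfer_time + parallel_time / gpu_speedup
    return total_time / ported_time

# Example: 80% of a 10 s run is in data-parallel loops, transfers cost
# 0.5 s, and the kernels run 50x faster on the GPU.
print(round(estimated_speedup(10.0, 8.0, 0.5, 50.0), 2))  # → 3.76
```

The example illustrates why profiling the serial and transfer bottlenecks matters: even a 50x kernel speedup yields well under 4x end to end once the non-parallel portions are accounted for.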

References

[1] M. Baskaran, U. Bondhugula, et al. A compiler framework for optimization of affine loop nests for GPGPUs. In Proc. of ACM Intl. Conf. on Supercomputing (ICS), 2008.
[2] C. Boyd. Data Parallel Computing. Microsoft, 2008.
[3] M. Boyer, J. Meng, and K. Kumaran. Improving GPU performance prediction with data transfer modeling. In Intl. Workshop on Accelerators and Hybrid Exascale Systems (ASHES), 2013.
[4] S. Che, M. Boyer, et al. Rodinia: A benchmark suite for heterogeneous computing. In Proc. of IEEE Intl. Symp. on Workload Characterization (IISWC), 2009.
[5] A. I. El-Nashar. To parallelize or not to parallelize, speed up issue. International Journal of Distributed and Parallel Systems, 2, 2011.
[6] P. N. Glaskowsky. NVIDIA Fermi: The first complete GPU computing architecture. Technical report, 2009.
[7] Intel Corp. Intel Advisor XE, 2013.
[8] D. Jeon, S. Garcia, et al. Parkour: Parallel speedup estimates for serial programs. In USENIX Workshop on Hot Topics in Parallelism (HotPar), 2011.
[9] M. Kim, P. Kumar, et al. Predicting potential speedup of serial code via lightweight profiling and emulations with memory performance model. In Proc. of IEEE Intl. Symp. on Parallel and Distributed Processing (IPDPS), 2012.
[10] K. Kothapalli, R. Mukherjee, et al. A performance prediction model for the CUDA GPGPU platform. In Proc. of IEEE High Performance Computing (HiPC), 2009.
[11] C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In Proc. of IEEE Intl. Symp. on Code Generation and Optimization (CGO), 2004.
[12] J. Meng, V. A. Morozov, et al. GROPHECY: GPU performance projection from CPU code skeletons. In Proc. of ACM Intl. Conf. for High Performance Computing, Networking, Storage and Analysis (SC), 2011.
[13] J. Nickolls and W. J. Dally. The GPU computing era. IEEE Micro, 30(2), 2010.
[14] K. Zaiki. Apparatus for detecting possibility of parallel processing and method thereof and a program translation apparatus utilized therein, 2013.

Cited By

  • (2022) Fast selection of compiler optimizations using performance prediction with graph neural networks. Concurrency and Computation: Practice and Experience, 35(17). DOI: 10.1002/cpe.6869. Online publication date: 16-Mar-2022.

Published In

COMPUTE '14: Proceedings of the 7th ACM India Computing Conference
October 2014
175 pages
ISBN:9781605588148
DOI:10.1145/2675744

Sponsors

  • Google India
  • Persistent Systems

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. GPU
  2. SIMT
  3. code analysis
  4. data transfer
  5. instrumentation
  6. speedup estimation
  7. warp


Conference

Compute '14: ACM India Compute Conference
October 9-11, 2014
Nagpur, India

Acceptance Rates

COMPUTE '14 Paper Acceptance Rate: 21 of 110 submissions, 19%
Overall Acceptance Rate: 114 of 622 submissions, 18%
