Proceedings of the 8th Workshop on General Purpose Processing using GPUs
A comparative investigation of device-specific mechanisms for exploiting HPC accelerators
Computational accelerators have improved greatly in recent years. Intel's MIC (Many Integrated Core) architecture and two GPU architectures, NVIDIA's Kepler and AMD's Graphics Core Next, all represent real innovations in the field of HPC. Based ...
GPU-SM: shared memory multi-GPU programming
Discrete GPUs in modern multi-GPU systems can transparently access each other's memories through the PCIe interconnect. Future systems will improve this capability by including better GPU interconnects such as NVLink. However, remote memory access ...
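The mechanism this abstract builds on can be seen in a few lines of CUDA. The sketch below is illustrative (device IDs, sizes, and the kernel are assumptions, and it presumes two peer-capable GPUs under the same PCIe root): it enables peer access so a kernel running on GPU 0 can directly dereference memory allocated on GPU 1.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: executes on GPU 0 but dereferences a buffer that
// physically resides in GPU 1's memory, so each read crosses the PCIe link.
__global__ void read_remote(const float* remote, float* local, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) local[i] = remote[i];
}

int main() {
    const int n = 1 << 20;
    int ok = 0;
    cudaDeviceCanAccessPeer(&ok, 0, 1);      // can GPU 0 map GPU 1's memory?
    if (!ok) { std::printf("no peer access between GPUs 0 and 1\n"); return 1; }

    cudaSetDevice(1);                        // allocate on the remote GPU
    float* remote = nullptr;
    cudaMalloc(&remote, n * sizeof(float));

    cudaSetDevice(0);                        // compute on the local GPU
    cudaDeviceEnablePeerAccess(1, 0);        // map GPU 1's allocations here
    float* local = nullptr;
    cudaMalloc(&local, n * sizeof(float));
    read_remote<<<(n + 255) / 256, 256>>>(remote, local, n);
    cudaDeviceSynchronize();
    return 0;
}
```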
Adaptive GPU cache bypassing
Modern graphics processing units (GPUs) include hardware-controlled caches to reduce bandwidth requirements and energy consumption. However, current GPU cache hierarchies are inefficient for general-purpose GPU (GPGPU) computing. GPGPU workloads ...
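The paper proposes a hardware mechanism, which cannot be reproduced in source code; the closest software-visible analogue is the PTX cache operator that steers individual loads around the L1. A minimal sketch of that analogue follows (the kernel and its reuse pattern are illustrative assumptions, not the paper's adaptive scheme):

```cuda
#include <cuda_runtime.h>

// Load that bypasses the L1 cache (cached at L2 only) via the PTX ".cg"
// cache operator; a plain dereference defaults to caching at all levels
// on architectures that cache global loads in L1.
__device__ __forceinline__ float load_bypass_l1(const float* p) {
    float v;
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(v) : "l"(p));
    return v;
}

// Streaming data with no reuse is routed around L1 so it cannot evict
// the small, frequently reused table that should stay cached.
__global__ void bypass_demo(const float* __restrict__ stream,
                            const float* __restrict__ table,
                            float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = load_bypass_l1(&stream[i])   // zero-reuse: bypass L1
               + table[i % 256];              // high-reuse: keep in L1
}
```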
Efficient utilization of GPGPU cache hierarchy
Recent GPUs are equipped with general-purpose L1 and L2 caches in an attempt to reduce memory bandwidth demand and improve the performance of some irregular GPGPU applications. However, due to massive multithreading, GPGPU caches suffer from severe ...
Effects of source-code optimizations on GPU performance and energy consumption
This paper studies the effects of source-code optimizations on the performance, power draw, and energy consumption of a modern compute GPU. We evaluate 128 versions of two n-body codes: a compute-bound regular implementation and a memory-bound ...
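As one concrete instance of the kind of source-code optimization such a study enumerates (this is an illustrative rewrite, not one of the paper's 128 variants), replacing a division plus sqrtf in an n-body inner loop with a single rsqrtf changes instruction count, accuracy, and power draw at once:

```cuda
#include <cuda_runtime.h>

// Inner step of an n-body force computation, in two source variants.
// SOFTENING avoids the r == 0 singularity; the value is illustrative.
#define SOFTENING 1e-9f

// Baseline variant: one sqrtf and one division per interaction.
__device__ __forceinline__ float inv_r3_baseline(float r2) {
    return 1.0f / (r2 * sqrtf(r2));
}

// Optimized variant: a single hardware rsqrtf, cubed. Fewer and cheaper
// instructions at slightly lower accuracy -- exactly the kind of
// trade-off whose time/power/energy effects such a study measures.
__device__ __forceinline__ float inv_r3_fast(float r2) {
    float ir = rsqrtf(r2);
    return ir * ir * ir;
}

__global__ void body_force(const float4* p, float3* acc, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float3 a = {0.0f, 0.0f, 0.0f};
    for (int j = 0; j < n; ++j) {
        float dx = p[j].x - p[i].x, dy = p[j].y - p[i].y, dz = p[j].z - p[i].z;
        float r2 = dx*dx + dy*dy + dz*dz + SOFTENING;
        float f = p[j].w * inv_r3_fast(r2);  // swap in inv_r3_baseline to compare
        a.x += dx * f; a.y += dy * f; a.z += dz * f;
    }
    acc[i] = a;
}
```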
Optimization for performance and energy for batched matrix computations on GPUs
As modern hardware keeps evolving, an increasingly effective approach to developing energy-efficient, high-performance solvers is to design them to work on many small, independent problems. Many applications already need this functionality, ...
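One established interface for this pattern is cuBLAS's batched GEMM, which multiplies many small, independent matrices in a single call (shown here as a general illustration of batched execution, not as the paper's own implementation; matrix size and batch count are assumptions, and error checking is omitted):

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

int main() {
    const int n = 16, batch = 4096;          // many tiny problems
    const size_t bytes = size_t(n) * n * sizeof(float);

    // One n x n matrix per problem; host vectors hold the device pointers.
    std::vector<float*> hA(batch), hB(batch), hC(batch);
    for (int i = 0; i < batch; ++i) {
        cudaMalloc(&hA[i], bytes);
        cudaMalloc(&hB[i], bytes);
        cudaMalloc(&hC[i], bytes);
    }

    // cuBLAS expects the pointer arrays themselves in device memory.
    float **dA, **dB, **dC;
    cudaMalloc(&dA, batch * sizeof(float*));
    cudaMalloc(&dB, batch * sizeof(float*));
    cudaMalloc(&dC, batch * sizeof(float*));
    cudaMemcpy(dA, hA.data(), batch * sizeof(float*), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), batch * sizeof(float*), cudaMemcpyHostToDevice);
    cudaMemcpy(dC, hC.data(), batch * sizeof(float*), cudaMemcpyHostToDevice);

    cublasHandle_t h;
    cublasCreate(&h);
    const float alpha = 1.0f, beta = 0.0f;
    // One call computes C[i] = A[i] * B[i] for all `batch` problems,
    // amortizing the launch overhead that per-matrix GEMM calls would pay.
    cublasSgemmBatched(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &alpha,
                       (const float**)dA, n, (const float**)dB, n,
                       &beta, dC, n, batch);
    cublasDestroy(h);
    return 0;
}
```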
Helium: a transparent inter-kernel optimizer for OpenCL
State of the art automatic optimization of OpenCL applications focuses on improving the performance of individual compute kernels. Programmers address opportunities for inter-kernel optimization in specific applications by ad-hoc hand tuning: manually ...
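The core inter-kernel opportunity is fusion: eliminating the global-memory round-trip between producer and consumer kernels. Helium itself operates on OpenCL and performs this transparently; the hand-written CUDA below is only a before/after illustration of the transformation, with illustrative names:

```cuda
#include <cuda_runtime.h>

// Before: two kernels communicate through a global-memory temporary.
__global__ void scale(const float* x, float* tmp, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = 2.0f * x[i];
}
__global__ void shift(const float* tmp, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = tmp[i] + 1.0f;
}

// After inter-kernel fusion: one launch, and the intermediate value
// lives in a register instead of being written to and re-read from DRAM.
__global__ void scale_shift(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = 2.0f * x[i] + 1.0f;
}
```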
Stochastic gradient descent on GPUs
Irregular algorithms such as Stochastic Gradient Descent (SGD) can benefit from the massive parallelism available on GPUs. However, unlike in data-parallel algorithms, synchronization patterns in SGD are quite complex. Furthermore, scheduling for scale-...
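A common lock-free scheme for parallel SGD is the Hogwild-style update sketched below (a generic illustration, not necessarily the paper's scheduling approach; the linear-regression loss and all names are assumptions): each thread processes one sample and applies its gradient with atomics, serializing only the conflicting writes.

```cuda
#include <cuda_runtime.h>

// Hogwild-style SGD step for linear regression: thread s computes the
// gradient of sample s and applies it to the shared weight vector.
// Conflicting updates to w are resolved with atomicAdd rather than locks.
__global__ void sgd_step(const float* X,     // n_samples x dim, row-major
                         const float* y, float* w,
                         int n_samples, int dim, float lr) {
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s >= n_samples) return;
    const float* x = X + (size_t)s * dim;

    float pred = 0.0f;                       // dot(w, x)
    for (int j = 0; j < dim; ++j) pred += w[j] * x[j];
    float err = pred - y[s];                 // squared-loss residual

    for (int j = 0; j < dim; ++j)
        atomicAdd(&w[j], -lr * err * x[j]);  // racy but convergent update
}
```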
High performance computing of fiber scattering simulation
Cellulose is one of the most promising untapped energy resources. Harvesting energy from cellulose requires decoding its atomic structure. Some of that structural information can be exposed by modeling data produced by X-ray scattering. ...
Rethinking the parallelization of random-restart hill climbing: a case study in optimizing a 2-opt TSP solver for GPU execution
Random-restart hill climbing is a common approach to combinatorial optimization problems such as the traveling salesman problem (TSP). We present and evaluate an implementation of random-restart hill climbing with 2-opt local search applied to TSP. Our ...
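The computational core of this approach is the 2-opt move evaluation, which each GPU thread can perform independently for a different (i, j) pair. A minimal sketch of that evaluation follows (the tour representation and names are illustrative assumptions, not the paper's code):

```cuda
#include <cuda_runtime.h>
#include <math.h>

__device__ __forceinline__ float dist(float2 a, float2 b) {
    float dx = a.x - b.x, dy = a.y - b.y;
    return sqrtf(dx * dx + dy * dy);
}

// Gain of the 2-opt move that reverses tour[i+1..j]: the edges
// (i, i+1) and (j, j+1) are replaced by (i, j) and (i+1, j+1).
// A positive gain means the move shortens the tour. Requires 0 <= i < j < n-1.
__device__ float two_opt_gain(const float2* city, const int* tour,
                              int i, int j) {
    float2 a = city[tour[i]], b = city[tour[i + 1]];
    float2 c = city[tour[j]], d = city[tour[j + 1]];
    return (dist(a, b) + dist(c, d)) - (dist(a, c) + dist(b, d));
}
```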
Forma: a DSL for image processing applications to target GPUs and multi-core CPUs
As architectures evolve, optimization techniques to obtain good performance evolve as well. Using low-level programming languages like C/C++ typically results in architecture-specific optimization techniques getting entangled with the application ...
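For a sense of what such a DSL abstracts away (Forma's actual syntax and generated code do not appear in this listing, so the CUDA below is only an illustrative hand-written counterpart): even a trivial 3x3 blur stage forces the programmer to commit to a thread-to-pixel mapping and a boundary policy.

```cuda
#include <cuda_runtime.h>

// Hand-written 3x3 box blur: the grid geometry, indexing, and border
// handling are all architecture- and application-specific decisions
// that an image-processing DSL can instead generate per target.
__global__ void blur3x3(const float* in, float* out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || y < 1 || x >= w - 1 || y >= h - 1) return;  // skip borders

    float s = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            s += in[(y + dy) * w + (x + dx)];
    out[y * w + x] = s / 9.0f;
}
```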