DOI: 10.1145/3299771.3299773

ThrustHetero: A Framework to Simplify Heterogeneous Computing Platform Programming using Design Abstraction

Published: 14 February 2019

Abstract

Heterogeneous compute architectures such as multi-core CPUs, CUDA GPUs, and Intel Xeon Phis have become prevalent over the years. While heterogeneity exposes architecture-specific features to the programmer, it also makes application development difficult: one must plan for optimal use of each architecture's features, suitable partitioning of the workload, and communication and data transfer among the participating devices. A design abstraction that hides these device-level variabilities while still exploiting each device's computing capability can improve developer productivity. In this work, we present "ThrustHetero", a lightweight framework based on NVIDIA's Thrust that provides an abstraction over devices such as GPUs, Xeon Phis, and multi-core CPUs, yet allows developers to easily leverage their full compute capability. We also demonstrate a novel two-stage method for workload distribution: micro-benchmarking during framework installation to find good distribution proportions, and reuse of this information during application execution. We consider four classes of applications, categorized by the amount of branching they contain, since branching largely determines how an application performs on each architecture. We show that the framework produces good workload-distribution proportions for each class, and that it is scalable and portable. Further, we compare the performance and ease of development of the framework against native versions of various benchmarks and obtain favorable results.
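To make the two-stage distribution scheme concrete, the following C++ sketch illustrates the idea. The paper's abstract does not publish ThrustHetero's API, so every name here (Device, measure_throughput, split_workload) is hypothetical; the sketch only captures the mechanism described above: benchmark each device once at installation time, record relative throughputs, and split each later workload in proportion to them.

#include <chrono>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

struct Device {
    std::string name;
    double throughput = 0.0;  // elements per second from the micro-benchmark
};

// Stage 1 (installation time): time a fixed reference kernel and return
// the device's throughput. A plain CPU loop stands in for a real
// per-device launch here (in real code, keep `data` observable so the
// loop is not optimized away).
double measure_throughput(std::size_t n) {
    std::vector<float> data(n, 1.0f);
    auto start = std::chrono::steady_clock::now();
    for (auto& x : data) x = x * 2.0f + 1.0f;  // reference workload
    auto stop = std::chrono::steady_clock::now();
    double secs = std::chrono::duration<double>(stop - start).count();
    return static_cast<double>(n) / secs;
}

// Stage 2 (application run): split n elements among devices in
// proportion to their measured throughput.
std::vector<std::size_t> split_workload(std::size_t n,
                                        const std::vector<Device>& devs) {
    double total = 0.0;
    for (const auto& d : devs) total += d.throughput;
    std::vector<std::size_t> chunks;
    std::size_t assigned = 0;
    for (std::size_t i = 0; i + 1 < devs.size(); ++i) {
        auto c = static_cast<std::size_t>(n * devs[i].throughput / total);
        chunks.push_back(c);
        assigned += c;
    }
    chunks.push_back(n - assigned);  // remainder goes to the last device
    return chunks;
}

int main() {
    std::vector<Device> devices = {{"cpu"}, {"gpu"}};
    for (auto& d : devices) d.throughput = measure_throughput(1 << 22);
    // A real installation would benchmark the GPU with its own kernel;
    // here we simply scale the CPU number to mimic a faster device.
    devices[1].throughput *= 4.0;

    auto chunks = split_workload(100000000, devices);
    for (std::size_t i = 0; i < devices.size(); ++i)
        std::cout << devices[i].name << " gets " << chunks[i] << " elements\n";
    return 0;
}

In the actual framework, the micro-benchmark results would presumably be persisted at installation time and the per-device chunks dispatched through Thrust-style algorithm calls rather than printed.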


Published In

ISEC '19: Proceedings of the 12th Innovations in Software Engineering Conference (formerly known as India Software Engineering Conference)
February 2019
238 pages
ISBN: 9781450362153
DOI: 10.1145/3299771
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. CUDA
  2. Design Simplicity
  3. Heterogeneous Computing
  4. High Performance Computing
  5. Programming Abstraction
  6. Thrust

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ISEC'19

Acceptance Rates

Overall Acceptance Rate 76 of 315 submissions, 24%
