DOI: 10.1145/3085158.3086159
Research article

How Effective is Design Abstraction in Thrust?: An Empirical Evaluation

Published: 26 June 2017

Abstract

High performance computing applications are far more difficult to write than conventional software, so practitioners expect a well-tuned application to last long and deliver optimized performance even after the hardware is upgraded. It may also be necessary to write software with sufficient abstraction over the hardware so that it can run on heterogeneous architectures. A good design abstraction paradigm strikes a balance between abstraction and visibility into the hardware, allowing the programmer to write applications without having to understand hardware nuances while still exploiting the available computing power. In this paper we analyze the design abstraction offered by Thrust, a popular design abstraction framework, from both ease-of-programming and performance perspectives. We show that while Thrust describes an algorithm more concisely than the native CUDA or OpenMP version, it has quite a few design limitations: with respect to CUDA, it provides the programmer no abstraction over shared, texture, or constant memory usage. We compare the performance of Thrust application code on the CUDA, OpenMP, and CPP backends against native versions, implementing exactly the same algorithms, written for those backends, and find that the current Thrust version performs poorly in most cases. While we conclude that the framework is not yet ready for writing applications that extract optimal performance from the hardware, we also highlight the improvements needed to make its performance comparable.


Published In

SEM4HPC '17: Proceedings of the 2017 Workshop on Software Engineering Methods for Parallel and High Performance Applications
June 2017
36 pages
ISBN:9781450350006
DOI:10.1145/3085158

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. design abstraction
  2. shared memory
  3. software complexity
  4. thrust

Qualifiers

  • Research-article

Funding Sources

  • Science and Engineering Research Board, Govt. of India

Conference

HPDC '17

Acceptance Rates

SEM4HPC '17 Paper Acceptance Rate: 3 of 5 submissions, 60%
Overall Acceptance Rate: 8 of 16 submissions, 50%

Article Metrics

0 Total Citations; 105 Total Downloads (reflects downloads up to 16 Feb 2025)