Abstract
With advances in manycore and accelerator architectures, the high-performance and embedded computing spaces are rapidly converging. Emerging architectures feature different forms of parallelism. Polyhedral Process Networks (PPNs) are a proven model for the automated generation of pipeline- and task-parallel programs from sequential source code; however, they do not address data parallelism. In this paper, we present a systematic approach for the identification and extraction of fine-grain data parallelism from a PPN specification. The approach is implemented in a tool, called kpn2gpu, which produces fine-grain data-parallel CUDA kernels for graphics processing units (GPUs). First experiments indicate that the generated applications have the potential to exploit the different forms of parallelism provided by the architecture, and that the kernels feature a highly regular structure amenable to subsequent optimizations.
Index Terms
- KPN2GPU: an approach for discovery and exploitation of fine-grain data parallelism in process networks