KPN2GPU: an approach for discovery and exploitation of fine-grain data parallelism in process networks

Published: 19 December 2011

Abstract

With advances in manycore and accelerator architectures, the high-performance and embedded spaces are rapidly converging. Emerging architectures feature different forms of parallelism. Polyhedral Process Networks (PPNs) are a proven model of choice for the automated generation of pipeline- and task-parallel programs from sequential source code; however, data parallelism is not addressed. In this paper, we present a systematic approach for identifying and extracting fine-grain data parallelism from the PPN specification. The approach is implemented in a tool called kpn2gpu, which produces fine-grain data-parallel CUDA kernels for graphics processing units (GPUs). First experiments indicate that the generated applications have the potential to exploit the different forms of parallelism provided by the architecture, and that the kernels feature a highly regular structure that allows subsequent optimizations.



Published in

ACM SIGARCH Computer Architecture News, Volume 39, Issue 4
September 2011
116 pages
ISSN: 0163-5964
DOI: 10.1145/2082156

Copyright © 2011 Authors

Publisher

Association for Computing Machinery
New York, NY, United States
