ABSTRACT
Parallel programming can deliver higher computational performance than a sequential implementation by using the many cores available in parallel systems. However, parallel-capable devices add complexity to the programming model. Current parallel programming APIs such as OpenCL and CUDA provide interfaces to these devices, but they are complex and produce code that contains cross-cutting components, such as setting up the parallel programming context, compiling the parallel kernel, and transferring data between the host and device memory spaces when the kernel is executed. A C++ Aspect-Oriented based Parallel Programming (CAPP) framework is developed, using AspectC++ and OpenCL, which defines aspects that remove these cross-cutting components from the C++ code. The aspects set up the OpenCL context, compile the OpenCL kernel, and manage the data transfer between the memory spaces each time a kernel is executed. The aspects are woven into the C++ code before compilation, rather than at run time, which improves performance. An interface is provided for executing OpenCL kernels from the C++ code, essentially providing parallel programming in C++. The framework was applied to the SAXPY and Black-Scholes option pricing problems. Computational performance was, on average, 1.4-7% slower than the OpenCL implementation and up to 9 times faster than the sequential C++ implementation for the Black-Scholes problem. The amount of code was greatly reduced compared with the OpenCL implementation, and the resulting CAPP framework C++ code was simple and modular, resembling the sequential C++ implementation. The amount of CAPP code required for the examples was comparable to that of C++ AMP, another parallel programming framework.
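For readers unfamiliar with the SAXPY benchmark named above, the sequential baseline can be sketched roughly as follows. This is a minimal illustration of the operation itself (y = a*x + y, element by element), not code from the paper and not the CAPP or OpenCL interface; the function name `saxpy` and the use of `std::vector` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential SAXPY: y[i] = a * x[i] + y[i] for every element.
// Hypothetical helper for illustration; not part of the CAPP framework.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    // Both vectors are assumed to have the same length.
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Each loop iteration is independent of the others, which is why SAXPY maps naturally onto one work-item per element in a parallel kernel. In a direct OpenCL version, this loop body would be surrounded by the context setup, kernel compilation, and host-device transfers that the abstract describes as cross-cutting.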