DOI: 10.1145/3303084.3309496
Research article

Task-DAG Support in Single-Source PHAST Library: Enabling Flexible Assignment of Tasks to CPUs and GPUs in Heterogeneous Architectures

Published: 17 February 2019

ABSTRACT

Nowadays, the majority of desktop, mobile, and embedded devices in the consumer and industrial markets are heterogeneous: they contain at least multi-core CPU and GPU resources in the same system. However, exploiting the performance and energy efficiency of these diverse processing elements does not come for free from a software point of view. Programmers need to a) code each activity with the specific approaches, libraries, and frameworks suited to its target architecture (e.g., CPUs or GPUs), and orchestrate the resulting heterogeneous execution, and b) decide how to distribute sequential and parallel activities across the available parallel hardware resources.

Current frameworks typically expose interfaces that are either low-level and target-specific or generic but not high-performance. Both complicate the exploration of different assignments of tasks, related by DAG (Directed Acyclic Graph) precedence relationships, to the available heterogeneous resources: to enable such exploration, each task would typically need to be coded once per target architecture, due to the profound differences in how the architectures are programmed.

In this work, we add support for tasks, and for DAGs of data-parallel tasks, to the single-source PHAST library, which currently targets both multi-core CPUs and NVIDIA GPUs. Tasks are coded in a target-agnostic fashion, and their targeting to multi-core or GPU architectures is automatic and efficient. Integrating this coding approach with tasks helps postpone the choice of execution platform for each task until the testing phase, or even until runtime.

Finally, we demonstrate the effects of this approach on a sample image-pipeline benchmark from the computer vision domain. We compare our implementation to a SYCL implementation from a productivity point of view, and we show that various task assignments can be seamlessly explored by implementing both the PEFT (Predict Earliest Finish Time) mapping heuristic and an exhaustive search of the mapping space.


Published in:
PMAM'19: Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores
February 2019, 105 pages
ISBN: 9781450362900
DOI: 10.1145/3303084
Editors: Quan Chen, Zhiyi Huang, Min Si

          Copyright © 2019 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance rates: PMAM'19 paper acceptance rate 10 of 17 submissions (59%); overall acceptance rate 53 of 97 submissions (55%).
