DOI: 10.1145/3585341.3585374

Towards a SYCL API for Approximate Computing

Published: 18 April 2023

ABSTRACT

Approximate computing is a well-known method [7] for achieving higher performance or lower energy consumption at the cost of reduced output accuracy. Many applications, such as image processing and neural networks, tolerate a certain amount of error and therefore have the potential for significant improvements in execution time and energy consumption. The most advanced software approximation techniques are mixed precision, which uses a lower-precision data representation for both integer and floating-point variables [1, 4]; perforation, which skips instruction blocks in a program, iterations in a loop, or data in buffers, assuming that nearby data have similar values [2, 5, 6, 8]; and relaxed synchronization, which removes synchronization points that represent one of the major bottlenecks in parallel applications [3, 9]. These approaches differ both in the performance they achieve and in the error they produce. Usually, perforation and synchronization elision deliver higher performance than mixed precision but produce larger errors. In particular, synchronization elision introduces non-deterministic errors that are complex to handle.

Support for approximate computing is provided by SYCL, a heterogeneous programming model often used to develop portable HPC applications: its built-in functions and data types can be used to perform approximate operations, such as half-precision floating-point reductions and bit-level operations.

In this technical talk, we present SYprox, a SYCL-based API supporting a broad set of approximation techniques in modern C++. SYprox introduces a set of semantics that extend SYCL’s buffers and accessors to provide a high-level, easy-to-use programming API. It supports data perforation and elision patterns for efficient approximation, as well as signal reconstruction algorithms for error mitigation.

Figure 1 (a) depicts the accurate execution of an application, while Figure 1 (b) shows the approximation process: an input buffer is perforated according to the chosen scheme, and the perforated data can be approximated before or after computation using input or output reconstruction, respectively.

The talk's code listing (not reproduced on this page) contrasts the accurate version of a SYCL program with our proposed approximate approach using SYprox.

Figure 2 shows a visual representation of the perforation schemes on 1D and 2D buffers. Gray elements are perforated, whereas blue elements are computed. Schemes (a) and (b) apply to 2D buffers and compute rows and columns of results, respectively. Scheme (c) also applies to 2D buffers and perforates data following a checkerboard layout. Finally, scheme (d) works on 1D buffers and perforates data according to a user-defined skip factor.
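To make the four perforation schemes concrete, the following is a minimal CPU-side sketch in plain C++ of predicates that decide whether an element is computed or perforated. It is not the SYprox API: the names `Scheme`, `is_computed`, `is_computed_1d`, and `make_mask` are hypothetical, and the sketch assumes that the row and column schemes keep every other row or column and that a skip factor of `skip` keeps one element out of every `skip`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scheme names for illustration; these are not SYprox identifiers.
enum class Scheme { Rows, Columns, Checkerboard };

// Returns true if element (row, col) of a 2D buffer is computed,
// false if it is perforated (skipped).
// Assumption: the row/column schemes keep every other row/column.
inline bool is_computed(Scheme s, std::size_t row, std::size_t col) {
    switch (s) {
        case Scheme::Rows:         return row % 2 == 0;
        case Scheme::Columns:      return col % 2 == 0;
        case Scheme::Checkerboard: return (row + col) % 2 == 0;
    }
    return true;  // unknown scheme: compute everything
}

// 1D scheme with a user-defined skip factor.
// Assumption: one element out of every `skip` elements is computed.
inline bool is_computed_1d(std::size_t i, std::size_t skip) {
    assert(skip > 0);
    return i % skip == 0;
}

// Builds a row-major mask for a rows x cols buffer
// (1 = computed, 0 = perforated).
inline std::vector<int> make_mask(Scheme s, std::size_t rows, std::size_t cols) {
    std::vector<int> mask(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            mask[r * cols + c] = is_computed(s, r, c) ? 1 : 0;
        }
    }
    return mask;
}
```

In a GPU kernel, a predicate of this kind would typically guard the load or the computation of each work-item; the mask form is only convenient for inspecting a scheme on the host.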

Because applying a perforation strategy introduces errors in the final output, the library also provides two types of reconstruction techniques to mitigate the application error: output reconstruction and input reconstruction. Output reconstruction approximates perforated data by interpolating the output. In contrast, input reconstruction approximates perforated data before computation. In this case, the selected perforation scheme defines which data are not loaded into local memory, and the skipped data are approximated directly in local memory using interpolation. This approach combines local memory optimization with perforation, decreasing the number of global memory accesses, which are a bottleneck in GPU applications.
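The difference between the two reconstruction modes can be sketched on the CPU in plain C++. This is an illustration under stated assumptions, not SYprox code: the function names are hypothetical, the perforation scheme is fixed to "skip every odd index" of a 1D buffer, an ordinary vector stands in for GPU local memory, and the interpolation is a simple two-neighbor average.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Kernel = std::function<float(float)>;

// Fills odd-indexed (perforated) entries from their even-indexed neighbors:
// the average of both neighbors when both exist, else a copy of the left one.
inline void fill_perforated(std::vector<float>& v) {
    for (std::size_t i = 1; i < v.size(); i += 2) {
        v[i] = (i + 1 < v.size()) ? 0.5f * (v[i - 1] + v[i + 1]) : v[i - 1];
    }
}

// Output reconstruction: run the kernel only on the kept (even) elements,
// then interpolate the missing outputs.
inline std::vector<float> output_reconstruction(const std::vector<float>& in,
                                                const Kernel& kernel) {
    assert(static_cast<bool>(kernel));
    std::vector<float> out(in.size(), 0.0f);
    for (std::size_t i = 0; i < in.size(); i += 2) {
        out[i] = kernel(in[i]);
    }
    fill_perforated(out);
    return out;
}

// Input reconstruction: load only the kept inputs into a staging buffer
// (a stand-in for local memory), interpolate the skipped inputs there,
// then run the kernel on every element.
inline std::vector<float> input_reconstruction(const std::vector<float>& in,
                                               const Kernel& kernel) {
    assert(static_cast<bool>(kernel));
    std::vector<float> staged(in.size(), 0.0f);
    for (std::size_t i = 0; i < in.size(); i += 2) {
        staged[i] = in[i];  // only these loads touch the original buffer
    }
    fill_perforated(staged);
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = kernel(staged[i]);
    }
    return out;
}
```

With a squaring kernel on the input {0, 1, 2, 3, 4}, output reconstruction yields {0, 2, 4, 10, 16}, while input reconstruction yields {0, 1, 4, 9, 16}. Output reconstruction saves kernel evaluations, whereas input reconstruction saves loads from the original buffer, so the two modes trade error against different costs.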

Loading data into local memory requires a synchronization point to ensure that all threads in a block have the same view of the local memory. To decrease the time lost during synchronization, SYprox provides a synchronization elision mechanism that controls how many synchronization points are executed.

Both input and output reconstruction are based on data interpolation. Figure 3 shows data reconstruction using three different types of interpolation. Basic interpolation (b) requires that the elements to be reconstructed have adjacent elements on both sides. Stencil interpolation (c) requires adjacent elements in all four directions (top, bottom, left, right). When these requirements are not met, we employ nearest-neighbor interpolation (a), which approximates a value with the nearest available element.
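The three interpolation types can be sketched in plain C++ on a row-major 2D grid with a validity mask. This is a hedged illustration rather than the SYprox implementation: the `Grid` type and all function names are hypothetical, "adjacent elements on both sides" is assumed to mean the left and right neighbors, nearest-neighbor uses Manhattan distance, and the fallback order (stencil, then basic, then nearest-neighbor) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major 2D grid; mask[i] != 0 marks an element that was actually computed.
struct Grid {
    std::size_t rows, cols;
    std::vector<float> data;
    std::vector<int> mask;
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
    // Out-of-range indices (including unsigned wrap-around from 0 - 1)
    // fail the bounds checks and are reported as not valid.
    bool valid(std::size_t r, std::size_t c) const {
        return r < rows && c < cols && mask[r * cols + c] != 0;
    }
};

// Basic interpolation: average of the left and right neighbors.
inline bool basic_interp(const Grid& g, std::size_t r, std::size_t c, float& out) {
    if (!g.valid(r, c - 1) || !g.valid(r, c + 1)) return false;
    out = 0.5f * (g.at(r, c - 1) + g.at(r, c + 1));
    return true;
}

// Stencil interpolation: average of the top, bottom, left, and right neighbors.
inline bool stencil_interp(const Grid& g, std::size_t r, std::size_t c, float& out) {
    if (!g.valid(r - 1, c) || !g.valid(r + 1, c) ||
        !g.valid(r, c - 1) || !g.valid(r, c + 1)) return false;
    out = 0.25f * (g.at(r - 1, c) + g.at(r + 1, c) + g.at(r, c - 1) + g.at(r, c + 1));
    return true;
}

// Nearest-neighbor: copy the closest computed element (Manhattan distance,
// ties broken by row-major order). A linear scan keeps the sketch short.
inline bool nearest_interp(const Grid& g, std::size_t r, std::size_t c, float& out) {
    bool found = false;
    std::size_t best = 0;
    for (std::size_t rr = 0; rr < g.rows; ++rr) {
        for (std::size_t cc = 0; cc < g.cols; ++cc) {
            if (!g.valid(rr, cc)) continue;
            std::size_t d = (rr > r ? rr - r : r - rr) + (cc > c ? cc - c : c - cc);
            if (!found || d < best) { found = true; best = d; out = g.at(rr, cc); }
        }
    }
    return found;
}

// Reconstructs every perforated element from computed elements only.
inline void reconstruct(Grid& g) {
    assert(g.data.size() == g.rows * g.cols && g.mask.size() == g.data.size());
    std::vector<float> result = g.data;
    for (std::size_t r = 0; r < g.rows; ++r) {
        for (std::size_t c = 0; c < g.cols; ++c) {
            if (g.valid(r, c)) continue;
            float v = 0.0f;
            if (stencil_interp(g, r, c, v) || basic_interp(g, r, c, v) ||
                nearest_interp(g, r, c, v)) {
                result[r * g.cols + c] = v;
            }
        }
    }
    g.data = result;
}
```

Because the mask is never updated, every reconstructed value depends only on elements that were actually computed, so the result does not depend on traversal order.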

Since the effectiveness of the reconstruction techniques depends on the perforation strategy adopted and on the input data distribution, SYprox also provides a simple way to implement an ad-hoc perforation strategy that best fits the characteristics of the given input. In this talk, we show a preliminary performance and error evaluation comparing the baseline implementation of three applications with their approximated versions. Performance-wise, all applications achieve a speedup higher than 2x over the accurate version. On the other hand, the results show that the error introduced by the approximation is highly dependent on how the perforation strategy and reconstruction technique are combined. Even so, the error stays below 10% for all applications.
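The abstract does not state which error metric the evaluation uses. As one possible way to quantify such results, the sketch below computes a mean relative error in percent and a speedup ratio in plain C++; both function names and the choice of metric are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean relative error, in percent, of `approx` against `exact`.
// Elements whose exact value is zero are skipped to avoid dividing by zero.
inline double mean_relative_error_pct(const std::vector<double>& exact,
                                      const std::vector<double>& approx) {
    assert(exact.size() == approx.size());
    double sum = 0.0;
    std::size_t n = 0;
    for (std::size_t i = 0; i < exact.size(); ++i) {
        if (exact[i] == 0.0) continue;
        sum += std::fabs(approx[i] - exact[i]) / std::fabs(exact[i]);
        ++n;
    }
    return n == 0 ? 0.0 : 100.0 * sum / static_cast<double>(n);
}

// Speedup of the approximate version over the accurate one.
inline double speedup(double accurate_seconds, double approx_seconds) {
    assert(approx_seconds > 0.0);
    return accurate_seconds / approx_seconds;
}
```

Under this metric, an approximate run that takes half the time of the accurate one has a speedup of 2x, and it would meet a 10% error bound if its outputs deviate from the exact ones by less than 10% on average.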

References

  1. Nhut-Minh Ho, Himeshi De Silva, and Weng-Fai Wong. 2021. GRAM: A framework for dynamically mixing precisions in GPU applications. ACM Transactions on Architecture and Code Optimization (TACO) 18, 2 (2021), 1–24.
  2. Henry Hoffmann, Sasa Misailovic, Stelios Sidiroglou, Anant Agarwal, and Martin Rinard. 2009. Using code perforation to improve performance, reduce energy consumption, and respond to failures. (2009).
  3. Bashima Islam, Faysal Hossain Shezan, and Rifat Shahriyar. 2016. High Performance Approximate Computing by Adaptive Relaxed Synchronization. In 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS). IEEE, 1204–1210.
  4. Ignacio Laguna, Paul C Wood, Ranvijay Singh, and Saurabh Bagchi. 2019. GPUMixer: Performance-driven floating-point tuning for GPU scientific applications. In High Performance Computing: 34th International Conference, ISC High Performance 2019, Frankfurt/Main, Germany, June 16–20, 2019, Proceedings 34. Springer, 227–246.
  5. Shikai Li, Sunghyun Park, and Scott Mahlke. 2018. Sculptor: Flexible approximation with selective dynamic loop perforation. In Proceedings of the 2018 International Conference on Supercomputing. 341–351.
  6. Daniel Maier and Ben Juurlink. 2022. Model-Based Loop Perforation. In Euro-Par 2021: Parallel Processing Workshops: Euro-Par 2021 International Workshops, Lisbon, Portugal, August 30-31, 2021, Revised Selected Papers. Springer, 549–554.
  7. Sparsh Mittal. 2016. A survey of techniques for approximate computing. ACM Computing Surveys (CSUR) 48, 4 (2016), 1–33.
  8. Konstantinos Parasyris, Giorgis Georgakoudis, Harshitha Menon, James Diffenderfer, Ignacio Laguna, Daniel Osei-Kuffuor, and Markus Schordan. 2021. HPAC: evaluating approximate computing techniques on HPC OpenMP applications. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1–14.
  9. Lakshminarayanan Renganarayana, Vijayalakshmi Srinivasan, Ravi Nair, and Daniel Prener. 2012. Programming with relaxed synchronization. In Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability. 41–50.

          • Published in

            IWOCL '23: Proceedings of the 2023 International Workshop on OpenCL
            April 2023
            133 pages
            ISBN: 9798400707452
            DOI: 10.1145/3585341

            Copyright © 2023 Owner/Author

            Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Qualifiers

            • abstract
            • Research
            • Refereed limited

            Acceptance Rates

            Overall Acceptance Rate: 84 of 152 submissions, 55%
