ABSTRACT
Although approximate computing is widely used, it demands substantial programming effort to find appropriate approximation patterns among multiple pre-defined patterns in order to achieve high performance. We therefore propose GATE, an automatic approximation framework that uncovers hidden approximation opportunities in any data-parallel program, regardless of the code pattern or application characteristics, using two compiler techniques: subgraph-level approximation (SGLA) and approximate thread merge (ATM). GATE also features conservative/aggressive tuning and dynamic calibration to maximize performance while maintaining the target output quality (TOQ) at runtime. Our framework achieves an average performance gain of 2.54x over the baseline with minimal accuracy loss.
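To make the approximate thread merge idea concrete, here is a minimal Python sketch of the general pattern: adjacent work-items are merged so that one computed value is reused by its neighbors, trading accuracy for roughly a `merge`-fold reduction in compute. The function names and the elementwise kernel are illustrative assumptions, not taken from the paper.

```python
def exact_kernel(xs):
    # Exact version: every element gets its own (notionally expensive) computation.
    return [x * x for x in xs]

def atm_kernel(xs, merge=2):
    # Approximate version in the spirit of thread merge: compute one
    # representative per group of `merge` elements and broadcast that
    # result to the rest of the group, halving the work for merge=2.
    out = []
    for i in range(0, len(xs), merge):
        rep = xs[i] * xs[i]  # single computation per merged group
        out.extend([rep] * min(merge, len(xs) - i))
    return out

print(exact_kernel([1, 2, 3, 4]))  # [1, 4, 9, 16]
print(atm_kernel([1, 2, 3, 4]))    # [1, 1, 9, 9]
```

On a GPU the same transformation would merge neighboring threads in a kernel rather than loop iterations, but the accuracy/performance trade-off is the same: smoother data tolerates larger merge factors.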
GATE: A Generalized Dataflow-level Approximation Tuning Engine For Data Parallel Architectures