
An Infrastructure for Tackling Input-Sensitivity of GPU Program Optimizations

Published in: International Journal of Parallel Programming

Abstract

Graphics processing units (GPUs) have become increasingly adopted for enhancing computing throughput. However, developing a high-quality GPU application is challenging, due to the large optimization space and the complex, unpredictable effects of optimizations on GPU program performance. Many recent efforts have employed empirical search-based auto-tuners to tackle the problem, but few have concentrated on the influence of program inputs on the optimizations. In this paper, based on a set of CUDA and OpenCL kernels, we report evidence of the importance for auto-tuners to adapt to program input changes, and present a framework, G-ADAPT+, that addresses this influence by constructing cross-input predictive models to automatically predict the (near-)optimal configurations for an arbitrary input to a GPU program. G-ADAPT+ is based on source-to-source compilers, specifically Cetus and ROSE. It supports the optimization of both CUDA and OpenCL programs.
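To make the idea of a cross-input predictive model concrete, the following is a minimal, illustrative sketch rather than code from the paper: it assumes an offline empirical search has already recorded the best-performing configuration (here, a hypothetical thread-block size) for a handful of training inputs, and it fits a simple decision tree that predicts a configuration for an unseen input. The feature names, training values, and the use of scikit-learn are all assumptions made for illustration.

```python
# Illustrative sketch of a cross-input predictive model in the spirit of
# G-ADAPT+ (not the paper's actual implementation). Offline, an empirical
# search records the best-performing configuration for each training input;
# a decision tree then predicts a configuration for unseen inputs.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data from offline empirical search:
# each row is an input feature vector (e.g., matrix rows, matrix columns),
# each label is the thread-block size that performed best for that input.
input_features = [
    [256, 256],
    [512, 512],
    [1024, 1024],
    [4096, 4096],
]
best_block_size = [64, 128, 128, 256]

# Fit the cross-input model once, offline.
model = DecisionTreeClassifier(max_depth=3)
model.fit(input_features, best_block_size)

# At run time, predict a (near-)optimal configuration for a new input
# without re-running the empirical search.
new_input = [[2048, 2048]]
predicted_block = model.predict(new_input)[0]
print(f"Predicted thread-block size for input {new_input[0]}: {predicted_block}")
```

In G-ADAPT+ itself, the configuration space, input features, and model construction are handled by the framework; the sketch only conveys the general offline-train, online-predict workflow described in the abstract.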

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grants 0720499 and 0811791 and CAREER award 0954015, and by an Early Career Grant from the Department of Energy. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the Department of Energy.

Author information

Corresponding author

Correspondence to Xipeng Shen.

About this article

Cite this article

Shen, X., Liu, Y., Zhang, E.Z. et al. An Infrastructure for Tackling Input-Sensitivity of GPU Program Optimizations. Int J Parallel Prog 41, 855–869 (2013). https://doi.org/10.1007/s10766-012-0236-3

