
An Infrastructure for Tackling Input-Sensitivity of GPU Program Optimizations

Published in: International Journal of Parallel Programming

Abstract

Graphics processing units (GPUs) have become increasingly adopted for enhancing computing throughput. However, developing a high-quality GPU application is challenging, due to the large optimization space and the complex, unpredictable effects of optimizations on GPU program performance. Many recent efforts have employed empirical search-based auto-tuners to tackle the problem, but few have concentrated on the influence of program inputs on the optimizations. In this paper, based on a set of CUDA and OpenCL kernels, we report evidence of the importance for auto-tuners to adapt to program input changes, and present a framework, G-ADAPT+, that addresses this influence by constructing cross-input predictive models to automatically predict the (near-)optimal configurations for an arbitrary input to a GPU program. G-ADAPT+ is based on source-to-source compilers, specifically Cetus and ROSE. It supports the optimization of both CUDA and OpenCL programs.
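To make the idea of a cross-input predictive model concrete, the following is a minimal, illustrative sketch rather than code from the paper: it assumes an offline empirical search has already recorded the best-performing configuration (here, a hypothetical thread-block size) for a handful of training inputs, and it fits a simple decision tree that predicts a configuration for an unseen input. The feature names, training values, and the use of scikit-learn are all assumptions made for illustration.

```python
# Illustrative sketch of a cross-input predictive model in the spirit of
# G-ADAPT+ (not the paper's actual implementation). Offline, an empirical
# search records the best-performing configuration for each training input;
# a decision tree then predicts a configuration for unseen inputs.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data from offline empirical search:
# each row is an input feature vector (e.g., matrix rows, matrix columns),
# each label is the thread-block size that performed best for that input.
input_features = [
    [256, 256],
    [512, 512],
    [1024, 1024],
    [4096, 4096],
]
best_block_size = [64, 128, 128, 256]

# Fit the cross-input model once, offline.
model = DecisionTreeClassifier(max_depth=3)
model.fit(input_features, best_block_size)

# At run time, predict a (near-)optimal configuration for a new input
# without re-running the empirical search.
new_input = [[2048, 2048]]
predicted_block = model.predict(new_input)[0]
print(f"Predicted thread-block size for input {new_input[0]}: {predicted_block}")
```

In G-ADAPT+ itself, the configuration space, input features, and model construction are handled by the framework; the sketch only conveys the general offline-train, online-predict workflow described in the abstract.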

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grants 0720499 and 0811791 and CAREER award 0954015, and by an Early Career Grant from the Department of Energy. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the Department of Energy.

Author information

Corresponding author

Correspondence to Xipeng Shen.

About this article

Cite this article

Shen, X., Liu, Y., Zhang, E.Z. et al. An Infrastructure for Tackling Input-Sensitivity of GPU Program Optimizations. Int J Parallel Prog 41, 855–869 (2013). https://doi.org/10.1007/s10766-012-0236-3

