Feedback-Based Global Instruction Scheduling for GPGPU Applications

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7333)

Abstract

Even high-bandwidth systems such as GPUs face the memory wall, so memory accesses and memory-related instructions must be handled efficiently. Up to now, the memory performance of GPGPU applications has been considered only at source-code level. This is not enough when optimizing an application for high performance: the code has to be optimized at assembly level as well. As GPGPU-capable hardware spreads to smaller and smaller devices, the energy consumption of a program becomes, besides performance, an important optimization goal.

In this paper, a novel compiler optimization technique called FALIS (Feedback-based and memory-Aware gLobal Instruction Scheduling) is presented. It is based on global instruction scheduling and multi-objective genetic algorithms. The approach uses profiling-based feedback so that measured performance and energy consumption values are taken into account inside the compiler. Profiling on the real hardware platform is important in order to capture the characteristics of the underlying hardware. FALIS increases the runtime performance of a GPGPU application by up to 13.02% and decreases energy consumption by up to 10.23%.
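To illustrate what optimizing for two objectives at once involves, the following is a minimal sketch of Pareto dominance over (runtime, energy) measurements, the comparison that multi-objective genetic algorithms generally rely on. It is not the FALIS implementation: the names `dominates` and `pareto_front`, the units, and the sample values are assumptions made up for illustration, and a real genetic algorithm would add selection, crossover, and mutation on top of this filter.

```python
from typing import List, Tuple

# Hypothetical profiling result for one candidate schedule:
# (runtime in ms, energy in J). Both objectives are minimized.
Measurement = Tuple[float, float]


def dominates(a: Measurement, b: Measurement) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Measurement]) -> List[Measurement]:
    """Keep only the measurements that no other measurement dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative values only; these are not measurements from the paper.
measurements = [(10.0, 5.0), (9.0, 6.0), (11.0, 5.5), (9.0, 5.8)]
front = pareto_front(measurements)
print(front)
```

With these sample values, the schedules (10.0, 5.0) and (9.0, 5.8) remain: neither is better than the other in both runtime and energy, so the choice between them is a trade-off rather than a single optimum.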

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Timm, C., Görlich, M., Weichert, F., Marwedel, P., Müller, H. (2012). Feedback-Based Global Instruction Scheduling for GPGPU Applications. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2012. ICCSA 2012. Lecture Notes in Computer Science, vol 7333. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31125-3_2

  • DOI: https://doi.org/10.1007/978-3-642-31125-3_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31124-6

  • Online ISBN: 978-3-642-31125-3

  • eBook Packages: Computer Science, Computer Science (R0)
