The Design of NoC-Side Memory Access Scheduling for Energy-Efficient GPGPUs

Liu, Wenjie; Ma, Sheng; Huang, Libo; Wang, Zhiying

doi:10.1007/s10766-017-0521-2

Published: 09 October 2017

Volume 46, pages 722–735, (2018)

International Journal of Parallel Programming

Abstract

Memory access scheduling schemes, usually implemented in memory controllers, play a major role in relieving the heavy load placed on the memory systems of GPGPUs. Existing out-of-order scheduling schemes, such as FR-FCFS, improve memory access efficiency by reordering the memory request sequence at the destination. Their effectiveness, however, comes at the expense of complex logic and high power consumption. In this paper, we propose a NoC-side memory access scheduling scheme based on the key insight that transmission through the on-chip network is the dominant factor in destroying row access locality and causing poor memory access efficiency. With appropriate NoC-side optimization, straightforward in-order scheduling can be used in the memory controllers, which simplifies the scheduling logic and eases the pressure on the tight power envelope. Moreover, we introduce several lightweight optimizations to further improve system performance. Experimental results on memory-intensive applications show that, compared with FR-FCFS, our proposed scheme increases overall system performance by 10.5%, reduces power consumption by 20%, and improves energy efficiency by 36.9%.
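
The row-locality argument in the abstract can be illustrated with a toy model. The sketch below is not the authors' design: it assumes a single DRAM bank, ignores all timing, treats every request as already queued, and uses hypothetical function names and a hypothetical `window` parameter. It only counts row-buffer hits for an in-order scheduler and for a simplified FR-FCFS scheduler, on a request stream whose rows arrive interleaved versus grouped.

```python
def row_hits_in_order(rows):
    """Count row-buffer hits when requests are served strictly in arrival order."""
    hits = 0
    open_row = None  # no row is open before the first request
    for row in rows:
        if row == open_row:
            hits += 1
        open_row = row
    return hits


def row_hits_fr_fcfs(rows, window=8):
    """Count row-buffer hits for a simplified FR-FCFS scheduler.

    Among the oldest `window` pending requests, serve the oldest one that
    targets the open row; if none does, serve the oldest request.
    """
    pending = list(rows)
    hits = 0
    open_row = None
    while pending:
        visible = pending[:window]
        if open_row in visible:
            idx = visible.index(open_row)  # oldest request hitting the open row
            hits += 1
        else:
            idx = 0  # fall back to the oldest request
        open_row = pending.pop(idx)
    return hits


# Row IDs as they might reach one bank: interleaved vs. grouped by row.
interleaved = [0, 1] * 4
grouped = [0] * 4 + [1] * 4

print(row_hits_in_order(interleaved))  # 0
print(row_hits_fr_fcfs(interleaved))   # 6
print(row_hits_in_order(grouped))      # 6
```

In this toy setting, in-order scheduling on the grouped stream reaches the same hit count as the reordering scheduler on the interleaved stream, which is the intuition behind preserving row locality before requests reach the memory controller.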

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant Nos. 61572508, 61672526, and 61472435.

Author information

Authors and Affiliations

College of Computer, National University of Defense Technology, Changsha, China
Wenjie Liu, Sheng Ma, Libo Huang & Zhiying Wang

Corresponding author

Correspondence to Sheng Ma.

About this article

Cite this article

Liu, W., Ma, S., Huang, L. et al. The Design of NoC-Side Memory Access Scheduling for Energy-Efficient GPGPUs. Int J Parallel Prog 46, 722–735 (2018). https://doi.org/10.1007/s10766-017-0521-2

Received: 27 August 2017
Accepted: 18 September 2017
Published: 09 October 2017
Issue Date: August 2018
DOI: https://doi.org/10.1007/s10766-017-0521-2
