Abstract
When multiple central processing unit (CPU) cores and an integrated graphics processing unit (GPU) share off-chip main memory, CPU and GPU applications compete for this critical memory resource, causing serious contention that degrades overall system performance. We characterize the competition for shared-memory resources in a CPU-GPU heterogeneous multi-core architecture and propose a shared-memory request scheduling strategy based on perceptual and predictive batch processing. By sensing the CPU and GPU memory requests in the request buffer, the proposed strategy estimates the GPU's latency tolerance and reduces mutual interference between the CPU and GPU by processing CPU or GPU memory requests in batches. Simulation results show that the scheduling strategy improves CPU performance by 8.53% and reduces mutual interference by 10.38%, with low hardware complexity.
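The core idea in the abstract, sensing the request buffer, estimating GPU latency tolerance, and batching requests from one client at a time, can be illustrated with a minimal conceptual sketch. This is not the authors' implementation; the class names, the pending-request heuristic for latency tolerance, and the threshold and batch-size parameters are all illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class MemRequest:
    source: str  # "CPU" or "GPU"
    addr: int

class BatchScheduler:
    """Illustrative sketch of perceptual/predictive batch scheduling:
    sense the request buffer, estimate GPU latency tolerance, and
    issue a batch of CPU or GPU requests accordingly."""

    def __init__(self, gpu_tolerance_threshold=4, batch_size=4):
        self.buffer = deque()
        self.gpu_tolerance_threshold = gpu_tolerance_threshold
        self.batch_size = batch_size

    def enqueue(self, req: MemRequest) -> None:
        self.buffer.append(req)

    def gpu_latency_tolerant(self) -> bool:
        # Heuristic stand-in: with few pending GPU requests, the GPU
        # (which hides latency via massive multithreading) is assumed
        # able to tolerate having its requests deferred.
        pending_gpu = sum(1 for r in self.buffer if r.source == "GPU")
        return pending_gpu < self.gpu_tolerance_threshold

    def next_batch(self) -> list:
        # Prefer latency-sensitive CPU requests while the GPU is
        # tolerant; otherwise drain GPU requests to serve its
        # bandwidth demand. Batching one client at a time limits
        # CPU-GPU mutual interference at the memory controller.
        target = "CPU" if self.gpu_latency_tolerant() else "GPU"
        batch = [r for r in self.buffer if r.source == target][: self.batch_size]
        if not batch:  # no requests from the chosen client; serve the other
            batch = list(self.buffer)[: self.batch_size]
        for r in batch:
            self.buffer.remove(r)
        return batch
```

A real memory controller would also account for row-buffer locality, bank state, and deadline information; the sketch only shows the sense-estimate-batch control flow described in the abstract.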
Data availability
The data that support the findings of this study are openly available in PARSEC3.0 at https://parsec.cs.princeton.edu/parsec3-doc.htm.
Author information
Contributions
Juan FANG and Sheng LIN designed the research. Sheng LIN and Yixiang XU processed the data. Sheng LIN, Huijing YANG, and Xing SU drafted the paper. Juan FANG and Xing SU helped organize the paper. Sheng LIN and Xing SU revised and finalized the paper.
Ethics declarations
Juan FANG, Sheng LIN, Huijing YANG, Yixiang XU, and Xing SU declare that they have no conflict of interest.
Additional information
Project supported by the National Natural Science Foundation of China (Nos. 62276011 and 61202076) and the Natural Science Foundation of Beijing, China (No. 4192007)
Cite this article
Fang, J., Lin, S., Yang, H. et al. A perceptual and predictive batch-processing memory scheduling strategy for a CPU-GPU heterogeneous system. Front Inform Technol Electron Eng 24, 994–1006 (2023). https://doi.org/10.1631/FITEE.2200449