A memory-driven scheduling scheme and optimization for concurrent execution in GPU

Published in Cluster Computing.

Abstract

Concurrent execution of GPU tasks is available on modern GPU devices. However, limited device memory is an obvious bottleneck when executing many GPU tasks, and task priority and system performance are often ignored. To address these issues, a real-time GPU scheduling scheme is proposed in this paper. A reservation algorithm based on device memory (RBDM) is adopted to give high-priority tasks more opportunity to execute. High priority first wake (HPFW) and small memory HPFW (SM-HPFW) are employed in the scheduling of waiting tasks to improve priority response time and system performance. A CPU-based monitor is developed to check GPU task execution. Experiments show that RBDM works effectively. Compared with FIFO, HPFW decreases overall priority response time significantly, and overall task completion time can be reduced by 20 % using SM-HPFW when the distribution of device memory requirements of GPU tasks is even.
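The two wake-up policies named in the abstract can be sketched as ordering rules over a waiting queue. The following is a minimal, hypothetical illustration, not the authors' implementation: the names `Task`, `hpfw_order`, `sm_hpfw_order`, and `admit`, and the greedy admission step, are assumptions made here to show why preferring small-memory tasks among equal priorities can wake more tasks under a fixed device-memory budget.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    priority: int   # larger value = higher priority
    mem_mb: int     # device-memory requirement in MB

def hpfw_order(waiting):
    """High priority first wake: order strictly by priority."""
    return sorted(waiting, key=lambda t: -t.priority)

def sm_hpfw_order(waiting):
    """Small memory HPFW: among equal priorities, prefer tasks with a
    smaller device-memory footprint so more of them fit concurrently."""
    return sorted(waiting, key=lambda t: (-t.priority, t.mem_mb))

def admit(ordered, free_mb):
    """Greedy admission: wake tasks in the given order while the free
    device memory can still hold each task's requirement."""
    woken = []
    for t in ordered:
        if t.mem_mb <= free_mb:
            woken.append(t.name)
            free_mb -= t.mem_mb
    return woken
```

For example, with waiting tasks A (priority 2, 600 MB), B (priority 2, 300 MB), C (priority 1, 200 MB) and 700 MB free, HPFW wakes only A, whereas SM-HPFW wakes B and then C, keeping more tasks running concurrently under the same memory budget.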



Acknowledgments

This research is supported by NSFC and the Shanghai Municipal Education Commission. I would like to extend my sincere gratitude to my friends at the Illinois Institute of Technology (IIT), who provided selfless help for my work and life abroad during my time as a visiting scholar. I gratefully acknowledge IIT, which offered me a cosy work environment, and my colleagues at HPCC, Shanghai University.

Author information

Corresponding author

Correspondence to Bao-yu Xu.

About this article


Cite this article

Xu, By., Zhang, W., Sun, Xh. et al. A memory-driven scheduling scheme and optimization for concurrent execution in GPU. Cluster Comput 19, 2241–2250 (2016). https://doi.org/10.1007/s10586-016-0656-8
