PDAWL: Profile-Based Iterative Dynamic Adaptive WorkLoad Balance on Heterogeneous Architectures

  • Conference paper
Job Scheduling Strategies for Parallel Processing (JSSPP 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12326)

Abstract

While High Performance Computing systems are increasingly based on heterogeneous cores, their effectiveness depends on how well the scheduler can allocate workloads onto appropriate computing devices and how well communication and computation can be overlapped. As more types of resources are integrated into one system, the complexity of the scheduler increases accordingly. Moreover, for applications with varying problem sizes on different heterogeneous resources, the optimal scheduling approach may vary. We therefore present PDAWL, an event-driven, profile-based Iterative Dynamic Adaptive Work-Load balance scheduling approach that dynamically and adaptively adjusts workloads to use heterogeneous resources efficiently. It combines an online scheduler (DAWL), which adaptively adjusts the workload based on the heterogeneous resources available in real time, with an offline machine-learning component (a profile-based estimation model), which builds a device-specific communication and computation estimation model. We test our scheduling approach on control-regular applications, a Stencil kernel (based on a Jacobi algorithm) and Sparse Matrix-Vector Multiplication (SpMV), in an event-driven runtime system. Experimental results show that PDAWL either matches or far outperforms whichever single device (CPU or GPU) yields the best results.
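To make the idea of iteratively adapting a CPU/GPU workload split concrete, the following is a minimal sketch and not the PDAWL algorithm itself, whose details are not given in this abstract. It assumes a simple feedback rule: after each iteration, the share of work given to each device is nudged toward that device's observed throughput. The function names (`rebalance`, `simulate`), the damping factor, the clamping bounds, and the linear timing model with a fixed GPU transfer overhead are all hypothetical choices made for illustration.

```python
def rebalance(cpu_share, cpu_time, gpu_time, damping=0.5, min_share=0.05):
    """Return an updated CPU share of the work from the last iteration's timings.

    Hypothetical feedback rule (an assumption, not taken from the paper):
    move the split toward each device's observed throughput.
    """
    gpu_share = 1.0 - cpu_share
    # Observed throughput: fraction of the total work completed per second.
    cpu_rate = cpu_share / cpu_time
    gpu_rate = gpu_share / gpu_time
    # Split that would make both devices finish at the same time
    # if the observed rates stayed constant.
    target = cpu_rate / (cpu_rate + gpu_rate)
    # Damped update, clamped so neither device is ever starved of work.
    new_share = cpu_share + damping * (target - cpu_share)
    return min(max(new_share, min_share), 1.0 - min_share)


def simulate(work, cpu_speed, gpu_speed, gpu_overhead, iters=20):
    """Run the feedback loop against a toy timing model (assumed, not measured)."""
    share = 0.5  # start with an even split
    for _ in range(iters):
        cpu_t = share * work / cpu_speed
        # The GPU pays a fixed communication cost on every iteration.
        gpu_t = gpu_overhead + (1.0 - share) * work / gpu_speed
        share = rebalance(share, cpu_t, gpu_t)
    return share


if __name__ == "__main__":
    # GPU three times faster than the CPU, with and without transfer overhead.
    print(simulate(work=1000, cpu_speed=1.0, gpu_speed=3.0, gpu_overhead=0.0))
    print(simulate(work=1000, cpu_speed=1.0, gpu_speed=3.0, gpu_overhead=100.0))
```

In this toy model the split settles where both devices finish at the same time, and adding a fixed transfer cost on the GPU side shifts work back toward the CPU. A profile-based estimator, as described in the abstract, would presumably replace the purely reactive timing feedback with predicted communication and computation costs; that substitution is not shown here.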

Author information

Corresponding author

Correspondence to Tongsheng Geng.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Geng, T., Amaris, M., Zuckerman, S., Goldman, A., Gao, G.R., Gaudiot, J.-L. (2020). PDAWL: Profile-Based Iterative Dynamic Adaptive WorkLoad Balance on Heterogeneous Architectures. In: Klusáček, D., Cirne, W., Desai, N. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2020. Lecture Notes in Computer Science, vol 12326. Springer, Cham. https://doi.org/10.1007/978-3-030-63171-0_8

  • DOI: https://doi.org/10.1007/978-3-030-63171-0_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63170-3

  • Online ISBN: 978-3-030-63171-0

  • eBook Packages: Computer Science, Computer Science (R0)
