HDSAP: heterogeneity-aware dynamic scheduling algorithm to improve performance of nanoscale many-core processors for unknown workloads

The Journal of Supercomputing

Abstract

Processor performance has continued to grow by increasing the number of cores on a chip and scaling down the feature size of transistors. However, in the nanoscale era, side effects of scaling, such as induced heterogeneity in the performance, power, and soft error rate of identically designed cores, prevent the potential performance from being fully utilized. In this paper, we harness these side effects in shared-memory multicore processors with unknown workloads through a dynamic heuristic scheduling algorithm called HDSAP. HDSAP aims to maximize performance, i.e., to minimize the average response time, under power and reliability constraints in the presence of induced heterogeneity. To this end, we use a mathematical model to quantify task-to-core assignments based on performance variation. We also account for power variation in order to change the selected cores when the power constraint is violated. To meet the reliability constraint, we use N-modular redundancy while accounting for the variation in the soft error rate of cores, which prevents under- or overestimation of reliability. To evaluate HDSAP, we run the SPLASH-2 benchmark suite on the Sniper and McPAT simulators. The results show that HDSAP reduces response time by 6%, 8%, and 25% compared with similar algorithms under the same power and reliability constraints.
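To illustrate why the variation in soft error rate matters for N-modular redundancy, the following is a minimal sketch and not the model used in the paper. It assumes an exponential error model for each core, a perfect majority voter, and independent replicas; the function names `core_reliability` and `nmr_reliability` and all numeric values are hypothetical.

```python
import math
from itertools import combinations


def core_reliability(ser, exec_time):
    """Probability of no soft error during exec_time (assumed exponential model)."""
    return math.exp(-ser * exec_time)


def nmr_reliability(reliabilities):
    """Probability that a strict majority of independent replicas are error-free.

    Each replica may have a different reliability, so every subset of
    error-free replicas that forms a majority is enumerated explicitly.
    """
    n = len(reliabilities)
    need = n // 2 + 1  # smallest strict majority
    total = 0.0
    for k in range(need, n + 1):
        for good in combinations(range(n), k):
            good_set = set(good)
            p = 1.0
            for i, r in enumerate(reliabilities):
                p *= r if i in good_set else (1.0 - r)
            total += p
    return total


# Hypothetical per-core soft error rates (errors per time unit) and task length.
sers = [1e-6, 5e-6, 2e-5]
exec_time = 1000.0

# Variation-aware estimate: each replica uses the rate of its own core.
aware = nmr_reliability([core_reliability(s, exec_time) for s in sers])

# Variation-unaware estimate: every replica is assumed to run at the mean rate.
mean_ser = sum(sers) / len(sers)
unaware = nmr_reliability([core_reliability(mean_ser, exec_time)] * len(sers))

print(f"variation-aware:   {aware:.8f}")
print(f"variation-unaware: {unaware:.8f}")
```

The two estimates differ because a single averaged rate hides which cores are more error-prone. A scheduler that relies on the averaged value may therefore select more or fewer replicas than the reliability constraint actually requires.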

Data availability

Not applicable.

Notes

  1. In this paper, a job refers to the incoming workload. A job is a set of dependent tasks, and the dependency between tasks is barrier synchronization.

  2. An unknown workload is one in which the arrival time, departure time, and execution time of jobs are not known to the scheduler in advance.

  3. Since this paper does not consider the effect of the memory controller on the execution time, we eliminate this effect by setting the memory controller latency to zero. We will consider it in future studies.
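The job model in Note 1 can be sketched as follows. This is a simplified illustration under stated assumptions and not the formulation used in the paper: each job is split into phases separated by barriers, task `i` always runs on core `i`, and a task's execution time is its work divided by the speed of its core. The names `phase_time`, `job_response_time`, and `wait_time` are hypothetical.

```python
def phase_time(task_work, core_speed):
    """Duration of one barrier phase.

    No task may pass the barrier until all tasks reach it, so the
    slowest task-core pair determines the length of the phase.
    """
    return max(w / s for w, s in zip(task_work, core_speed))


def job_response_time(phases, core_speed, wait_time=0.0):
    """Response time of a job: time spent waiting plus all barrier phases."""
    return wait_time + sum(phase_time(p, core_speed) for p in phases)


# Two tasks, two phases; core 0 is assumed to be twice as fast as core 1.
phases = [[4.0, 2.0], [1.0, 3.0]]
core_speed = [2.0, 1.0]
print(job_response_time(phases, core_speed))  # 5.0
```

Because each phase lasts as long as its slowest task, performance variation among identically designed cores affects the response time of the whole job, not only that of the task placed on the slow core.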

Funding

No funding was received for conducting this paper.

Author information

Contributions

All authors contributed to the study conception and design.

Corresponding author

Correspondence to Amir Rajabzadeh.

Ethics declarations

Conflict of interest

The authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kia, K., Rajabzadeh, A. HDSAP: heterogeneity-aware dynamic scheduling algorithm to improve performance of nanoscale many-core processors for unknown workloads. J Supercomput 79, 13341–13369 (2023). https://doi.org/10.1007/s11227-023-05159-6

  • DOI: https://doi.org/10.1007/s11227-023-05159-6
