Hierarchical multicore thread mapping via estimation of remote communication

Published in: The Journal of Supercomputing

Abstract

Affinity-aware thread mapping is a method to effectively exploit cache resources in multicore processors. We propose an affinity- and architecture-aware thread mapping technique that maximizes data reuse and minimizes the remote communication and cache coherency costs of multi-threaded applications. It consists of three main components: the Data Sharing Estimator, the Affine Mapping Finder and the Maximum Speedup Predictor. The Data Sharing Estimator creates application-specific data dependency signatures, which the Affine Mapping Finder uses to determine an appropriate thread mapping of the application for a given architecture. To prevent excessive thread migration, the Maximum Speedup Predictor estimates the speedup of the obtained mapping and discards it if it yields no significant performance improvement. The proposed framework is evaluated using the Phoenix benchmark suite on two different multicore architectures. The proposed thread mapping approach gives a 25% performance improvement over the default Linux scheduler. We also show that affinity-based thread mapping approaches that consider only the number of shared blocks are insufficient to accurately estimate data dependency between threads and to determine a proper thread mapping.
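The paper does not spell out the Affine Mapping Finder's algorithm in the abstract; a minimal sketch of the general idea — placing the thread pairs with the highest estimated data sharing onto cores that share a cache — might look like the following. The greedy pairing heuristic, the sharing matrix, and the `cores_per_cluster` parameter are all illustrative assumptions, not the authors' actual method.

```python
import itertools

def affinity_mapping(sharing, cores_per_cluster=2):
    """Greedily group threads so that pairs with the highest estimated
    data sharing land on cores that share a cache level.

    sharing[i][j] -- estimated shared-data volume between threads i and j
    Returns a dict: thread id -> core id, where cores c and c+1 belong
    to the same cache cluster (illustrative topology assumption).
    """
    n = len(sharing)
    # Consider thread pairs in decreasing order of estimated sharing.
    pairs = sorted(itertools.combinations(range(n), 2),
                   key=lambda p: sharing[p[0]][p[1]], reverse=True)
    placed, mapping, cluster = set(), {}, 0
    for i, j in pairs:
        if i in placed or j in placed:
            continue
        # Put the heavily communicating pair on two cores of one cluster.
        mapping[i] = cluster * cores_per_cluster
        mapping[j] = cluster * cores_per_cluster + 1
        placed.update((i, j))
        cluster += 1
    # Any leftover thread (odd thread count) gets the next free cluster.
    for t in range(n):
        if t not in placed:
            mapping[t] = cluster * cores_per_cluster
            cluster += 1
    return mapping

# Threads 0&3 and 1&2 share the most data, so each pair ends up
# on cores of the same cluster.
S = [[0, 1, 2, 9],
     [1, 0, 8, 1],
     [2, 8, 0, 1],
     [9, 1, 1, 0]]
print(affinity_mapping(S))
```

In a real implementation the resulting mapping would be enforced with an OS affinity call such as `sched_setaffinity` or `pthread_setaffinity_np`, and the pairing step would be replaced by a proper graph partitioner over the full sharing signature.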



Acknowledgements

This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number 14/IA/2474.

Author information

Corresponding author: Hamidreza Khaleghzadeh.


Cite this article

Khaleghzadeh, H., Deldari, H., Reddy, R. et al. Hierarchical multicore thread mapping via estimation of remote communication. J Supercomput 74, 1321–1340 (2018). https://doi.org/10.1007/s11227-017-2176-6
