Hierarchical multicore thread mapping via estimation of remote communication

Khaleghzadeh, Hamidreza; Deldari, Hossein; Reddy, Ravi; Lastovetsky, Alexey

doi:10.1007/s11227-017-2176-6

Published: 31 October 2017

Volume 74, pages 1321–1340, (2018)

The Journal of Supercomputing

Hamidreza Khaleghzadeh¹ (ORCID: orcid.org/0000-0003-4070-7468), Hossein Deldari², Ravi Reddy¹ & Alexey Lastovetsky¹

Abstract

Affinity-aware thread mapping is a method for exploiting cache resources effectively in multicore processors. We propose an affinity- and architecture-aware thread mapping technique that maximizes data reuse and minimizes the remote communication and cache coherency costs of multi-threaded applications. It consists of three main components: the Data Sharing Estimator, the Affine Mapping Finder, and the Maximum Speedup Predictor. The Data Sharing Estimator creates application-specific data dependency signatures, which the Affine Mapping Finder uses to determine an appropriate thread mapping of the application for a given architecture. To prevent excessive thread migration, the Maximum Speedup Predictor estimates the speedup of the obtained mapping and discards the mapping if it would yield no significant performance improvement. The proposed framework is evaluated using the Phoenix benchmark suite on two different multicore architectures. The proposed thread mapping approach gives a 25% improvement in performance compared to the default Linux scheduler. We also show that affinity-based thread mapping approaches that consider only the number of shared blocks are not sufficient to accurately estimate data dependency between threads and to determine the proper thread mapping.
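The three-stage idea in the abstract (estimate sharing, find a mapping, apply it only if the gain is worthwhile) can be illustrated with a minimal sketch. This is not the authors' algorithm: every function name, the greedy pairing heuristic, the assumption that each pair of threads shares one cache, and the `min_gain` threshold are hypothetical choices made for illustration. In particular, the sketch scores affinity by the raw number of shared blocks, which is exactly the simplification the abstract describes as insufficient; the paper's signatures are richer.

```python
from itertools import combinations


def sharing_matrix(accesses):
    """Count shared blocks for every pair of threads.

    accesses: dict mapping thread id -> set of block addresses it touched.
    Returns a dict {(a, b): shared_block_count} with a < b.
    (Hypothetical stand-in for a data-sharing estimate.)
    """
    return {
        (a, b): len(accesses[a] & accesses[b])
        for a, b in combinations(sorted(accesses), 2)
    }


def greedy_pairing(shared):
    """Pair threads greedily, heaviest sharing first.

    Each resulting pair is assumed to be placed on two cores that share a
    cache. With an odd number of threads, one thread stays unpaired.
    """
    pairs, used = [], set()
    # Sort by descending weight, then by thread ids for determinism.
    for (a, b), _weight in sorted(shared.items(), key=lambda kv: (-kv[1], kv[0])):
        if a not in used and b not in used:
            pairs.append((a, b))
            used.update((a, b))
    return pairs


def local_sharing(pairs, shared):
    """Total sharing that stays inside a pair (i.e. inside one cache)."""
    return sum(shared[tuple(sorted(p))] for p in pairs)


def choose_mapping(accesses, default_pairs, min_gain=0.1):
    """Return the candidate pairing only if it keeps noticeably more sharing
    local than the default pairing; otherwise keep the default, so that
    threads are not migrated for a negligible benefit.
    """
    shared = sharing_matrix(accesses)
    total = sum(shared.values())
    if total == 0:
        return default_pairs  # nothing is shared, so mapping cannot help
    candidate = greedy_pairing(shared)
    gain = (local_sharing(candidate, shared) - local_sharing(default_pairs, shared)) / total
    return candidate if gain >= min_gain else default_pairs


if __name__ == "__main__":
    # Made-up access sets: threads 0 and 2 overlap, as do threads 1 and 3.
    accesses = {0: {1, 2, 3}, 1: {7, 8}, 2: {2, 3, 4}, 3: {8, 9}}
    print(choose_mapping(accesses, default_pairs=[(0, 1), (2, 3)]))
```

The gain here is a crude proxy (the fraction of all pairwise sharing that becomes cache-local), not a speedup prediction. Applying a chosen mapping would additionally require pinning threads to cores, for example through Linux's `sched_setaffinity`, which the sketch leaves out.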

Acknowledgements

This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number 14/IA/2474.

Author information

Authors and Affiliations

School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
Hamidreza Khaleghzadeh, Ravi Reddy & Alexey Lastovetsky
Salman Institute of Higher Education, Mashhad, Iran
Hossein Deldari

Corresponding author

Correspondence to Hamidreza Khaleghzadeh.

About this article

Cite this article

Khaleghzadeh, H., Deldari, H., Reddy, R. et al. Hierarchical multicore thread mapping via estimation of remote communication. J Supercomput 74, 1321–1340 (2018). https://doi.org/10.1007/s11227-017-2176-6

Published: 31 October 2017
Issue Date: March 2018
DOI: https://doi.org/10.1007/s11227-017-2176-6
