Dynamic Thread Pinning for Phase-Based OpenMP Programs

Mazouz, Abdelhafid; Touati, Sid-Ahmed-Ali; Barthou, Denis

doi:10.1007/978-3-642-40047-6_8

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8097)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

Thread affinity has emerged as an important technique for improving overall program performance and performance stability. However, for a program with multiple phases, a single thread affinity is unlikely to give the best performance in every phase. OpenMP applications, in particular, may have multiple parallel regions, each with a distinct inter-thread data sharing pattern. In this paper, we propose an approach that changes thread affinity dynamically at runtime (through thread migrations) between parallel regions to account for these distinct inter-thread data sharing patterns. We demonstrate that, as far as cache sharing is concerned, not all of the tested SPEC OMP01 applications exhibit distinct phase behavior. However, we show that while fixing thread affinity for the whole execution may improve performance by up to 30%, allowing dynamic thread pinning may improve performance by up to 40%. Furthermore, we analyze the conditions required to improve the effectiveness of the approach.
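
The idea of re-pinning threads between parallel regions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the region names, the `PINNINGS` table and the helpers `migrations`, `plan` and `pin_current_thread` are all hypothetical, and the sketch assumes a per-region thread-to-core mapping has already been chosen by some other means. It only shows which threads would have to migrate when execution moves from one region to the next.

```python
import os

# Hypothetical per-region pinnings (an assumption for illustration):
# the list index is the thread id, the value is the core it should run on.
PINNINGS = {
    "region_a": [0, 1, 2, 3],
    "region_b": [0, 2, 1, 3],
}


def migrations(prev, nxt):
    """Return {thread_id: new_core} for threads whose core changes.

    `prev` is None before the first region, so every thread is pinned once.
    """
    return {t: c for t, c in enumerate(nxt) if prev is None or prev[t] != c}


def plan(region_sequence, pinnings):
    """List the migrations needed on entry to each region, in order."""
    prev = None
    steps = []
    for region in region_sequence:
        nxt = pinnings[region]
        steps.append((region, migrations(prev, nxt)))
        prev = nxt
    return steps


def pin_current_thread(core):
    """Try to pin the caller to one core; return False if that is not possible.

    os.sched_setaffinity is only available on some platforms (e.g. Linux),
    where passing 0 makes the underlying system call act on the calling thread.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    if core not in os.sched_getaffinity(0):
        return False
    os.sched_setaffinity(0, {core})
    return True
```

In this sketch, a runtime would call `pin_current_thread` from each thread listed in a step's migration map when the corresponding region starts. Consecutive regions that share a pinning produce an empty map, so no thread is migrated and no migration cost is paid.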

Author information

Authors and Affiliations

University of Versailles Saint-Quentin-en-Yvelines, France
Abdelhafid Mazouz
University of Nice Sophia Antipolis, France
Sid-Ahmed-Ali Touati
University of Bordeaux, France
Denis Barthou

Editor information

Editors and Affiliations

German Research School for Simulation Sciences, RWTH Aachen, Schinkelstr. 2a, 52062, Aachen, Germany
Felix Wolf
Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH, Station 22, 52425, Jülich, Germany
Bernd Mohr
Center for Computing and Communication, RWTH Aachen, Seffenter Weg 23, 52074, Aachen, Germany
Dieter an Mey

About this paper

Cite this paper

Mazouz, A., Touati, SAA., Barthou, D. (2013). Dynamic Thread Pinning for Phase-Based OpenMP Programs. In: Wolf, F., Mohr, B., an Mey, D. (eds) Euro-Par 2013 Parallel Processing. Euro-Par 2013. Lecture Notes in Computer Science, vol 8097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40047-6_8

DOI: https://doi.org/10.1007/978-3-642-40047-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40046-9
Online ISBN: 978-3-642-40047-6
eBook Packages: Computer Science (R0)
