Thread Private Variable Access Optimization Technique for Sunway High-Performance Multi-core Processors

Kong, Jinying; Nie, Kai; Zhou, Qinglei; Xu, Jinlong; Han, Lin

doi:10.1007/978-981-16-5940-9_14

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1451)

Included in the following conference series:

International Conference of Pioneering Computer Scientists, Engineers and Educators

Abstract

OpenMP is the primary means of achieving thread-level parallelism on the Sunway high-performance multi-core processor. In compiled Sunway OpenMP programs, slow access to thread-private variables lowers parallel efficiency. To address this problem, this paper proposes a thread-private variable access technique based on privileged instructions. The technique implements thread-private variable access entirely at the compiler level, which eliminates the mode-switching overhead of invoking the OS kernel and speeds up access to thread-private variables. On the Sunway 1621 server platform, NPB3.3-OMP and SPEC OMP2012 achieved running efficiency gains of 6.2% and 6.8%, respectively. The results show that the proposed technique can help programs make fuller use of Sunway's high-performance multi-core processors.
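For readers unfamiliar with the construct, the kind of variable the abstract refers to can be sketched in plain OpenMP C. This is a generic illustration only, not the paper's compiler technique and not Sunway-specific code; the names `partial_sum` and `sum_to` are hypothetical. Each thread accumulates into its own copy of `partial_sum` inside the loop, so the cost of every thread-private access (typically a thread-local storage lookup on mainstream toolchains) is paid once per iteration.

```c
#include <assert.h>

/* File-scope variable (hypothetical name). With OpenMP enabled, the
   threadprivate directive gives every thread its own copy. */
static long partial_sum = 0;
#pragma omp threadprivate(partial_sum)

/* Sum the integers 1..n. Each thread accumulates into its private copy
   and merges it into the shared total exactly once. If the compiler
   ignores the pragmas, the same code runs serially and gives the same
   result. */
long sum_to(long n)
{
    long total = 0; /* shared across the parallel region */

    assert(n >= 0);

    #pragma omp parallel
    {
        partial_sum = 0; /* reset this thread's copy */

        #pragma omp for
        for (long i = 1; i <= n; i++)
            partial_sum += i; /* thread-private access in the hot loop */

        #pragma omp atomic
        total += partial_sum; /* one merge per thread */
    }
    return total;
}
```

Calling `sum_to(100)` from a `main` function returns 5050 whether or not OpenMP is enabled (for example with `-fopenmp` on GCC). The per-iteration access to `partial_sum` is the operation whose speed the abstract is concerned with.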

Author information

Authors and Affiliations

Zhengzhou University, Zhengzhou, 450001, Henan, China
Jinying Kong, Qinglei Zhou & Lin Han
Information Engineering University, Zhengzhou, 450001, Henan, China
Kai Nie & Jinlong Xu

Editor information

Editors and Affiliations

North University of China, Taiyuan, China
Jianchao Zeng
North University of China, Taiyuan, China
Pinle Qin
Northeast Forestry University, Harbin, China
Weipeng Jing
Harbin University of Science and Technology, Harbin, China
Xianhua Song
National Academy of Guo Ding Institute of Data Science, Beijing, China
Zeguang Lu

About this paper

Cite this paper

Kong, J., Nie, K., Zhou, Q., Xu, J., Han, L. (2021). Thread Private Variable Access Optimization Technique for Sunway High-Performance Multi-core Processors. In: Zeng, J., Qin, P., Jing, W., Song, X., Lu, Z. (eds) Data Science. ICPCSEE 2021. Communications in Computer and Information Science, vol 1451. Springer, Singapore. https://doi.org/10.1007/978-981-16-5940-9_14

DOI: https://doi.org/10.1007/978-981-16-5940-9_14
Published: 10 September 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-5939-3
Online ISBN: 978-981-16-5940-9
eBook Packages: Computer Science, Computer Science (R0)
