Thread Private Variable Access Optimization Technique for Sunway High-Performance Multi-core Processors

  • Conference paper
Data Science (ICPCSEE 2021)

Abstract

On the Sunway high-performance multi-core processor, thread-level parallelism is achieved primarily through OpenMP. In Sunway OpenMP programs, however, slow access to thread-private variables lowers parallel efficiency. To address this problem, this paper proposes a thread-private variable access technique based on privileged instructions. The technique implements thread-private variable access entirely at the compiler level, which eliminates the mode-switching overhead of calling into the OS kernel and speeds up access to thread-private variables. On the Sunway 1621 server platform, NPB3.3-OMP and SPEC OMP2012 achieved running-efficiency gains of 6.2% and 6.8%, respectively. The results show that the proposed technique helps exploit the performance potential of Sunway high-performance multi-core processors.
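For readers unfamiliar with the term, the following is a minimal sketch of what a thread-private variable looks like at the OpenMP source level. It is not the paper's implementation: the helper name count_iterations and the counter iterations_seen are hypothetical, and the example only shows the kind of access whose cost the abstract discusses. How the compiler and runtime locate each thread's copy happens below this level and is not shown here.

```c
#include <assert.h>

/* Assumed example variable: each OpenMP thread gets its own copy. */
static long iterations_seen = 0;
#pragma omp threadprivate(iterations_seen)

/* Hypothetical helper: runs n loop iterations in parallel and
 * returns the sum of all per-thread counters. */
long count_iterations(int n)
{
    long total = 0;
    assert(n >= 0);

    #pragma omp parallel reduction(+:total)
    {
        /* Reset this thread's private copy. */
        iterations_seen = 0;

        /* Iterations are divided among the threads; each thread
         * updates only its own copy, so no locking is needed. */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            iterations_seen++;
        }

        /* Combine the per-thread counts into one total. */
        total += iterations_seen;
    }
    return total;
}
```

Every read or write of iterations_seen inside the loop is a thread-private access, so in a hot loop the cost of locating the current thread's copy is paid on each iteration. If the file is compiled without OpenMP support, the pragmas are ignored and the function runs on one thread with the same result.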

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Kong, J., Nie, K., Zhou, Q., Xu, J., Han, L. (2021). Thread Private Variable Access Optimization Technique for Sunway High-Performance Multi-core Processors. In: Zeng, J., Qin, P., Jing, W., Song, X., Lu, Z. (eds) Data Science. ICPCSEE 2021. Communications in Computer and Information Science, vol 1451. Springer, Singapore. https://doi.org/10.1007/978-981-16-5940-9_14

  • DOI: https://doi.org/10.1007/978-981-16-5940-9_14

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-5939-3

  • Online ISBN: 978-981-16-5940-9

  • eBook Packages: Computer Science, Computer Science (R0)
