Abstract
OpenMP is the primary means of achieving thread-level parallelism on Sunway high-performance multi-core processors. To address the low parallel efficiency caused by high thread-group control overhead in compiled Sunway OpenMP programs, this paper proposes a parallel region reconstruction technique. The technique widens the scope of parallel regions in OpenMP programs through parallel region merging and parallel region extending. This reduces the number of parallel regions, lowers the overhead of frequently creating and joining thread groups, and converts standard fork-join OpenMP programs into higher-performance SPMD-style OpenMP programs. On a Sunway 1621 server, parallel region reconstruction improved the running efficiency of NPB3.3-OMP and SPEC OMP2012 by 8.9% and 7.9%, respectively. These results indicate that the technique is feasible and effective, and that it provides technical support for fully exploiting the multi-core parallelism of Sunway high-performance processors.
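To make the idea of merging and extending concrete, the following is a minimal hand-written sketch in C with OpenMP. It is not the compiler transformation described in the paper. The function names and the toy computation are hypothetical and chosen only for illustration. The first function uses two separate parallel regions with serial code between them (fork-join style). The second wraps everything in a single parallel region, pulling the serial statement inside with a `single` construct, so that the thread group is created and joined only once.

```c
/* Illustrative only: names and computation are hypothetical. */

/* Fork-join style: two parallel regions, so the thread group is
   started and joined twice, with serial code in between. */
void update_forkjoin(double *a, int n, double s)
{
    double d;

    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] *= s;

    d = a[0] + 1.0;            /* serial code between the two regions */

    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] += d;
}

/* Reconstructed style: one enclosing parallel region. The two loops
   become worksharing loops (merging) and the serial statement is
   executed by one thread inside the region (extending). */
void update_reconstructed(double *a, int n, double s)
{
    double d = 0.0;            /* shared by all threads in the region */

    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
            a[i] *= s;
        /* implicit barrier: every a[i] is scaled before d is computed */

        #pragma omp single
        d = a[0] + 1.0;
        /* implicit barrier: d is visible to all threads before use */

        #pragma omp for
        for (int i = 0; i < n; i++)
            a[i] += d;
    }
}
```

Both functions compute the same result. The second relies on the implicit barriers at the end of the `for` and `single` constructs to preserve the original ordering. If the file is compiled without OpenMP support, the pragmas are ignored and both functions run serially.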
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Nie, K., Zhou, Q., Qian, H., Pang, J., Xu, J., Li, Y. (2021). Parallel Region Reconstruction Technique for Sunway High-Performance Multi-core Processors. In: Zeng, J., Qin, P., Jing, W., Song, X., Lu, Z. (eds) Data Science. ICPCSEE 2021. Communications in Computer and Information Science, vol 1451. Springer, Singapore. https://doi.org/10.1007/978-981-16-5940-9_13
DOI: https://doi.org/10.1007/978-981-16-5940-9_13
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-5939-3
Online ISBN: 978-981-16-5940-9
eBook Packages: Computer Science, Computer Science (R0)