Abstract
The ARMv8 64-bit platform has been considered as an alternative for high performance computing (HPC). Stencil computations are a class of iterative kernels that update array elements according to a fixed pattern of neighboring elements, called a stencil. In this paper, we evaluate the performance and scalability of an ARMv8 64-bit multi-core processor using a 7-point 3D stencil code, and we devise a series of optimizations for that code. The optimizations focus on parallelizing the kernel and exploiting data locality through loop tiling; we also improve the calculation of the tile block size. The achieved performance varies with the grid size of the stencil, and the best result is 24.4 % of the peak double-precision (DP) Flops, obtained for a grid size of \(64^{3}\). Compared with an Intel Xeon processor, the ARMv8 64-bit processor delivers about 40 % of the performance of Sandy Bridge for the stencil code with a grid size of \(512^{3}\), but it shows better scalability.
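To make the two optimizations named above concrete, the following is a minimal sketch of one sweep of a 7-point 3D stencil, parallelized with OpenMP and tiled in the two inner dimensions. It is not the kernel from the paper: the function name stencil7_tiled, the coefficients c0 and c1, the row-major layout, and the tile sizes tj and tk are all assumptions made for illustration, and the abstract does not state how the block size is actually calculated.

```c
#include <assert.h>
#include <stddef.h>

/* Linear index of point (i, j, k) in an n*n*n grid, k varying fastest.
   The layout is an assumption of this sketch. */
static size_t idx(int n, int i, int j, int k) {
    return ((size_t)i * (size_t)n + (size_t)j) * (size_t)n + (size_t)k;
}

static int min_int(int a, int b) { return a < b ? a : b; }

/* One out-of-place sweep of a 7-point stencil over the interior points:
     out = c0 * centre + c1 * (sum of the six face neighbours).
   The j and k loops are split into tiles of tj x tk points (hypothetical
   parameters) so that the neighbouring planes touched by a tile can stay
   in cache while i advances. Tiles are independent, so they are shared
   among threads with OpenMP; without OpenMP the pragma is ignored and
   the code runs serially. Boundary points of out are left untouched. */
void stencil7_tiled(int n, const double *in, double *out,
                    double c0, double c1, int tj, int tk) {
    assert(n >= 3 && tj > 0 && tk > 0);

    #pragma omp parallel for collapse(2) schedule(static)
    for (int jj = 1; jj < n - 1; jj += tj) {
        for (int kk = 1; kk < n - 1; kk += tk) {
            int jend = min_int(jj + tj, n - 1);
            int kend = min_int(kk + tk, n - 1);
            for (int i = 1; i < n - 1; ++i) {
                for (int j = jj; j < jend; ++j) {
                    for (int k = kk; k < kend; ++k) {
                        double neighbours =
                            in[idx(n, i - 1, j, k)] + in[idx(n, i + 1, j, k)] +
                            in[idx(n, i, j - 1, k)] + in[idx(n, i, j + 1, k)] +
                            in[idx(n, i, j, k - 1)] + in[idx(n, i, j, k + 1)];
                        out[idx(n, i, j, k)] =
                            c0 * in[idx(n, i, j, k)] + c1 * neighbours;
                    }
                }
            }
        }
    }
}
```

In this sketch the outermost dimension i is left untiled and only the j-k plane is blocked, which is one common choice for 3D stencils; other tilings are equally possible. Good values for tj and tk depend on the cache sizes of the target processor and would have to be tuned or derived separately.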
Acknowledgements
The work in this paper is partially supported by the National Science Foundation of China under grant No. 61170046 and by the National High Technology Research and Development Program of China (863 Program) under grant No. 2012AA010903.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, C., Dong, Y., Li, K. (2015). Stencil Computations on HPC-oriented ARMv8 64-Bit Multi-Core Processor. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9530. Springer, Cham. https://doi.org/10.1007/978-3-319-27137-8_3
DOI: https://doi.org/10.1007/978-3-319-27137-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27136-1
Online ISBN: 978-3-319-27137-8
eBook Packages: Computer Science, Computer Science (R0)