High Performance Stencil Computations for Intel® Xeon Phi™ Coprocessor

Feng, Luxia; Dong, Yushan; Li, Chunjiang; Jiang, Hao

doi:10.1007/978-981-10-2209-8_10

Part of the book series: Communications in Computer and Information Science (CCIS, volume 626)

Included in the following conference series:

Conference on Advanced Computer Architecture

Abstract

Stencil computations are a class of computational kernels that update array elements according to a given stencil pattern, and they have drawn increasing attention recently. The Intel Xeon Phi coprocessor, which is designed for high performance computing, has not been fully evaluated for stencil computations. In this paper, we present a series of optimizations to accelerate the 3-D 7-point stencil code on the Intel Xeon Phi coprocessor. We focus on how to exploit the performance potential of its many cores and of the wide vector unit in each core. To exploit data locality, we use loop tiling and propose a method for calculating the block size used for tiling. The optimized code achieves a speedup of 211.6 over the serial code.
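To make the kernel named in the abstract concrete, the following is a minimal sketch of one sweep of a 3-D 7-point stencil in C, first as a plain triple loop and then with loop tiling, OpenMP threading, and a SIMD hint on the unit-stride loop. It is not the authors' code. The function names, the coefficients `c0` and `c1`, the choice to tile only the y and z dimensions, and the block sizes `by` and `bz` are all assumptions made for illustration; the block-size calculation proposed in the paper is not described in the abstract and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Flattened index into an nx*ny*nz grid; x is the unit-stride dimension. */
#define IDX(i, j, k, nx, ny) \
    ((size_t)(k) * (ny) * (nx) + (size_t)(j) * (nx) + (size_t)(i))

static int min_int(int a, int b) { return a < b ? a : b; }

/* Reference version: one sweep over all interior points, no tiling. */
void stencil_naive(const float *restrict in, float *restrict out,
                   int nx, int ny, int nz, float c0, float c1)
{
    for (int k = 1; k < nz - 1; k++)
        for (int j = 1; j < ny - 1; j++)
            for (int i = 1; i < nx - 1; i++)
                out[IDX(i, j, k, nx, ny)] =
                    c0 * in[IDX(i, j, k, nx, ny)] +
                    c1 * (in[IDX(i - 1, j, k, nx, ny)] + in[IDX(i + 1, j, k, nx, ny)] +
                          in[IDX(i, j - 1, k, nx, ny)] + in[IDX(i, j + 1, k, nx, ny)] +
                          in[IDX(i, j, k - 1, nx, ny)] + in[IDX(i, j, k + 1, nx, ny)]);
}

/* Tiled version: y and z are split into by-by-bz blocks (hypothetical sizes),
 * blocks are distributed over threads, and the x loop is left whole so the
 * compiler can vectorize it. The pragmas are ignored if OpenMP is disabled. */
void stencil_tiled(const float *restrict in, float *restrict out,
                   int nx, int ny, int nz, float c0, float c1,
                   int by, int bz)
{
    assert(by > 0 && bz > 0);

    #pragma omp parallel for collapse(2) schedule(static)
    for (int kk = 1; kk < nz - 1; kk += bz)
        for (int jj = 1; jj < ny - 1; jj += by) {
            int kmax = min_int(kk + bz, nz - 1);  /* clip block at the boundary */
            int jmax = min_int(jj + by, ny - 1);
            for (int k = kk; k < kmax; k++)
                for (int j = jj; j < jmax; j++) {
                    #pragma omp simd
                    for (int i = 1; i < nx - 1; i++)
                        out[IDX(i, j, k, nx, ny)] =
                            c0 * in[IDX(i, j, k, nx, ny)] +
                            c1 * (in[IDX(i - 1, j, k, nx, ny)] + in[IDX(i + 1, j, k, nx, ny)] +
                                  in[IDX(i, j - 1, k, nx, ny)] + in[IDX(i, j + 1, k, nx, ny)] +
                                  in[IDX(i, j, k - 1, nx, ny)] + in[IDX(i, j, k + 1, nx, ny)]);
                }
        }
}
```

Both functions write only interior points and read from a separate input array, so every output point is independent and the tiled sweep should produce the same values as the reference sweep. In practice the block sizes would be chosen so that the working set of a block fits in the per-core cache, which is the role the paper's block-size method plays.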

Acknowledgments

The work described in this paper is partially supported by the National Science Foundation of China under grants No. 61170046 and No. 61402495.

Author information

Authors and Affiliations

School of Computer, National University of Defense Technology, Changsha, Hunan, China
Luxia Feng, Yushan Dong, Chunjiang Li & Hao Jiang

Corresponding author

Correspondence to Chunjiang Li.

Editor information

Editors and Affiliations

Department of Computer Science and Technology, National University of Defense Technology, Changsha, China
Junjie Wu
State Key Laboratory of Computer Architecture, Chinese Academy of Sciences, Beijing, China
Lian Li

About this paper

Cite this paper

Feng, L., Dong, Y., Li, C., Jiang, H. (2016). High Performance Stencil Computations for Intel® Xeon Phi™ Coprocessor. In: Wu, J., Li, L. (eds) Advanced Computer Architecture. ACA 2016. Communications in Computer and Information Science, vol 626. Springer, Singapore. https://doi.org/10.1007/978-981-10-2209-8_10

DOI: https://doi.org/10.1007/978-981-10-2209-8_10
Published: 09 August 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2208-1
Online ISBN: 978-981-10-2209-8
eBook Packages: Computer Science, Computer Science (R0)
