Microarchitectural performance comparison of Intel Knights Corner and Intel Sandy Bridge with CFD applications

Che, Yonggang; Zhang, Lilun; Wang, Yongxian; Xu, Chuanfu; Liu, Wei; Wang, Zhenghua

doi:10.1007/s11227-014-1245-3

Published: 28 June 2014

The Journal of Supercomputing, volume 70, pages 321–348 (2014)

Abstract

This paper comparatively evaluates the microarchitectural performance of two representative Computational Fluid Dynamics (CFD) applications on the Intel Many Integrated Core (MIC) product, the Intel Knights Corner (KNC) coprocessor, and on the Intel Sandy Bridge (SNB) processor. A Performance Monitoring Unit (PMU)-based measurement method is used, along with a two-phase measurement method and several precautions to minimize errors and instabilities. The results show that the CFD applications are sensitive to architectural factors. Their single-thread performance and efficiency on KNC are much lower than those on SNB. Branch prediction and memory access are the two primary factors behind the performance difference. The applications' low computational intensity and inefficient use of vector instructions are two additional factors. To be more efficient for the CFD applications, the MIC architecture needs to improve its branch prediction mechanism and memory hierarchy. Fine tuning of the application codes is also crucial, and it is hard work.
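
The effect of low computational intensity can be illustrated with the roofline model, which bounds attainable floating-point throughput by the smaller of the compute peak and the product of memory bandwidth and computational intensity. The following is a minimal sketch under stated assumptions: the function names and all numeric values are hypothetical placeholders chosen for illustration, not figures measured in the paper or taken from any KNC or SNB datasheet.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline bound: min(compute peak, memory bandwidth * computational intensity).

    intensity is in flops per byte moved to or from memory.
    """
    if peak_gflops < 0 or bandwidth_gbs < 0 or intensity < 0:
        raise ValueError("all inputs must be non-negative")
    return min(peak_gflops, bandwidth_gbs * intensity)


def is_memory_bound(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> bool:
    """True when the bandwidth term, not the compute peak, limits throughput."""
    return bandwidth_gbs * intensity < peak_gflops


if __name__ == "__main__":
    # Hypothetical machine: 100 GFLOP/s peak, 10 GB/s sustained bandwidth.
    peak, bandwidth = 100.0, 10.0
    for intensity in (0.25, 1.0, 20.0):
        bound = attainable_gflops(peak, bandwidth, intensity)
        label = "memory-bound" if is_memory_bound(peak, bandwidth, intensity) else "compute-bound"
        print(f"intensity={intensity:5.2f} flop/byte -> {bound:6.1f} GFLOP/s ({label})")
```

In this simplified model, a kernel with low computational intensity sits on the bandwidth-limited slope of the roofline, so raising the compute peak alone does not raise its attainable performance.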

Acknowledgments

The authors would like to thank the HPC Application Research Center of the National University of Defense Technology for providing the platforms for the performance evaluation. The authors would also like to thank Huayong Liu from the State Key Laboratory of Aerodynamics of China for his help. This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 60603055 and 11272352, and by the Open Research Program of the China State Key Laboratory of Aerodynamics under Grant No. SKLA20130105.

Author information

Authors and Affiliations

Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, People’s Republic of China
Yonggang Che, Lilun Zhang, Yongxian Wang, Chuanfu Xu, Wei Liu & Zhenghua Wang

Corresponding author

Correspondence to Yonggang Che.

About this article

Cite this article

Che, Y., Zhang, L., Wang, Y. et al. Microarchitectural performance comparison of Intel Knights Corner and Intel Sandy Bridge with CFD applications. J Supercomput 70, 321–348 (2014). https://doi.org/10.1007/s11227-014-1245-3

Issue Date: October 2014
DOI: https://doi.org/10.1007/s11227-014-1245-3
