Microarchitectural performance comparison of Intel Knights Corner and Intel Sandy Bridge with CFD applications

The Journal of Supercomputing

Abstract

This paper comparatively evaluates the microarchitectural performance of two representative Computational Fluid Dynamics (CFD) applications on the Intel Many Integrated Core (MIC) product, the Intel Knights Corner (KNC) coprocessor, and on the Intel Sandy Bridge (SNB) processor. A Performance Monitoring Unit (PMU)-based measurement method is used, along with a two-phase measurement method and several precautions to minimize errors and instabilities. The results show that the CFD applications are sensitive to architectural factors. Their single-thread performance and efficiency on KNC are much lower than on SNB. Branch prediction and memory access are the two primary factors behind the performance difference. The applications' low computational intensity and inefficient use of vector instructions are two additional factors. To be more efficient for the CFD applications, the MIC architecture needs to improve its branch prediction mechanism and memory hierarchy. Fine-tuning of the application codes is also crucial, and it is hard work.
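As a rough illustration of why low computational intensity limits achievable performance, the sketch below applies the standard roofline bound, under which attainable performance is the smaller of the compute peak and the product of intensity and memory bandwidth. The peak, bandwidth, and kernel figures are hypothetical values chosen for illustration only. They are not measurements from the paper, and the function names are likewise assumptions of this sketch.

```python
def computational_intensity(flops, bytes_moved):
    """Floating-point operations performed per byte moved to or from memory."""
    return flops / bytes_moved


def attainable_gflops(intensity, peak_gflops, bandwidth_gb_per_s):
    """Roofline bound: capped either by the compute peak or by memory traffic."""
    return min(peak_gflops, intensity * bandwidth_gb_per_s)


# Hypothetical machine parameters (assumed, not taken from the paper).
PEAK_GFLOPS = 1000.0
BANDWIDTH_GB_PER_S = 150.0

# Two hypothetical kernels that move the same amount of data.
low = computational_intensity(flops=2.0e9, bytes_moved=8.0e9)
high = computational_intensity(flops=8.0e10, bytes_moved=8.0e9)

# The low-intensity kernel is limited by memory bandwidth,
# while the high-intensity kernel reaches the compute peak.
print(attainable_gflops(low, PEAK_GFLOPS, BANDWIDTH_GB_PER_S))
print(attainable_gflops(high, PEAK_GFLOPS, BANDWIDTH_GB_PER_S))
```

In this model a kernel whose intensity falls below the ratio of peak to bandwidth cannot use the full floating-point capability of the chip, whatever its core count or vector width.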



Acknowledgments

The authors thank the HPC Application Research Center of the National University of Defense Technology for providing the platforms used in the performance evaluation, and Huayong Liu of the State Key Laboratory of Aerodynamics of China for his help. This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 60603055 and 11272352, and by the Open Research Program of the China State Key Laboratory of Aerodynamics under Grant No. SKLA20130105.

Author information

Corresponding author

Correspondence to Yonggang Che.

Cite this article

Che, Y., Zhang, L., Wang, Y. et al. Microarchitectural performance comparison of Intel Knights Corner and Intel Sandy Bridge with CFD applications. J Supercomput 70, 321–348 (2014). https://doi.org/10.1007/s11227-014-1245-3

  • DOI: https://doi.org/10.1007/s11227-014-1245-3
