Towards enhanced I/O performance of a highly integrated many-core processor by empirical analysis

Cluster Computing

Abstract

Optimized for parallel operations, Intel’s second-generation Xeon Phi processor, code-named Knights Landing (KNL), is widely used in high performance computing systems because of its many integrated cores and its high-bandwidth on-package memory, Multi-Channel DRAM (MCDRAM). The emergence of data-intensive applications and the adoption of many-core processors have further increased the I/O performance requirements of high performance computing systems. It is therefore necessary to understand and analyze the I/O characteristics of many-core systems. In this paper, we experimentally analyze the I/O characteristics of KNL, focusing on single-thread buffered-write operations. We find that KNL has a bottleneck in its buffered-write operation, which uses the page cache. To locate this bottleneck and identify its cause, we conduct two kinds of experiments. First, we measure the execution time of kernel functions along the kernel I/O path. Second, we count the occurrences of system events such as cache-misses and branch-misses. Based on the results of these experiments, we discuss the characteristics of KNL’s I/O performance, including its performance bottlenecks.
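
To make the workload concrete, the following is a minimal sketch of a single-thread buffered-write timing loop. It is an illustration under stated assumptions, not the authors' benchmark harness: the function name `buffered_write_benchmark`, the file name, and the sizes are all hypothetical. It assumes a Linux host, where a plain `write` without `O_DIRECT`, `O_SYNC`, or `fsync` normally completes once the data is in the page cache.

```python
import os
import tempfile
import time


def buffered_write_benchmark(path, total_bytes, block_size):
    """Write total_bytes to path in block_size chunks from a single thread.

    Hypothetical helper for illustration. No O_DIRECT, O_SYNC, or fsync is
    used, so the timing covers the buffered (page cache) write path rather
    than the storage device itself.
    Returns (elapsed_seconds, bytes_written).
    """
    block = b"\0" * block_size
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        start = time.perf_counter()
        while written < total_bytes:
            # Never write past total_bytes; os.write may also write fewer
            # bytes than requested, so advance by its return value.
            chunk = min(block_size, total_bytes - written)
            written += os.write(fd, block[:chunk])
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed, written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "bench.dat")  # assumed scratch location
        elapsed, written = buffered_write_benchmark(
            target, 16 * 1024 * 1024, 1024 * 1024
        )
        print(f"wrote {written} bytes in {elapsed:.4f} s")
```

One way to approximate the second kind of experiment is to run such a script under `perf stat -e cache-misses,branch-misses`, which reports counts for those two events over the whole process. Whether the counters are available depends on the host's perf configuration.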

Data availability

Available upon request

Funding

This work was supported by the Korea Institute of Science and Technology Information (K-21-L02-C08-S01), the National Supercomputing Center with supercomputing resources including technical support (KSC-2020-INO-0044), the PF Class Heterogeneous High Performance Computer Development Program (NRF-2016M3C4A7952587), the Next-Generation Information Computing Development Program (NRF-2015M3C4A7065646), the Basic Science Research Program (NRF-2020R1F1A1072696), BK21 FOUR Intelligence Computing (4199990214639, Dept. of Computer Science and Engineering, Seoul National University) through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT, the Seoul R&D Program (CY20038) “Commercializing of technology for CAT Pro Web service based on AI”, and the Technology Development Program (S2878336) funded by the Ministry of SMEs and Startups (MSS, Korea).

Author information

Contributions

CL: Software, Writing – original draft, Review, Validation. JL: Conceptualization, Supervision, Writing – original draft, Funding acquisition, Project administration. DK: Software, Validation, Data curation. CK: Writing – original draft, Validation, Methodology. JB: Methodology, Data curation. EB: Supervision, Resources, Funding acquisition. HE: Project administration, Funding acquisition, Conceptualization.

Corresponding author

Correspondence to Jaehwan Lee.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lee, C., Lee, J., Koo, D. et al. Towards enhanced I/O performance of a highly integrated many-core processor by empirical analysis. Cluster Comput 26, 2643–2655 (2023). https://doi.org/10.1007/s10586-021-03288-2

  • DOI: https://doi.org/10.1007/s10586-021-03288-2
