A new software cache structure on Sunway TaihuLight

The Journal of Supercomputing

This article has been updated

Abstract

The Sunway TaihuLight is the first supercomputer in China built entirely with domestic processors. On Sunway TaihuLight, the local data memory (LDM) of each slave core is limited, so data must be transferred to and from main memory frequently during computation, and memory access efficiency is low. In addition, for many scientific computing programs, how irregularly accessed data is stored is the key to program optimization. A software cache (SWC) is an effective way to address both problems. Based on the characteristics of the Sunway TaihuLight architecture and of irregular access, this paper designs and implements a new software cache structure that uses part of the LDM to simulate a cache. The structure uses a new cache address mapping and a new conflict resolution scheme to reduce the high data access overhead and storage overhead of a traditional cache. The SWC also uses register communication between slave cores to share cached data across the LDMs of different slave cores, which increases the capacity of the software cache and improves the hit rate. In addition, we adopt a double-buffer strategy to access regular data in batches, which hides the communication overhead between the slave core and main memory. Test results on the Sunway TaihuLight platform show that the proposed software cache structure effectively reduces program running time, improves the software cache hit rate, and achieves a better optimization effect.
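To make the idea of a software cache concrete, the following is a minimal sketch of a conventional direct-mapped software cache, simulated in Python. It is not the address mapping or conflict resolution scheme proposed in the paper, which the abstract does not specify. The class name `SoftwareCache`, the line size, and the number of lines are illustrative assumptions. A Python list stands in for main memory, and a small table of lines stands in for the part of the LDM reserved for the cache.

```python
# Illustrative simulation only: a textbook direct-mapped software cache.
# All names and sizes here are assumptions, not taken from the paper.

LINE_WORDS = 8   # elements per cache line (assumed)
NUM_LINES = 16   # cache lines held in the simulated LDM region (assumed)


class SoftwareCache:
    """Direct-mapped cache in front of a list that stands in for main memory."""

    def __init__(self, main_memory, num_lines=NUM_LINES, line_words=LINE_WORDS):
        self.mem = main_memory
        self.num_lines = num_lines
        self.line_words = line_words
        self.tags = [None] * num_lines    # which memory block each slot holds
        self.lines = [None] * num_lines   # the cached data for each slot
        self.hits = 0
        self.misses = 0

    def read(self, index):
        block = index // self.line_words  # memory block containing the element
        slot = block % self.num_lines     # direct mapping: one slot per block
        if self.tags[slot] == block:
            self.hits += 1
        else:
            # Miss: fetch the whole line from "main memory" (a DMA transfer
            # on real hardware) and overwrite whatever occupied the slot.
            self.misses += 1
            start = block * self.line_words
            self.lines[slot] = self.mem[start:start + self.line_words]
            self.tags[slot] = block
        return self.lines[slot][index % self.line_words]

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# An irregular access pattern: blocks 0, 112, and 16 all map to slot 0,
# so they keep evicting one another (conflict misses).
memory = [float(i) for i in range(1024)]
cache = SoftwareCache(memory)
indices = [3, 5, 3, 900, 901, 5, 131, 3]
values = [cache.read(i) for i in indices]
print(cache.hits, cache.misses, cache.hit_rate())
```

In this access pattern, several blocks compete for the same slot, which is the kind of conflict overhead under irregular access that motivates a different mapping and conflict resolution scheme, as well as a larger shared capacity.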

Change history

  • 07 December 2022

    The author’s affiliation is updated.

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2017YFB0202003).

Author information

Corresponding author

Correspondence to Zhaochu Deng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, J., Deng, Z., Du, P. et al. A new software cache structure on Sunway TaihuLight. J Supercomput 78, 4779–4798 (2022). https://doi.org/10.1007/s11227-021-04056-0

  • DOI: https://doi.org/10.1007/s11227-021-04056-0
