Abstract
The Sunway TaihuLight is the first supercomputer in China built entirely with domestic processors. On Sunway TaihuLight, the local data memory (LDM) of each slave core is small, so data must be transferred to and from main memory frequently during computation, and memory access efficiency is low. Moreover, for many scientific computing programs, handling the storage of irregularly accessed data is the key to program optimization. A software cache (SWC) is one effective way to address these problems. Based on the characteristics of the Sunway TaihuLight architecture and of irregular access, this paper designs and implements a new software cache structure that uses part of the LDM space to emulate a cache. It uses a new cache address mapping and conflict resolution scheme to reduce the high data access overhead and storage overhead of a traditional cache. The SWC also uses register communication between slave cores to share data across the LDMs of different slave cores, which increases the capacity of the software cache and improves the hit rate. In addition, we adopt a double-buffering strategy to access regular data in batches, which hides the communication overhead between the slave core and main memory. Test results on the Sunway TaihuLight platform show that the proposed software cache structure effectively reduces program running time, improves the software cache hit rate, and thereby achieves a better optimization effect.
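To make the idea of emulating a cache in a small local memory concrete, the following is a minimal sketch of a conventional direct-mapped software cache in C. It is not the structure proposed in the paper, whose address mapping and conflict resolution are not described in the abstract. All names (`swc_t`, `swc_init`, `swc_read`, `swc_gather_sum`), the line size, and the line count are hypothetical, an ordinary array stands in for the LDM, and `memcpy` stands in for a DMA transfer from main memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SWC_LINE_WORDS 8   /* words per cache line (assumed value) */
#define SWC_NUM_LINES  16  /* cache lines held in the simulated LDM (assumed value) */

typedef struct {
    long          tag[SWC_NUM_LINES];                  /* main-memory line held in each slot, -1 if empty */
    double        data[SWC_NUM_LINES][SWC_LINE_WORDS]; /* stands in for LDM space */
    unsigned long hits, misses;
} swc_t;

/* Mark every slot empty and reset the counters. */
static void swc_init(swc_t *c) {
    for (int i = 0; i < SWC_NUM_LINES; i++) c->tag[i] = -1;
    c->hits = c->misses = 0;
}

/* Read main_mem[idx] through the cache.
 * Assumes the main-memory array length is a multiple of SWC_LINE_WORDS. */
static double swc_read(swc_t *c, const double *main_mem, size_t idx) {
    assert(c != NULL && main_mem != NULL);
    long   line = (long)(idx / SWC_LINE_WORDS);   /* which memory line holds idx */
    size_t slot = (size_t)line % SWC_NUM_LINES;   /* direct mapping: line -> slot */
    size_t off  = idx % SWC_LINE_WORDS;           /* word offset inside the line */

    if (c->tag[slot] != line) {
        /* Miss: fetch the whole line; memcpy stands in for a DMA get. */
        memcpy(c->data[slot], main_mem + (size_t)line * SWC_LINE_WORDS,
               sizeof c->data[slot]);
        c->tag[slot] = line;
        c->misses++;
    } else {
        c->hits++;
    }
    return c->data[slot][off];
}

/* Irregular (indexed) gather: sum main_mem[index[i]] via the cache. */
static double swc_gather_sum(swc_t *c, const double *main_mem,
                             const size_t *index, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += swc_read(c, main_mem, index[i]);
    return sum;
}
```

With these assumed sizes, indices that fall in memory lines 0 and 16 map to the same slot and evict each other on alternating accesses. This kind of conflict miss in a simple direct mapping is one source of the access overhead that a different mapping or conflict-handling scheme tries to reduce.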
Change history
07 December 2022
The author’s affiliation is updated.
Acknowledgements
This work was supported by the National Key R&D Program of China (No. 2017YFB0202003).
About this article
Cite this article
Li, J., Deng, Z., Du, P. et al. A new software cache structure on Sunway TaihuLight. J Supercomput 78, 4779–4798 (2022). https://doi.org/10.1007/s11227-021-04056-0
DOI: https://doi.org/10.1007/s11227-021-04056-0