Abstract
In recent years, China has made considerable progress in producing domestically designed CPUs and DSPs. After fifteen years of work that began in 2001, Chinese domestic CPUs and DSPs, represented primarily by the Loongson and ShenWei processors, have advanced significantly, and some of their CPU design techniques are comparable to the world’s most advanced designs. A special issue of Scientia Sinica Informationis, published in April 2015, is dedicated to exhibiting the technical advancements in Chinese domestically designed CPUs and DSPs. The content of that issue describes the design and optimization of high-performance processors and the key technologies in processor development, including high-performance micro-architecture design, many-core and multi-core design, radiation hardening design, high-performance physical design, complex chip verification, and binary translation technology. We hope that the collected articles will promote understanding of CPU/DSP progress in China, and we believe that the future of Chinese domestic CPU/DSP processors is promising.
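Highlights
This article summarizes the key technologies for designing and optimizing domestic high-performance processors, including high-performance micro-architecture design, multi-core design, radiation hardening design, high-performance physical design, complex chip verification methods, and binary translation technology. It demonstrates the technical advancement of China's independently developed processors and digital signal processing chips.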
Cite this article
Hu, W., Zhang, Y. & Fu, J. An introduction to CPU and DSP design in China. Sci. China Inf. Sci. 59, 1–8 (2016). https://doi.org/10.1007/s11432-015-5431-6
DOI: https://doi.org/10.1007/s11432-015-5431-6