New Methodologies for Parallel Architecture

Fan, Dong-Rui; Li, Xiao-Wei; Li, Guo-Jie

doi:10.1007/s11390-011-1158-z

Published: 11 July 2011

Volume 26, pages 578–587, (2011)

Journal of Computer Science and Technology

Dong-Rui Fan¹,
Xiao-Wei Li¹ &
Guo-Jie Li¹

Abstract

Moore's law will continue to grant computer architects ever more transistors for the foreseeable future, and parallelism is the key to continued performance scaling in modern microprocessors. This paper systematically presents the achievements of our research project on parallel architecture, which is supported by the National Basic Research 973 Program of China. We summarize the innovative approaches and techniques developed to solve significant problems in parallel architecture design, including architecture-level optimization, compiler- and language-supported technologies, reliability, power-performance-efficient design, test and verification challenges, and platform building. Two prototype chips, the multi-heavy-core Godson-3 and the many-light-core Godson-T, are described to demonstrate highly scalable and reconfigurable parallel architecture designs. We also present some of our results published in ISCA, MICRO, ISSCC, HPCA, PLDI, PACT, IJCAI, Hot Chips, DATE, IEEE Trans. VLSI, IEEE Micro, IEEE Trans. Computers, and other venues.

References

Hu W, Wang J, Gao X, Chen Y, Liu Q, Li G. Godson-3: A scalable multi-core RISC processor with x86 emulation support. IEEE Micro, 2009, 29(2): 17–29.
Fan D R, Yuan N, Zhang J C et al. Godson-T: An efficient many-core architecture for parallel program executions. Journal of Computer Science and Technology, 2009, 24(6): 1061–1073.
Lv H, Cheng Y, Bai L, Chen M, Fan D, Sun N. P-GAS: Parallelizing a cycle-accurate event-driven many-core processor simulator using parallel discrete event simulation. In Proc. Workshop on Principle of Advanced and Distributed Simulation, Atlanta, USA, May 17–19, 2010, pp.1-8.
Tang D, Bao Y, Hu W, Chen M. DMA cache: Using on-chip storage to architecturally separate I/O data from CPU data for improving I/O performance. In Proc. Int. Conf. High-Performance Computer Architecture, Bangalore, India, Jan. 9–14, 2010, pp.1-12.
Long G, Franklin D, Biswas S, Ortiz P, Oberg J, Fan D, Chong F T. Minimal multi-threading: Finding and removing redundant instructions in multi-threaded processors. In Proc. IEEE/ACM Int. Symp. Microarchitecture, Atlanta, USA, Dec. 4–8, 2010, pp.337-348.
Chen Y, Hu W, Chen T, Wu R. LReplay: A pending period based deterministic replay scheme. In Proc. Int. Symp. Computer Architecture, Saint-Malo, France, Jun. 19–23, 2010, pp.187-197.
Su M, Chen Y, Gao X. A general method to make multi-clock system deterministic. In Proc. Conf. Design, Automation and Test in Europe, Dresden, Germany, Mar. 8–12, 2010, pp.1480-1485.
Guo Q, Chen T, Chen Y, Zhou Z H, Hu W, Xu Z. Effective and efficient microprocessor design space exploration using unlabeled design configurations. In Proc. Int. Joint Conf. Artificial Intelligence, Spain, 2011. (To appear)
Xu D, Wu C, Yew P C. On mitigating memory bandwidth contention through bandwidth-aware scheduling. In Proc. Int. Conf. Parallel Architectures and Compilation Techniques, Vienna, Austria, Sept. 11–15, 2010, pp.237-247.
Chen L, Liu L, Tang S, Huang L, Jing Z, Xu S, Zhang D, Shou B. Unified parallel C for GPU clusters: Language extensions and compiler implementation. In Proc. the 23rd International Workshop on Languages and Compilers for Parallel Computing, Houston, USA, Oct. 7–9, 2010, pp.151-165.
Wang L, Cui H, Duan Y, Lu F, Feng X, Yew P C. An adaptive task creation strategy for work-stealing scheduling. In Proc. Int. Conf. Code Generation and Optimization, Toronto, Canada, Apr. 24–28, 2010, pp.266-277.
Liu L, Chen L, Wu C Y, Feng X B. Global tiling for communication minimal parallelization on distributed memory systems. In Proc. Int. Euro-Par Conf. Parallel Processing, Klagenfurt, Austria, Aug. 26–29, 2008, pp.382-391.
Chen Y, Huang Y, Eeckhout L, Fursin G, Peng L, Temam O, Wu C. Evaluating iterative optimization across 1000 data sets. In Proc. Conf. Programming Language Design and Implementation, Toronto, Canada, Jun. 5–10, 2010, pp.448-459.
Yu T, Xue J, Huo W, Feng X, Zhang Z. Level by level: Making flow- and context-sensitive pointer analysis scalable for millions of lines of code. In Proc. Int. Conf. Code Generation and Optimization, Toronto, Canada, Apr. 24–28, 2010, pp.218-229.
Wang Z, Wu C, Yew P C. On improving heap memory layout by dynamic pool allocation. In Proc. Int. Conf. Code Generation and Optimization, Toronto, Canada, Apr. 24–28, 2010, pp.92-100.
Li J, Wu C, Hsu W C. An evaluation of misaligned data access handling mechanisms in dynamic binary translation systems. In Proc. Int. Conf. Code Generation and Optimization, Seattle, USA, Mar. 22–25, 2009, pp.180-189.
Lv F, Wang L, Feng X, Li Z, Zhang Z. Exploiting idle register classes for fast spill destination. In Proc. Int. Conf. Supercomputing, Island of Kos, Greece, Jun. 7–12, 2008, pp.319-326.
Zhang L, Han Y, Xu Q, Li X, Li H. On topology reconfiguration for defect-tolerant NoC-based homogeneous manycore systems. IEEE Trans. VLSI Systems, 2009, 17(9): 1173–1186.
Yan G, Liang X, Han Y, Li X. Leveraging the core-level complementary effects of PVT variations to reduce timing emergencies in multi-core processors. In Proc. Int. Symp. Computer Architecture, Saint-Malo, France, Jun. 19–23, 2010, pp.485-496.
Pan S, Hu Y, Li X. IVF: Characterizing the vulnerability of microprocessor structures to intermittent faults. In Proc. Conf. Design, Automation and Test in Europe, Dresden, Germany, Mar. 8–12, 2010, pp.238-243.
Hu W, Wang R, Chen Y, Fan B, Zhong S, Gao X, Qi Z, Yang X. Godson-3B: A 1 GHz 40 W 8-Core 128 GFlops processor in 65 nm CMOS. In Proc. Int. Solid-State Circuits Conference, 2011. (To appear)
Zhang M, Li H, Li X. Path delay test generation toward activation of worst case coupling effects. IEEE Transactions on Very Large Scale Integration Systems, 2010, 18(12): 1–14.
Han Y, Hu Y, Li X, Li H, Chandra A. Embedded test decompressor to reduce the required channels and vector memory of tester for complex processor circuit. IEEE Transactions on Very Large Scale Integration Systems, 2007, 15(5): 531–540.
Wang D, Hu Y, Li H, Li X. The design-for-testability features and test implementation of a giga hertz general purpose microprocessor. Journal of Computer Science and Technology, 2008, 23(6): 1037–1046.
Chen Y, Lv Y, Hu W, Chen T, Shen H, Wang P, Pan H. Fast complete memory consistency verification. In Proc. Int. Symp. High-Performance Computer Architecture, Raleigh, USA, Feb. 14–18, 2009, pp.381-392.
Hu W, Chen Y, Chen T, Qian C, Li L. Linear time memory consistency verification. IEEE Transactions on Computers, 2011. (Accepted)
Li L, Chen T, Chen Y, Li L, Qian C, Hu W. Brief announcement: Program regularization in verifying memory consistency. In Proc. Symp. Parallelism in Algorithms and Architectures, San Jose, USA, Jun. 4–6, 2011. (To appear)
Guo Q, Chen T, Shen H, Chen Y, Wu Y, Hu W. Empirical design bugs prediction for verification. In Proc. Conf. Design, Automation and Test in Europe, Grenoble, France, Mar. 14–18, 2011, pp.1-6.
Zhang T, Lv T, Li X. An abstraction-guided simulation approach using Markov models for microprocessor verification. In Proc. Conf. Design, Automation and Test in Europe, Dresden, Germany, Mar. 8–12, 2010, pp.484-489.
Hu W, Wang J, Gao X, Chen Y. Micro-architecture of Godson-3 multi-core processor. In Proc. Symp. High Performance Chips, Stanford University, USA, Aug. 24–26, 2008.
Gao X, Chen Y J, Wang H D et al. System architecture of Godson-3 multi-core processors. Journal of Computer Science and Technology, 2010, 25(2): 181–191.
Hu W, Chen Y. GS464V: A high-performance low-power XPU with 512-bit vector extension. In Proc. Symp. High Performance Chips, Stanford University, USA, Aug. 22–24, 2010.

Author information

Authors and Affiliations

Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
Dong-Rui Fan (Member, CCF, IEEE), Xiao-Wei Li & Guo-Jie Li (Fellow, CCF)

Corresponding author

Correspondence to Dong-Rui Fan.

Additional information

This work was supported in part by the National Basic Research 973 Program of China under Grant Nos. 2011CB302500 and 2005CB321600, and by the National Natural Science Foundation of China under Grant No. 60921002.

About this article

Cite this article

Fan, DR., Li, XW. & Li, GJ. New Methodologies for Parallel Architecture. J. Comput. Sci. Technol. 26, 578–587 (2011). https://doi.org/10.1007/s11390-011-1158-z

Received: 03 May 2011
Published: 11 July 2011
Issue Date: July 2011
DOI: https://doi.org/10.1007/s11390-011-1158-z
