
Progress and Challenges in High Performance Computer Technology

  • Architecture
Journal of Computer Science and Technology

Abstract

High performance computers provide strategic computing power for national economic development and defense, and have become a symbol of a country’s overall strength. Over the past 30 years, with government support, high performance computer technology has developed rapidly: computing performance has increased nearly 3 million times, and the number of processors has grown more than 1 million times. To address the critical issues of parallel efficiency and scalability, researchers have pursued extensive theoretical studies and technical innovations. This paper briefly reviews the course of building high performance computer systems in China and abroad, and summarizes the significant breakthroughs in international high performance computer technology. We also survey China’s technical progress in parallel computer architecture, parallel operating systems and resource management, parallel compilers and performance optimization, parallel programming environments, and network computing. Finally, we examine the challenges of the “memory wall”, system scalability, and the “power wall”, and discuss high productivity computers, which represent the trend in building next-generation high performance computers.


Author information

Correspondence to Xue-Jun Yang.

Additional information

Survey: This work was partly supported by the National Natural Science Foundation of China under Grant No. 69933030.

Xue-Jun Yang is a professor and doctoral supervisor at the School of Computer, National University of Defense Technology. His research areas are high performance computing, parallel theory, architecture, and applications.

Yong Dou was born in 1966. He is now a professor and Ph.D. advisor at the School of Computer, National University of Defense Technology. His main research areas are high performance computer architecture and reconfigurable computing. As one of the main researchers, he took part in the project “Research on Architecture of High-end High Performance Computers”, supported by NSFC in 2000.

Qing-Feng Hu was born in 1958. He is now a professor at the National Laboratory for Parallel and Distributed Processing. His main research areas are high performance computing applications, parallel algorithms, and scientific and engineering computing.

About this article

Cite this article

Yang, XJ., Dou, Y. & Hu, QF. Progress and Challenges in High Performance Computer Technology. J Comput Sci Technol 21, 674–681 (2006). https://doi.org/10.1007/s11390-006-0674-8

  • DOI: https://doi.org/10.1007/s11390-006-0674-8
