Abstract
Multi-core chips are emerging as the mainstream solution for high-performance computing. Communication overheads generally cause large performance degradation when multiple cores collaborate, and large-scale interconnects are needed to deal with these overheads. Amdahl’s and Gustafson’s laws have been applied to multi-core chips, but inter-core communication has not been taken into account. In this paper, we introduce interconnection into Amdahl’s and Gustafson’s laws so that they model speedup more precisely in the multi-core era. We further propose an area cost model and analyse our speedup models under area constraints. We find optimized parameters according to our speedup models. These parameters provide useful feedback to architects at an initial phase of their designs. We also present a case study to show the necessity of incorporating interconnection into Amdahl’s and Gustafson’s laws.
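For orientation, the two classical laws that the paper extends can be written in a few lines. The sketch below uses the standard textbook forms, with `f` the parallelizable fraction of the work and `n` the number of cores. The third function is only a toy illustration of why communication matters: its linear overhead term `c * (n - 1)` and the parameter `c` are assumptions made here for illustration and are not the interconnection model proposed in the paper.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Classical Amdahl's law: fixed problem size, parallel fraction f, n cores."""
    return 1.0 / ((1.0 - f) + f / n)


def gustafson_speedup(f: float, n: int) -> float:
    """Classical Gustafson's law: problem size scales with the n cores."""
    return (1.0 - f) + f * n


def amdahl_with_toy_overhead(f: float, n: int, c: float) -> float:
    """Amdahl's law plus a HYPOTHETICAL communication cost.

    Assumption (not from the paper): communication adds c * (n - 1) to the
    normalized run time, i.e. the cost grows linearly with the core count.
    """
    return 1.0 / ((1.0 - f) + f / n + c * (n - 1))


if __name__ == "__main__":
    f = 0.9
    for n in (1, 4, 16, 64):
        print(
            n,
            round(amdahl_speedup(f, n), 3),
            round(gustafson_speedup(f, n), 3),
            round(amdahl_with_toy_overhead(f, n, 0.01), 3),
        )
```

Even with this crude overhead term, the speedup stops growing and eventually falls as cores are added, which is the qualitative effect that motivates adding interconnection to both laws.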



Acknowledgement
This paper is partially sponsored by the National High-Technology Research and Development Program of China (863 Program) (No. 2009AA012201).
Cite this article
Huang, T., Zhu, Y., Qiu, M. et al. Extending Amdahl’s law and Gustafson’s law by evaluating interconnections on multi-core processors. J Supercomput 66, 305–319 (2013). https://doi.org/10.1007/s11227-013-0908-9
DOI: https://doi.org/10.1007/s11227-013-0908-9