ABSTRACT
This paper presents a fundamental law for parallel performance: it shows that parallel performance is not only limited by sequential code (as suggested by Amdahl's law) but is also fundamentally limited by synchronization through critical sections. Extending Amdahl's software model to include critical sections, we derive the surprising result that the impact of critical sections on parallel performance can be modeled as a completely sequential part and a completely parallel part. The sequential part is determined by the probability for entering a critical section and the contention probability (i.e., multiple threads wanting to enter the same critical section). This fundamental result reveals at least three important insights for multicore design. (i) Asymmetric multicore processors deliver less performance benefits relative to symmetric processors than suggested by Amdahl's law, and in some cases even worse performance. (ii) Amdahl's law suggests many tiny cores for optimum performance in asymmetric processors, however, we find that fewer but larger small cores can yield substantially better performance. (iii) Executing critical sections on the big core can yield substantial speedups, however, performance is sensitive to the accuracy of the critical section contention predictor.
- G. M. Amdahl. Validity of the single-processor approach to achieving large-scale computing capabilities. In Proceedings of the American Federation of Information Processing Societies Conference (AFIPS), pages 483--485, 1967. Google ScholarDigital Library
- M. Annavaram, E. Grochowski, and J. Shen. Mitigating Amdahl's law through EPI throttling. In Proceedings of the International Symposium on Computer Architecture (ISCA), pages 298--309, June 2005. Google ScholarDigital Library
- K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, S. W. Williams, and K. A. Yelick. The landscape of parallel computing research: A view from berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, Dec. 2006.Google Scholar
- S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai. The impact of performance asymmetry in emerging multicore architectures. In Proceedings of the International Symposium on Computer Architecture (ISCA), pages 506--517, June 2005. Google ScholarDigital Library
- S. Borkar. Thousand core chips -- a technology perspective. In Proceedings of the Design Automation Conference (DAC), pages 746--749, June 2007. Google ScholarDigital Library
- M. Gschwind, H. P. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, and T. Yamazaki. Synergistic processing in Cell's multicore architecture. IEEE Micro, 26(2):10-24, March/April 2006. Google ScholarDigital Library
- J. L. Gustafson. Reevaluating Amdahl's law. Communications of the ACM, 31(5):532--533, May 1988. Google ScholarDigital Library
- L. Hammond, V. Wong, M. Chen, B. D. Carlstrom, J. D. D. an B. Hertzberg, M. K. Prabhu, H. Wijaya, C. Kozyrakis, and K. Olukotun. Transactional memory coherence and consistency. In Proceedings of the International Symposium on Computer Architecture (ISCA), pages 102--113, June 2004. Google ScholarDigital Library
- M. Herlihy and J. Moss. Transactional memory: Architectural support for lock-free data structures. In Proceedings of the International Symposium on Computer Architecture (ISCA), pages 289--300, June 1993. Google ScholarDigital Library
- M. D. Hill and M. R. Marty. Amdahl's law in the multicore era. IEEE Computer, 41(7):33--38, July 2008. Google ScholarDigital Library
- E. Ipek, M. Kirman, N. Kirman, and J. F. Martinez. Core fusion: Accommodating software diversity in chip multiprocessors. In Proceedings of the International Symposium on Computer Architecture (ISCA), pages 186--197, June 2007. Google ScholarDigital Library
- W. Kim, M. S. Gupta, G.-Y. Wei, and D. Brooks. System level analysis of fast, per-core DVFS using on-chip switching regulators. In Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA), pages 123--134, Feb. 2008.Google Scholar
- R. Kumar, K. I. Farkas, N. P. Jouppi, P. Ranganathan, and D. M. Tullsen. Single-ISA heterogeneous multi-core architectures: The potential for processor power reduction. In Proceedings of the ACM/IEEE Annual International Symposium on Microarchitecture (MICRO), pages 81--92, Dec. 2003. Google ScholarDigital Library
- J. F. Martinez and J. Torrellas. Speculative synchronization: Applying thread-level speculation to explicitly parallel applications. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 18--29, Oct. 2002. Google ScholarDigital Library
- D. Menasce and V. Almeida. Cost-performance analysis of heterogeneity in supercomputer architectures. In Proceedings of the International Conference on Supercomputing (ICS), pages 169--177, Nov. 1990. Google ScholarDigital Library
- C. C. Minh, J. Chung, C. Kozyrakis, and K. Olukotun. STAMP: Stanford transactional applications for multi-processing. In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC), pages 35--46, Sept. 2008.Google Scholar
- K. E. Moore, J. Bobba, M. J. Moravan, M. D. Hill, and D. A. Wood. Log™: Log-based transactional memory. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA), pages 254--265, Feb. 2006.Google ScholarCross Ref
- T. Y. Morad, U. C. Weiser, A. Kolodny, M. Valero, and A. Ayguade. Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors. IEEE Computer Architecture Letters, 5(1):14--17, Jan. 2006. Google ScholarDigital Library
- J. M. Paul and B. H. Meyer. Amdahl's law revisited for single chip systems. International Jounal of Parallel Programming, 35(2):101--123, Apr. 2007. Google ScholarDigital Library
- R. Rajwar and J. R. Goodman. Speculative lock elision: Enabling highly concurrent multithreaded execution. In Proceedings of the International Symposium on Microarchitecture (MICRO), pages 294--305, Dec. 2001. Google ScholarDigital Library
- R. Rajwar and J. R. Goodman. Transactional lock-free execution of lock-based programs. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 5--17, Oct. 2002. Google ScholarDigital Library
- M. A. Suleman, O. Mutlu, M. K. Qureshi, and Y. N. Patt. Accelerating critical section execution with asymmetric multi-core architectures. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 253--264, Mar. 2009. Google ScholarDigital Library
- M. A. Suleman, M. K. Qureshi, and Y. N. Patt. Feedback-driven threading: Power-efficient and high-performance execution of multi-threaded workloads on CMPs. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 277--286, Mar. 2008. Google ScholarDigital Library
- D. M. Tullsen, S. J. Eggers, J. S. Emer, H. M. Levy, J. L. Lo, and R. L. Stamm. Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor. In Proceedings of the 23rd Annual International Symposium on Computer Architecture (ISCA), pages 191--202, May 1996. Google ScholarDigital Library
Index Terms
- Modeling critical sections in Amdahl's law and its implications for multicore design
Recommendations
Modeling critical sections in Amdahl's law and its implications for multicore design
ISCA '10This paper presents a fundamental law for parallel performance: it shows that parallel performance is not only limited by sequential code (as suggested by Amdahl's law) but is also fundamentally limited by synchronization through critical sections. ...
Extending Amdahl's law and Gustafson's law by evaluating interconnections on multi-core processors
Multicore chips are emerging as the mainstream solution for high performance computing. Generally, communication overheads cause large performance degradation in multi-core collaboration. Interconnects in large scale are needed to deal with these ...
The effect of communication and synchronization on Amdahl's law in multicore systems
This work analyses the effects of sequential-to-parallel synchronization and inter-core communication on multicore performance, speedup and scaling from Amdahl's law perspective. Analytical modeling supported by simulation leads to a modification of ...
Comments