A comparative simulation study on the power–performance of multi-core architecture

The Journal of Supercomputing

Abstract

Multi-core processors are now the main technology used in desktop PCs, laptop computers and mobile hardware platforms. As the number of cores on a chip keeps increasing, design complexity grows and the impact on both the power and the performance of a processor becomes larger. In multi-processors, the number of cores and various parameters, such as issue width, number of instructions and execution time, are key design factors for balancing thread-level parallelism against instruction-level parallelism. In this paper, we perform a comprehensive simulation study that aims to find the optimum number of processor cores in desktop/laptop computing processor models with shallow pipeline depth. This paper also explores the trade-off between the number of cores and different parameters used in multi-processors in terms of power–performance gains, and analyzes the impact of 3D stacking on the design of simultaneous multi-threading and chip multiprocessing. Our analysis shows that the optimum number of cores varies across the classes of workloads studied, namely SPEC2000, SPEC2006 and MiBench. The simulation study is presented using architectures with shorter pipeline depth, and shows that (1) the optimum number of cores for power–performance is 8, (2) the optimum number of threads is in the range [2, 4], and (3) beyond 32 cores, multi-core processors are no longer efficient in terms of performance benefits and overall power consumption.
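The idea of an optimum core count for power–performance can be illustrated with a toy analytical model. The sketch below is not the simulation methodology of the paper: it assumes Amdahl's law for execution time, a chip power that grows linearly with the number of active cores, and the energy-delay product as the figure of merit. All function names and parameter values (`parallel_fraction`, `static_power`, `power_per_core`) are hypothetical and chosen only for illustration, so the output should not be expected to reproduce the numbers reported in the abstract.

```python
def amdahl_time(cores, parallel_fraction):
    """Normalized execution time under Amdahl's law (1.0 on a single core)."""
    return (1.0 - parallel_fraction) + parallel_fraction / cores


def chip_power(cores, static_power, power_per_core):
    """Assumed power model: a fixed static part plus a linear per-core part."""
    return static_power + cores * power_per_core


def energy_delay_product(cores, parallel_fraction,
                         static_power=1.0, power_per_core=1.0):
    """Energy-delay product: (power * time) * time, in arbitrary units."""
    t = amdahl_time(cores, parallel_fraction)
    return chip_power(cores, static_power, power_per_core) * t * t


def best_core_count(candidates, parallel_fraction, **power_params):
    """Return the candidate core count with the lowest energy-delay product."""
    return min(
        candidates,
        key=lambda n: energy_delay_product(n, parallel_fraction, **power_params),
    )


if __name__ == "__main__":
    core_counts = [1, 2, 4, 8, 16, 32, 64]
    # Hypothetical workloads with different degrees of parallelism.
    for fraction in (0.50, 0.90, 0.99):
        best = best_core_count(core_counts, fraction)
        print(f"parallel fraction {fraction:.2f}: best core count = {best}")
```

Under these assumptions the preferred core count shifts with the parallel fraction of the workload, which mirrors the qualitative observation that the optimum depends on the workload class. A different figure of merit, such as energy alone or energy-delay-squared, would move the optimum, so the choice of metric is itself a modelling decision.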

Corresponding author

Correspondence to Mohammad S. Obaidat.

Cite this article

Saravanan, V., Anpalagan, A., Kothari, D.P. et al. A comparative simulation study on the power–performance of multi-core architecture. J Supercomput 70, 465–487 (2014). https://doi.org/10.1007/s11227-014-1263-1

  • DOI: https://doi.org/10.1007/s11227-014-1263-1
