Abstract
Multi-core processors are now the mainstream technology in desktop PCs, laptop computers and mobile hardware platforms. As the number of cores on a chip keeps increasing, design complexity grows and the core count has a growing impact on both the power and the performance of a processor. In multiprocessors, the number of cores and parameters such as issue width, number of instructions and execution time are key design factors for balancing thread-level parallelism (TLP) against instruction-level parallelism (ILP). In this paper, we perform a comprehensive simulation study to find the optimum number of processor cores in desktop/laptop processor models with shallow pipelines. We also explore the trade-off between the number of cores and other multiprocessor parameters in terms of power–performance gains, and analyze the impact of 3D stacking on the design of simultaneous multithreading (SMT) and chip multiprocessing (CMP). Our analysis shows that the optimum number of cores varies across workload classes, namely SPEC2000, SPEC2006 and MiBench. Simulation results for architectures with shallow pipelines show that (1) the optimum number of cores for power–performance is 8, (2) the optimum number of threads lies in the range [2, 4], and (3) beyond 32 cores, multi-core processors are no longer efficient in terms of performance benefit and overall power consumption.
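The idea that a power–performance optimum exists in core count can be illustrated with a toy analytical model. The sketch below is not the authors' simulation-based methodology and is not meant to reproduce their results. It assumes Amdahl's law for speedup, a hypothetical linear power model (a fixed power per active core plus a constant uncore power), and a BIPS^3/W-style metric (performance cubed divided by power). All function names and parameter values are hypothetical.

```python
"""Toy power-performance sweep over core counts (illustrative assumptions only)."""


def speedup(n_cores: int, parallel_fraction: float) -> float:
    # Amdahl's law: the serial part runs on one core, the parallel part
    # is divided evenly across n_cores.
    if n_cores < 1 or not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("need n_cores >= 1 and 0 <= parallel_fraction <= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)


def power(n_cores: int, core_power: float, uncore_power: float) -> float:
    # Hypothetical linear model: each active core adds a fixed amount of
    # power on top of a constant uncore (caches, interconnect) term.
    return uncore_power + n_cores * core_power


def efficiency(n_cores: int, parallel_fraction: float, core_power: float,
               uncore_power: float, perf_exponent: int = 3) -> float:
    # BIPS^k/W-style metric: performance^k / power. k = 3 weights
    # performance heavily relative to power.
    perf = speedup(n_cores, parallel_fraction)
    return perf ** perf_exponent / power(n_cores, core_power, uncore_power)


def best_core_count(candidates, parallel_fraction: float, core_power: float,
                    uncore_power: float, perf_exponent: int = 3) -> int:
    # Return the candidate core count that maximizes the metric.
    return max(candidates,
               key=lambda n: efficiency(n, parallel_fraction, core_power,
                                        uncore_power, perf_exponent))


if __name__ == "__main__":
    counts = [1, 2, 4, 8, 16, 32, 64]
    for n in counts:
        print(n, round(efficiency(n, 0.95, 1.0, 4.0), 2))
    print("best:", best_core_count(counts, 0.95, 1.0, 4.0))
```

In this model, adding cores pays off only while the parallel speedup grows faster than the added core power. Once the serial fraction dominates, the metric flattens and then falls. Changing `parallel_fraction` or the power constants moves the optimum, which loosely mirrors the observation that the best core count depends on the workload.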
About this article
Cite this article
Saravanan, V., Anpalagan, A., Kothari, D.P. et al. A comparative simulation study on the power–performance of multi-core architecture. J Supercomput 70, 465–487 (2014). https://doi.org/10.1007/s11227-014-1263-1
DOI: https://doi.org/10.1007/s11227-014-1263-1