
Scheduling for Better Energy Efficiency on Many-Core Chips

  • Conference paper
  • In: Job Scheduling Strategies for Parallel Processing (JSSPP 2015, JSSPP 2016)

Abstract

Many-core chips are especially attractive for data center operators providing cloud computing service models. With the advance of many-core chips in such environments, energy-conscious scheduling of independent processes or operating systems (OSes) is gaining importance. An important research question is how the scheduler of such a system should assign cores to OSes in order to achieve better energy utilization. In this paper, we demonstrate that many-core chips offer new opportunities for extremely lightweight migration of independent processes (or OSes) running bare-metal on the many-core chip. We then show how this intra-chip migration can be utilized to achieve a better performance-per-watt ratio by implementing a hierarchical power-management scheme on top of dynamic voltage and frequency scaling (DVFS). We have implemented and tested the proposed techniques on the Intel Single-chip Cloud Computer (SCC). Combining migration with DVFS, we achieve, on average, 25–35% better performance per watt than a DVFS-only solution.
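To put the reported gains in context: to first order, dynamic CMOS power follows P_dyn ≈ C·V²·f, so if delivered performance tracks frequency, performance per watt scales as 1/(C·V²) and improves whenever the voltage can be lowered together with the frequency. The sketch below illustrates this relation with made-up operating points; it is a toy model, not the paper's measured SCC data, and it ignores static power.

```python
# Toy first-order CMOS model (P_dyn = C * V^2 * f). Operating points are
# hypothetical; this is not the paper's measured SCC data.

def dynamic_power(c: float, v: float, f_ghz: float) -> float:
    """Dynamic power in arbitrary units: P = C * V^2 * f."""
    return c * v ** 2 * f_ghz

def perf_per_watt(f_ghz: float, v: float, c: float = 1.0) -> float:
    """Performance is taken to be proportional to clock frequency."""
    return f_ghz / dynamic_power(c, v, f_ghz)

hi = perf_per_watt(f_ghz=1.6, v=1.1)   # hypothetical high voltage/frequency point
lo = perf_per_watt(f_ghz=0.8, v=0.8)   # hypothetical scaled-down point
print(f"{lo / hi:.2f}x better performance per watt at the lower point")
```

Static power and the cost of moving work between voltage domains, both ignored in this toy model, are what make the scheduling problem non-trivial in practice.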



Acknowledgments

This work was supported, in part, by BK21 Plus for Pioneers in Innovative Computing (Dept. of Computer Science and Engineering, SNU) funded by the National Research Foundation (NRF) of Korea (Grant 21A20151113068), the Basic Science Research Program through NRF funded by the Ministry of Science, ICT & Future Planning (Grants NRF-2015K1A3A1A14021288 and NRF-2008-0062609), and by the Promising-Pioneering Researcher Program through Seoul National University in 2015. ICT at Seoul National University provided research facilities for this study.

Author information

Correspondence to Bernhard Egger.

Appendix

A Profiled Workload Benchmark Scenarios

This appendix describes the details of the benchmarks evaluated in this work. Each benchmark scenario consists of two parts:

  • Two or more workload patterns that describe how the load changes over time.

  • An initial assignment of the workloads to the 48 cores of the Intel SCC used in the experiments.

Each workload pattern (WL), denoted S{1–7} in the tables below, lists the CPU load for every epoch (10 or 15 s, depending on the benchmark) over the duration of one period (300 s). A workload never stops; it repeats its pattern period after period. Note that all workloads are pure CPU-based workloads; memory-based workloads are left for future work.
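Read concretely, a workload pattern is a cyclic list of per-epoch load levels. The following sketch is hypothetical bookkeeping (not the tooling used on the SCC) showing how a pattern such as S1 from Appendix A.1 below is replayed:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class WorkloadPattern:
    """One pattern (e.g. S1): CPU load in percent for each epoch, cycled forever."""
    name: str
    epoch_s: int        # epoch length in seconds (10 or 15 in these benchmarks)
    loads: List[int]    # one load level per listed epoch

    def load_at(self, t_s: float) -> int:
        """Load demanded at time t_s; the listed epochs repeat indefinitely."""
        epoch = int(t_s // self.epoch_s) % len(self.loads)
        return self.loads[epoch]

# The synthetic S1 pattern from Appendix A.1 (21 listed epochs of 15 s).
s1 = WorkloadPattern("S1", 15, [95, 95, 10, 10] * 5 + [10])

assert s1.load_at(0) == 95     # epoch 0
assert s1.load_at(50) == 10    # 50 s falls into epoch 3
assert s1.load_at(320) == 95   # past the last listed epoch, wraps back to epoch 0
```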

The core assignment tables below show which workload pattern is assigned to which core when the experiment starts. In our setup, voltage domain 3 (vdom3) runs various logging and monitoring services and is thus not available for user benchmarks. The power measurements nevertheless include the power consumed by vdom3, because power is reported only for the entire chip, not for individual voltage domains.

A benchmark ends after a predefined number of seconds (300 s in our experiments). The total progress of each workload is measured externally and thus includes all overheads caused by migration, voltage changes, or slowdowns due to frequency settings that are too low.
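Because power is reported only for the whole chip, the performance-per-watt figure of a run boils down to total externally measured progress divided by average chip power. A minimal post-processing sketch with placeholder numbers (not measurements from the paper):

```python
from typing import List

def performance_per_watt(progress: float, power_samples_w: List[float]) -> float:
    """progress: externally measured work units, overheads already included.
    power_samples_w: whole-chip power samples (vdom3 included)."""
    avg_power_w = sum(power_samples_w) / len(power_samples_w)
    return progress / avg_power_w

# Placeholder values for one hypothetical 300 s run.
ppw = performance_per_watt(progress=1.2e6, power_samples_w=[62.0, 58.5, 60.2])
print(f"performance per watt: {ppw:.0f} work units/W")
```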

A.1 Synthetic Benchmark Scenario Based on Periodic Workloads

The synthetic benchmark consists of two identical workload patterns shifted in time. Each voltage domain contains workloads of both patterns. The purpose of this benchmark is to demonstrate the potential of combining DVFS with OS migration. The results of this benchmark are shown in Fig. 5.
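A quick calculation shows where that potential comes from: a voltage domain must be clocked for its busiest resident workload, so mixing the time-shifted S1 and S2 in every domain keeps each domain near the top frequency step most of the time, whereas migrating OSes so that each domain hosts phase-aligned workloads substantially reduces the busy time. This is an illustration under that assumption, not the scheduler's actual policy:

```python
# S1 and S2 as listed in the table below: identical square waves shifted by one
# 15 s epoch.
S1 = [95, 95, 10, 10] * 5 + [10]
S2 = [10, 95, 95, 10] * 5 + [10]

# A domain's required frequency step is driven by its busiest resident workload.
mixed = [max(a, b) for a, b in zip(S1, S2)]   # S1 and S2 sharing every domain

print(sum(l == 95 for l in mixed), "of", len(mixed), "epochs at top frequency (mixed)")
print(sum(l == 95 for l in S1), "of", len(S1), "epochs at top frequency (phase-aligned)")
# -> 15 of 21 vs. 10 of 21: consolidating phase-aligned workloads lets whole
#    domains be clocked (and volted) down far more often.
```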

Workload patterns:

Epoch (1 epoch = 15 s):

| WL | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| S1 | 95 | 95 | 10 | 10 | 95 | 95 | 10 | 10 | 95 | 95 | 10 | 10 | 95 | 95 | 10 | 10 | 95 | 95 | 10 | 10 | 10 |
| S2 | 10 | 95 | 95 | 10 | 10 | 95 | 95 | 10 | 10 | 95 | 95 | 10 | 10 | 95 | 95 | 10 | 10 | 95 | 95 | 10 | 10 |

Core assignment:

| vdom0 | vdom1 | vdom3 | vdom4 | vdom5 | vdom7 |
|-------|-------|-------|-------|-------|-------|
| - | - | - | - | n/a | n/a |
| - | - | - | - | - | - |
| S2 | - | S2 | - | n/a | n/a |
| S2 | S2 | S2 | - | S2 | - |
| - | - | - | - | n/a | n/a |
| - | - | - | - | - |  |
| S1 | S2 | S1 | S1 | n/a | n/a |
| S1 | S1 | S1 | S2 | S1 | S1 |

A.2 Benchmark Scenarios Based on Profiled Workloads

The following four benchmarks are based on usage patterns profiled on Linux and Windows desktop computers. Initially, each voltage domain is loaded with different workload patterns. These benchmarks demonstrate the effect of the proposed technique in a multi-user setup (e.g., virtual desktops of employees hosted on a server machine).

The detailed results of the first benchmark are shown in Fig. 6; Table 1 lists the combined results for all four benchmark scenarios shown here.

Benchmark 1 (BM1)

Workload patterns:

Epoch (1 epoch = 10 s):

| WL | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| S1 | 27 | 49 | 31 | 32 | 62 | 77 | 80 | 44 | 0 | 6 | 1 | 1 | 8 | 73 | 87 | 81 | 80 | 91 | 100 | 99 | 89 | 67 | 13 | 52 | 0 | 0 | 10 | 46 | 27 | 86 | 63 |
| S2 | 69 | 57 | 68 | 60 | 55 | 66 | 61 | 63 | 69 | 58 | 56 | 57 | 63 | 59 | 62 | 58 | 57 | 67 | 68 | 64 | 61 | 71 | 78 | 63 | 71 | 82 | 69 | 14 | 0 | 2 | 4 |
| S3 | 28 | 84 | 41 | 12 | 83 | 48 | 55 | 0 | 35 | 69 | 42 | 59 | 17 | 46 | 59 | 49 | 51 | 2 | 46 | 47 | 80 | 40 | 4 | 73 | 41 | 53 | 47 | 18 | 100 | 42 | 45 |
| S4 | 27 | 49 | 31 | 32 | 62 | 77 | 80 | 44 | 0 | 6 | 1 | 1 | 8 | 73 | 87 | 81 | 80 | 91 | 100 | 99 | 89 | 67 | 13 | 52 | 0 | 0 | 10 | 80 | 66 | 56 | 32 |
| S5 | 71 | 53 | 26 | 9 | 34 | 25 | 23 | 38 | 37 | 26 | 96 | 92 | 34 | 41 | 89 | 100 | 100 | 12 | 17 | 30 | 27 | 21 | 31 | 35 | 41 | 84 | 89 | 63 | 100 | 96 | 84 |
| S6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27 | 96 | 63 | 100 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| S7 | 5 | 4 | 5 | 7 | 2 | 4 | 5 | 6 | 6 | 4 | 100 | 6 | 2 | 4 | 1 | 1 | 0 | 1 | 2 | 2 | 4 | 2 | 2 | 4 | 6 | 6 | 6 | 5 | 2 | 10 | 5 |

Core assignment:

| vdom0 | vdom1 | vdom3 | vdom4 | vdom5 | vdom7 |
|-------|-------|-------|-------|-------|-------|
| S4 | S6 | S4 | S4 | n/a | n/a |
| S5 | S5 | S5 | S6 | S5 | S5 |
| S3 | S3 | S3 | S7 | n/a | n/a |
| S3 | S3 | S4 | S4 | S2 | S2 |
| S2 | S5 | S2 | S2 | n/a | n/a |
| S2 | S6 | S2 | S7 | S3 | S4 |
| S1 | S1 | S1 | S5 | n/a | n/a |
| S1 | S4 | S1 | S3 | S1 | S1 |

Benchmark 2 (BM2)

Workload patterns:

Epoch (1 epoch = 10 s):

| WL | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| S1 | 27 | 49 | 31 | 32 | 62 | 77 | 80 | 44 | 0 | 6 | 1 | 1 | 8 | 73 | 87 | 81 | 80 | 91 | 100 | 99 | 89 | 67 | 13 | 52 | 0 | 0 | 10 | 46 | 27 | 86 | 63 |
| S2 | 82 | 39 | 55 | 42 | 96 | 42 | 100 | 33 | 53 | 20 | 20 | 10 | 11 | 14 | 13 | 11 | 13 | 13 | 1 | 5 | 1 | 0 | 23 | 45 | 61 | 42 | 83 | 83 | 20 | 15 | 3 |
| S3 | 28 | 84 | 41 | 12 | 83 | 48 | 55 | 0 | 35 | 69 | 42 | 59 | 17 | 46 | 59 | 49 | 51 | 2 | 46 | 47 | 80 | 40 | 4 | 73 | 41 | 53 | 47 | 18 | 100 | 42 | 45 |
| S4 | 27 | 49 | 31 | 32 | 62 | 77 | 80 | 44 | 0 | 6 | 1 | 1 | 8 | 73 | 87 | 81 | 80 | 91 | 100 | 99 | 89 | 67 | 13 | 52 | 0 | 0 | 10 | 10 | 15 | 30 | 27 |
| S5 | 71 | 53 | 26 | 9 | 34 | 25 | 23 | 38 | 37 | 26 | 96 | 92 | 34 | 41 | 89 | 100 | 100 | 12 | 17 | 30 | 27 | 21 | 31 | 35 | 41 | 84 | 89 | 63 | 100 | 96 | 84 |
| S6 | 53 | 21 | 52 | 48 | 33 | 92 | 89 | 100 | 39 | 38 | 29 | 41 | 48 | 4 | 64 | 45 | 36 | 31 | 42 | 41 | 42 | 35 | 15 | 80 | 93 | 62 | 10 | 23 | 48 | 32 | 0 |

Core assignment:

| vdom0 | vdom1 | vdom3 | vdom4 | vdom5 | vdom7 |
|-------|-------|-------|-------|-------|-------|
| - | - | - | - | n/a | n/a |
| - | - | - | - | - | - |
| S5 | S6 | S5 | S6 | n/a | n/a |
| S3 | S6 | S4 | S5 | S4 | S5 |
| - | - | - | - | n/a | n/a |
| - | - | - | - | - | - |
| S1 | S4 | S1 | S2 | n/a | n/a |
| S1 | S2 | S2 | S3 | S1 | S3 |

Benchmark 3 (BM3)

Workload patterns:

Epoch (1 epoch = 10 s):

| WL | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| S1 | 42 | 77 | 25 | 11 | 34 | 36 | 30 | 14 | 33 | 26 | 22 | 58 | 100 | 52 | 30 | 13 | 15 | 0 | 21 | 39 | 48 | 43 | 40 | 41 | 40 | 42 | 41 | 40 | 39 | 36 | 35 |
| S2 | 45 | 15 | 6 | 27 | 25 | 9 | 64 | 55 | 27 | 28 | 18 | 51 | 46 | 100 | 56 | 20 | 25 | 25 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| S3 | 71 | 53 | 26 | 9 | 34 | 25 | 23 | 38 | 37 | 26 | 30 | 23 | 34 | 41 | 39 | 29 | 29 | 12 | 17 | 30 | 27 | 21 | 31 | 35 | 41 | 84 | 89 | 63 | 100 | 96 | 2 |
| S4 | 11 | 22 | 20 | 10 | 27 | 12 | 45 | 100 | 22 | 9 | 4 | 14 | 9 | 43 | 19 | 6 | 17 | 18 | 14 | 21 | 5 | 5 | 5 | 6 | 25 | 16 | 7 | 0 | 0 | 0 | 0 |
| S5 | 42 | 66 | 40 | 67 | 57 | 67 | 66 | 71 | 75 | 72 | 31 | 38 | 59 | 54 | 86 | 80 | 68 | 55 | 95 | 100 | 89 | 85 | 86 | 77 | 64 | 0 | 0 | 0 | 0 | 0 | 0 |

Core assignment:

| vdom0 | vdom1 | vdom2 | vdom4 | vdom5 | vdom7 |
|-------|-------|-------|-------|-------|-------|
| S5 | - | - | - | n/a | n/a |
| S5 | - | S5 | - | S5 | - |
| - | - | S5 | - | n/a | n/a |
| S4 | - | S4 | - | S4 | - |
| S2 | S4 | S2 | S4 | n/a | n/a |
| - | S3 | S2 | S3 | S2 | - |
| S1 | S3 | S1 | S3 | n/a | n/a |
| S1 | S2 | S1 | - | S1 | S3 |

Benchmark 4 (BM4)

Workload patterns:

Epoch (1 epoch = 10 s):

| WL | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| S1 | 27 | 49 | 31 | 32 | 62 | 77 | 80 | 44 | 0 | 6 | 1 | 1 | 8 | 73 | 87 | 81 | 80 | 91 | 100 | 99 | 89 | 67 | 13 | 52 | 0 | 0 | 10 | 46 | 27 | 86 | 63 |
| S2 | 82 | 39 | 55 | 42 | 96 | 42 | 100 | 33 | 53 | 20 | 20 | 10 | 11 | 14 | 13 | 11 | 13 | 13 | 1 | 5 | 1 | 0 | 23 | 45 | 61 | 42 | 83 | 83 | 20 | 15 | 3 |
| S3 | 8 | 20 | 21 | 30 | 80 | 100 | 24 | 50 | 36 | 54 | 83 | 92 | 91 | 73 | 27 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 10 | 1 | 21 | 17 | 33 | 5 | 7 |
| S4 | 27 | 49 | 31 | 32 | 62 | 77 | 80 | 44 | 0 | 6 | 1 | 1 | 8 | 73 | 87 | 81 | 80 | 91 | 100 | 99 | 89 | 67 | 13 | 52 | 0 | 0 | 10 | 10 | 15 | 30 | 27 |
| S5 | 53 | 21 | 52 | 48 | 33 | 92 | 89 | 100 | 39 | 38 | 29 | 41 | 48 | 4 | 64 | 45 | 36 | 31 | 42 | 41 | 42 | 35 | 15 | 80 | 93 | 62 | 10 | 23 | 48 | 32 | 0 |

Core assignment:

| vdom0 | vdom1 | vdom2 | vdom4 | vdom5 | vdom7 |
|-------|-------|-------|-------|-------|-------|
| - | - | - | - | n/a | n/a |
| - | - | - | - | - | - |
| S3 | S4 | S3 | S4 | n/a | n/a |
| S3 | S4 | S3 | S4 | S3 | S4 |
| - | S5 | - | S5 | n/a | n/a |
| - | S5 | - | S5 | - | S5 |
| S1 | S2 | S1 | S2 | n/a | n/a |
| S1 | S2 | S1 | S2 | S1 | S2 |


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kang, C., Lee, S., Lee, YJ., Lee, J., Egger, B. (2017). Scheduling for Better Energy Efficiency on Many-Core Chips. In: Desai, N., Cirne, W. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2015, JSSPP 2016. Lecture Notes in Computer Science, vol 10353. Springer, Cham. https://doi.org/10.1007/978-3-319-61756-5_3

  • DOI: https://doi.org/10.1007/978-3-319-61756-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-61755-8

  • Online ISBN: 978-3-319-61756-5
