research-article

M3: A Hardware/Operating-System Co-Design to Tame Heterogeneous Manycores

Authors:

Benedikt Nöthen,

Hermann Härtig,

Gerhard Fettweis

ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems

Pages 189 - 203

https://doi.org/10.1145/2872362.2872371

Published: 25 March 2016

Abstract

In the last decade, the number of available cores increased and heterogeneity grew. In this work, we ask whether the design of current operating systems (OSes) is still appropriate if these trends continue and lead to abundantly available but heterogeneous cores, or whether these trends force a fundamental rethinking of how systems are designed. We argue that (1) hiding heterogeneity behind a common hardware interface unifies, to a large extent, the control and coordination of cores and accelerators in the OS; (2) isolating at the network-on-chip, rather than with processor features (such as privileged mode or a memory management unit), allows running untrusted code on arbitrary cores; and (3) providing OS services via protocols over the network-on-chip, instead of via system calls, makes them accessible to arbitrary types of cores as well.

In summary, this turns accelerators into first-class citizens and enables a single and convenient programming environment for all cores without the need to trust any application.

In this paper, we introduce network-on-chip-level isolation, present the design of our microkernel-based OS, M3, and the common hardware interface, and evaluate the performance of our prototype in comparison to Linux. Somewhat surprisingly, even without using accelerators, M3 outperforms Linux in some application-level benchmarks by more than a factor of five.
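
To make the idea of isolating at the network-on-chip more concrete, the following is a minimal toy model and not the M3 implementation. It assumes that every core reaches the network-on-chip only through a per-core interface whose send endpoints can be configured by the kernel alone, and that an OS service (here a file service) is reached by sending a message rather than by a system call. All class, method, and field names (`NocInterface`, `Noc`, `kernel_configure`, `send`) are hypothetical and chosen for illustration.

```python
# Toy model only: every name below is hypothetical, not M3's actual API.
from dataclasses import dataclass, field


class IsolationError(Exception):
    """Raised when a core uses an endpoint the kernel never configured."""


@dataclass
class NocInterface:
    """Per-core gateway to the network-on-chip (NoC)."""
    core_id: int
    # endpoint number -> destination core; assumed writable by the kernel only
    send_endpoints: dict = field(default_factory=dict)
    # messages delivered to this core, as (source core, payload) pairs
    inbox: list = field(default_factory=list)


class Noc:
    """Connects the interfaces and enforces the configured endpoints."""

    def __init__(self):
        self.interfaces = {}

    def attach(self, core_id):
        self.interfaces[core_id] = NocInterface(core_id)
        return self.interfaces[core_id]

    def kernel_configure(self, src, endpoint, dst):
        # Assumption: only trusted kernel code is able to call this.
        self.interfaces[src].send_endpoints[endpoint] = dst

    def send(self, src, endpoint, payload):
        iface = self.interfaces[src]
        if endpoint not in iface.send_endpoints:
            # The check happens at the NoC boundary, not inside the core,
            # so the core itself needs no privileged mode or MMU.
            raise IsolationError(f"core {src}: endpoint {endpoint} not configured")
        dst = iface.send_endpoints[endpoint]
        self.interfaces[dst].inbox.append((src, payload))


def demo():
    noc = Noc()
    app_core, fs_core = 1, 2
    noc.attach(app_core)
    noc.attach(fs_core)
    # The kernel grants the application one channel to the file service.
    noc.kernel_configure(app_core, endpoint=0, dst=fs_core)
    # The "service call" is a message over the NoC, not a system call.
    noc.send(app_core, 0, {"op": "open", "path": "/data"})
    delivered = noc.interfaces[fs_core].inbox[0]
    try:
        noc.send(app_core, 1, {"op": "read"})  # endpoint 1 was never granted
        blocked = False
    except IsolationError:
        blocked = True
    return delivered, blocked
```

In this sketch the application core never needs a privileged mode of its own: anything it is not allowed to reach is rejected at the interface, which is the property the abstract attributes to network-on-chip-level isolation.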

Published In

ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems

March 2016

824 pages

ISBN:9781450340915

DOI:10.1145/2872362

General Chair: Tom Conte, Georgia Tech, USA
Program Chair: Yuanyuan Zhou, University of California, San Diego, USA

ACM SIGPLAN Notices, Volume 51, Issue 4 (ASPLOS '16)
April 2016, 774 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2954679
Editor: Andy Gill, University of Kansas, Lawrence, KS

ACM SIGARCH Computer Architecture News, Volume 44, Issue 2 (ASPLOS '16)
May 2016, 774 pages
ISSN: 0163-5964
DOI: 10.1145/2980024
Editor: Doug DeGroot

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Deutsche Forschungsgemeinschaft

Conference

ASPLOS '16: Architectural Support for Programming Languages and Operating Systems

April 2 - 6, 2016

Atlanta, Georgia, USA

Acceptance Rates

ASPLOS '16 Paper Acceptance Rate 53 of 232 submissions, 23%;

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Article Metrics

Total Citations: 45
Total Downloads: 1,429
Downloads (Last 12 months): 196
Downloads (Last 6 weeks): 21

Reflects downloads up to 15 Feb 2025
