DOI: 10.1145/2872362.2872371
research-article

M3: A Hardware/Operating-System Co-Design to Tame Heterogeneous Manycores

Published: 25 March 2016

Abstract

Over the last decade, the number of available cores has increased and heterogeneity has grown. In this work, we ask whether the design of current operating systems (OSes) is still appropriate if these trends continue and lead to abundantly available but heterogeneous cores, or whether they force a fundamental rethinking of how systems are designed. We argue that: 1. hiding heterogeneity behind a common hardware interface unifies, to a large extent, the control and coordination of cores and accelerators in the OS; 2. isolating at the network-on-chip rather than with processor features (such as privileged mode or the memory management unit) allows running untrusted code on arbitrary cores; and 3. providing OS services via protocols over the network-on-chip, instead of via system calls, makes them accessible to arbitrary types of cores as well.
Taken together, these principles turn accelerators into first-class citizens and enable a single, convenient programming environment for all cores without the need to trust any application.
In this paper, we introduce network-on-chip-level isolation, present the design of our microkernel-based OS, M3, and of the common hardware interface, and evaluate the performance of our prototype in comparison to Linux. Somewhat surprisingly, even without using accelerators, M3 outperforms Linux in some application-level benchmarks by more than a factor of five.
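The isolation and service model argued for in the abstract can be illustrated with a small simulation. The sketch below is hypothetical and is not the M3 implementation: it assumes each core has a communication unit whose endpoints only a designated kernel core may configure, so an application core can reach only the services the kernel has connected it to, whether or not that core has a privileged mode or an MMU. A request to an OS service is then a message over the simulated network-on-chip rather than a system call. The names `NoC`, `CommUnit`, `Endpoint`, `configure`, `send`, and `serve_once` are all invented for illustration.

```python
from dataclasses import dataclass, field


class NoCError(Exception):
    """Raised when a core attempts an operation its endpoints do not permit."""


@dataclass
class Endpoint:
    target_core: int  # core whose inbox receives messages sent via this endpoint
    label: int        # tag the kernel attaches so the receiver can identify the sender


@dataclass
class Message:
    label: int
    payload: bytes


@dataclass
class CommUnit:
    """Hypothetical per-core communication unit: the only path from a core onto the NoC."""
    core_id: int
    endpoints: dict = field(default_factory=dict)  # endpoint index -> Endpoint
    inbox: list = field(default_factory=list)      # received Messages, oldest first


class NoC:
    def __init__(self, num_cores: int, kernel_core: int):
        self.units = {c: CommUnit(c) for c in range(num_cores)}
        self.kernel_core = kernel_core

    def configure(self, caller: int, core: int, ep: int, endpoint: Endpoint) -> None:
        # Isolation rule: only the kernel core may set up endpoints on any core.
        if caller != self.kernel_core:
            raise NoCError(f"core {caller} may not configure endpoints")
        self.units[core].endpoints[ep] = endpoint

    def send(self, core: int, ep: int, payload: bytes) -> None:
        # A core can only reach targets through endpoints the kernel configured for it.
        unit = self.units[core]
        if ep not in unit.endpoints:
            raise NoCError(f"core {core}: endpoint {ep} is not configured")
        e = unit.endpoints[ep]
        self.units[e.target_core].inbox.append(Message(e.label, payload))


def serve_once(noc: NoC, service_core: int, reply_eps: dict) -> None:
    """Handle one request on a service core and reply via a kernel-configured endpoint."""
    msg = noc.units[service_core].inbox.pop(0)
    # Toy protocol: acknowledge with the request length; a real service would parse it.
    reply = f"ok:{len(msg.payload)}".encode()
    noc.send(service_core, reply_eps[msg.label], reply)


# Usage: core 0 is the kernel, core 1 an application, core 2 an OS service.
KERNEL, APP, SVC = 0, 1, 2
noc = NoC(num_cores=3, kernel_core=KERNEL)
noc.configure(KERNEL, APP, 0, Endpoint(target_core=SVC, label=APP))  # request channel
noc.configure(KERNEL, SVC, 0, Endpoint(target_core=APP, label=SVC))  # reply channel

noc.send(APP, 0, b"read /data")            # a "system call" expressed as a NoC message
serve_once(noc, SVC, reply_eps={APP: 0})
print(noc.units[APP].inbox[0].payload)     # b'ok:10'

# The application cannot grant itself new channels or use unconfigured endpoints.
denied = []
for attempt in (lambda: noc.configure(APP, APP, 1, Endpoint(target_core=SVC, label=APP)),
                lambda: noc.send(APP, 1, b"sneaky")):
    try:
        attempt()
    except NoCError:
        denied.append(True)
print(len(denied))                          # 2
```

Because the checks live in `NoC.configure` and `NoC.send` rather than in the application core itself, the same rule applies whether core 1 is a general-purpose CPU or an accelerator; this is, in deliberately simplified form, the property the abstract attributes to network-on-chip-level isolation.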

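Cited By

  • (2024) BrickOS: specialized kernels for heterogeneous hardware resources. SCIENTIA SINICA Informationis 54(3):491. DOI: 10.1360/SSI-2022-0413. Online publication date: 11 Mar 2024.
  • (2024) Robust and Immediate Resource Reclamation with M3. Proceedings of the 2nd Workshop on Kernel Isolation, Safety and Verification, pages 1-7. DOI: 10.1145/3698576.3698763. Online publication date: 4 Nov 2024.
  • (2024) uIO: Lightweight and Extensible Unikernels. Proceedings of the 2024 ACM Symposium on Cloud Computing, pages 580-599. DOI: 10.1145/3698038.3698518. Online publication date: 20 Nov 2024.
  • (2024) Mozart: Taming Taxes and Composing Accelerators with Shared-Memory. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, pages 183-200. DOI: 10.1145/3656019.3676896. Online publication date: 14 Oct 2024.
  • (2024) Core-Local Reasoning and Predictable Cross-Core Communication with M3. 2024 IEEE 30th Real-Time and Embedded Technology and Applications Symposium (RTAS), pages 199-211. DOI: 10.1109/RTAS61025.2024.00024. Online publication date: 13 May 2024.
  • (2024) Trustworthy Silicon: An MPSoC for a Secure Operating System. 2024 IEEE Nordic Circuits and Systems Conference (NorCAS), pages 1-7. DOI: 10.1109/NorCAS64408.2024.10752473. Online publication date: 29 Oct 2024.
  • (2024) Trustworthy Execution of O-RAN Applications by strong Separation and minimal Trusted Computing Base. 2024 IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN), pages 214-216. DOI: 10.1109/NFV-SDN61811.2024.10807504. Online publication date: 5 Nov 2024.
  • (2024) Efficient On-Chip Replication. IEEE Access 12:172581-172595. DOI: 10.1109/ACCESS.2024.3484013. Online publication date: 2024.
  • (2023) Resource scheduling techniques in cloud from a view of coordination: a holistic survey (从协同视角论云资源调度技术:综述). Frontiers of Information Technology & Electronic Engineering 24(1):1-40. DOI: 10.1631/FITEE.2100298. Online publication date: 23 Jan 2023.
  • (2023) Cohort: Software-Oriented Acceleration for Heterogeneous SoCs. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pages 105-117. DOI: 10.1145/3582016.3582059. Online publication date: 25 Mar 2023.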

Published In

ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems
March 2016
824 pages
ISBN:9781450340915
DOI:10.1145/2872362
  • General Chair: Tom Conte
  • Program Chair: Yuanyuan Zhou
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. accelerators
  2. capabilities
  3. heterogeneous architectures
  4. on-chip networks
  5. operating systems

Qualifiers

  • Research-article

Funding Sources

  • Deutsche Forschungsgemeinschaft

Conference

ASPLOS '16

Acceptance Rates

ASPLOS '16 Paper Acceptance Rate 53 of 232 submissions, 23%;
Overall Acceptance Rate 535 of 2,713 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 196
  • Downloads (last 6 weeks): 21
Reflects downloads up to 15 Feb 2025.
