DOI: 10.1145/3409963.3410487
research-article

A heterogeneous microkernel OS for Rack-Scale systems

Published: 24 August 2020

Abstract

Datacenters are adopting heterogeneous hardware in the form of different CPU ISAs and accelerators. Advances in low-latency, high-bandwidth interconnects enable hardware vendors to couple multiple CPU servers and accelerators more tightly. The closer connection of components facilitates bigger machines, which pose a new challenge to operating systems. We advocate building a heterogeneous OS for large heterogeneous systems by combining multiple OS design principles to leverage the benefits of each. Because a security-oriented design, enabled by simplicity and clear encapsulation, is vital in datacenters, we survey design principles found in microkernel-based systems. We explain that heterogeneous hardware employs different mechanisms to enforce access rights, for example for memory accesses or communication channels, and we outline a way to combine the enforcement mechanisms of CPUs and accelerators in one system. The consequence is heterogeneous access-rights management, implemented as a heterogeneous capability system in a microkernel-based OS.

Published In

APSys '20: Proceedings of the 11th ACM SIGOPS Asia-Pacific Workshop on Systems
August 2020
135 pages
ISBN:9781450380690
DOI:10.1145/3409963
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

APSys '20

Acceptance Rates

Overall Acceptance Rate 169 of 430 submissions, 39%

Article Metrics

  • Downloads (Last 12 months): 54
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 15 Feb 2025

Cited By

  • (2023) Towards OS Heterogeneity Aware Cluster Management for HPC. Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems, 16-23. DOI: 10.1145/3609510.3609819. Online publication date: 24-Aug-2023
  • (2023) Trusted Heterogeneous Disaggregated Architectures. Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems, 72-79. DOI: 10.1145/3609510.3609812. Online publication date: 24-Aug-2023
  • (2023) Cohort: Software-Oriented Acceleration for Heterogeneous SoCs. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 105-117. DOI: 10.1145/3582016.3582059. Online publication date: 25-Mar-2023
  • (2022) Towards practical multikernel OSes with MySyS. Proceedings of the 13th ACM SIGOPS Asia-Pacific Workshop on Systems, 29-37. DOI: 10.1145/3546591.3547525. Online publication date: 23-Aug-2022
  • (2022) VMSH. Proceedings of the Seventeenth European Conference on Computer Systems, 678-696. DOI: 10.1145/3492321.3519589. Online publication date: 28-Mar-2022
