ABSTRACT
Microservices are emerging as a popular cloud-computing paradigm. Microservice environments execute typically short service requests that interact with one another via remote procedure calls (often across machines) and are subject to stringent tail-latency constraints. In contrast, current processors are designed for traditional monolithic applications: they support global hardware cache coherence, provide large caches, incorporate microarchitectural features aimed at long-running, predictable applications (such as advanced prefetching), and are optimized to minimize average latency rather than tail latency.
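The gap between average and tail latency can be made concrete with a small calculation. The following Python sketch uses made-up latency values, not measurements from the paper. It compares the mean with the 50th and 99th percentiles. It then applies the fan-out argument popularized by Dean and Barroso's "The Tail at Scale": if a request must wait for N independent calls, each slow with probability 0.01, the chance that at least one is slow is 1 - 0.99^N. The helper names `percentile`, `mean`, and `prob_any_slow` are hypothetical. The nearest-rank percentile is only one of several common definitions, and independence between calls is an assumption.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need non-empty samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def mean(samples):
    """Arithmetic mean of the samples."""
    return sum(samples) / len(samples)


def prob_any_slow(fanout, p_fast=0.99):
    """Chance that at least one of `fanout` independent calls lands in the slow tail."""
    return 1 - p_fast ** fanout


# Hypothetical latencies in microseconds: 98 fast requests and 2 slow ones.
latencies_us = [100] * 98 + [5000] * 2

print(mean(latencies_us))              # 198.0
print(percentile(latencies_us, 50))    # 100
print(percentile(latencies_us, 99))    # 5000
print(round(prob_any_slow(100), 3))    # 0.634
```

In this toy data, the mean (198 µs) is only about twice the median, while the 99th percentile is 50× the median. With a fan-out of 100, roughly 63% of top-level requests would wait on at least one slow call. This is why a design tuned for average latency can still miss tail-latency targets.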
To address this mismatch, this paper proposes μManycore, an architecture optimized for cloud-native microservice environments. Guided by a characterization of microservice applications, μManycore is designed to minimize unnecessary microarchitecture and mitigate overheads in order to reduce tail latency. For example, rather than supporting manycore-wide hardware cache coherence, μManycore has multiple small hardware cache-coherent domains, called Villages. Clusters of Villages are interconnected with an on-package leaf-spine network, which provides many redundant, low-hop-count paths between clusters. To minimize latency overheads, μManycore schedules and queues service requests in hardware and includes hardware support to save and restore process state on a context switch. Our simulation-based results show that μManycore delivers high performance. A cluster of 10 servers, each with a 1024-core μManycore, delivers 3.7× lower average latency, 15.5× higher throughput, and, importantly, 10.4× lower tail latency than a cluster with iso-power conventional server-class multicores. Similarly good results are attained against a cluster with power-hungry iso-area conventional server-class multicores.
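The leaf-spine property described above can be checked on a toy model. That property is many redundant paths with a small hop count. The Python sketch below assumes a generic two-tier fabric in which every leaf switch links to every spine switch and each cluster attaches to one leaf. The sizes (8 leaves, 4 spines) are hypothetical and are not μManycore's actual parameters. A breadth-first search counts both the shortest hop distance and the number of distinct shortest paths. The ring topology is included only as a point of comparison; it is not the baseline the paper evaluates. The helper names `leaf_spine`, `ring`, and `shortest_paths` are illustrative.

```python
from collections import deque


def leaf_spine(num_leaves, num_spines):
    """Adjacency sets for a two-tier leaf-spine fabric: every leaf links to every spine."""
    adj = {}
    for l in range(num_leaves):
        adj[("leaf", l)] = {("spine", s) for s in range(num_spines)}
    for s in range(num_spines):
        adj[("spine", s)] = {("leaf", l) for l in range(num_leaves)}
    return adj


def ring(n):
    """Adjacency sets for an n-node bidirectional ring (comparison only)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def shortest_paths(adj, src, dst):
    """BFS returning (hop count, number of distinct shortest paths) from src to dst."""
    dist = {src: 0}
    count = {src: 1}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                # First time reached: record distance and inherit path count.
                dist[nxt] = dist[node] + 1
                count[nxt] = count[node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                # Another shortest path into nxt through a different predecessor.
                count[nxt] += count[node]
    return dist[dst], count[dst]


fabric = leaf_spine(num_leaves=8, num_spines=4)
print(shortest_paths(fabric, ("leaf", 0), ("leaf", 5)))  # (2, 4)
print(shortest_paths(ring(8), 0, 4))                     # (4, 2)
```

In this model, any two leaves are two hops apart (leaf, spine, leaf), with one shortest path per spine. Adding spines therefore adds path diversity without lengthening routes. On the 8-node ring, by contrast, opposite nodes are four hops apart with only two shortest paths.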
Index Terms
- μManycore: A Cloud-Native CPU for Tail at Scale