DOI: 10.1145/3545008.3545026
research-article

Characterizing Job Microarchitectural Profiles at Scale: Dataset and Analysis

Published: 13 January 2023

Abstract

Understanding the microarchitectural resource characteristics of datacenter jobs has become increasingly critical to guaranteeing job performance while improving resource utilization. Prior work studied the resource characteristics of datacenter jobs at the OS level, but little of it reveals the detailed characteristics at the microarchitecture level, owing to the lack of related open traces. In this paper, we provide a new open trace, AMTrace (Alibaba Microarchitecture Trace), which is profiled from 8,577 high-end physical hosts in Alibaba’s datacenter using a hardware/software co-design monitoring method. AMTrace provides the microarchitectural metrics of 9.8 × 10^5 Linux containers at “Per-Container-Per-Logic CPU” granularity. Unlike existing open traces, AMTrace provides a new perspective for analyzing the microarchitectural resource characteristics of datacenter jobs. Based on AMTrace, we first reveal the uneven resource usage of jobs among multiple logic CPUs. Then, we analyze the impact of CPU and memory bandwidth contention on job performance. Finally, we analyze job performance under different CPU provisioning modes from a microarchitecture perspective. These analyses lead to constructive insights for datacenter resource management and optimization. Furthermore, we discuss possible research opportunities on AMTrace, and we believe that AMTrace will inspire more exciting research on microarchitecture and resource management.
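As an illustration of what per-container, per-logic-CPU granularity makes possible, the following minimal sketch scores how unevenly a container's usage is spread across its logic CPUs. It is not the paper's method: the record layout, the sample values, and the choice of the coefficient of variation as the imbalance score are all assumptions made for this example, and they do not reflect AMTrace's actual schema.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical records: (container_id, logic_cpu, cpu_utilization in [0, 1]).
# These names and values are illustrative only, not AMTrace's real schema.
SAMPLES = [
    ("c1", 0, 0.80), ("c1", 1, 0.20), ("c1", 2, 0.20), ("c1", 3, 0.00),
    ("c2", 0, 0.50), ("c2", 1, 0.50), ("c2", 2, 0.50), ("c2", 3, 0.50),
]


def per_container_imbalance(samples):
    """Return {container_id: coefficient of variation of per-CPU utilization}."""
    by_container = defaultdict(list)
    for container_id, _logic_cpu, utilization in samples:
        by_container[container_id].append(utilization)

    imbalance = {}
    for container_id, utilizations in by_container.items():
        average = mean(utilizations)
        # A container with zero average usage is treated as perfectly balanced.
        imbalance[container_id] = pstdev(utilizations) / average if average > 0 else 0.0
    return imbalance


imbalance = per_container_imbalance(SAMPLES)
for container_id, score in sorted(imbalance.items()):
    print(f"{container_id}: imbalance={score:.2f}")
```

In this sketch a container whose load is concentrated on one logic CPU (c1) scores higher than one whose load is spread evenly (c2). An OS-level trace that reports only a per-container average could not distinguish the two.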



Published In

ICPP '22: Proceedings of the 51st International Conference on Parallel Processing
August 2022
976 pages
ISBN:9781450397339
DOI:10.1145/3545008

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. datacenter
  2. microarchitecture
  3. performance
  4. resource contention

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Key R&D Project of Guangdong Province

Conference

ICPP '22: 51st International Conference on Parallel Processing
August 29 - September 1, 2022
Bordeaux, France

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 307
  • Downloads (Last 6 weeks): 21
Reflects downloads up to 01 Mar 2025

Cited By

  • (2025) Retrospecting Available CPU Resources: SMT-Aware Scheduling to Prevent SLA Violations in Data Centers. IEEE Transactions on Parallel and Distributed Systems 36:1 (67-83). DOI: 10.1109/TPDS.2024.3494879. Online publication date: Jan-2025
  • (2024) DeployFix: Dynamic Repair of Software Deployment Failures via Constraint Solving. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (2053-2064). DOI: 10.1145/3691620.3695268. Online publication date: 27-Oct-2024
  • (2024) SparseRCA: Unsupervised Root Cause Analysis in Sparse Microservice Testing Traces. 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE) (391-402). DOI: 10.1109/ISSRE62328.2024.00045. Online publication date: 28-Oct-2024
  • (2024) Modeling Memory Bandwidth Interference in Cloud Data Centers via Deep Learning. 2024 9th International Conference on Computer and Communication Systems (ICCCS) (441-447). DOI: 10.1109/ICCCS61882.2024.10603358. Online publication date: 19-Apr-2024
  • (2024) Navigator: A Decentralized Scheduler for Latency-Sensitive AI Workflows. 2024 IEEE International Conference on Edge Computing and Communications (EDGE) (35-47). DOI: 10.1109/EDGE62653.2024.00015. Online publication date: 7-Jul-2024
  • (2024) Tracemesh: Scalable and Streaming Sampling for Distributed Traces. 2024 IEEE 17th International Conference on Cloud Computing (CLOUD) (54-65). DOI: 10.1109/CLOUD62652.2024.00016. Online publication date: 7-Jul-2024
  • (2024) RCFS: rate and cost fair CPU scheduling strategy in edge nodes. The Journal of Supercomputing 80:10 (14000-14028). DOI: 10.1007/s11227-024-05997-y. Online publication date: 14-Mar-2024
  • (2024) GPU cluster dynamics: insights from Alibaba’s 2023 trace release. Computing 107:1. DOI: 10.1007/s00607-024-01369-9. Online publication date: 20-Nov-2024
  • (2023) NeiLatS: Neighbor-Aware Latency-Sensitive Application Scheduling in Heterogeneous Cloud-Edge Environment. Proceedings of the 52nd International Conference on Parallel Processing (615-624). DOI: 10.1145/3605573.3605630. Online publication date: 7-Aug-2023
  • (2023) MALT: Fine-Grained Microservice Profiling for Request Latency Anomaly Localization. 2023 IEEE International Conference on High Performance Computing & Communications, Data Science & Systems, Smart City & Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys) (114-121). DOI: 10.1109/HPCC-DSS-SmartCity-DependSys60770.2023.00025. Online publication date: 17-Dec-2023
