Characterizing Job Microarchitectural Profiles at Scale: Dataset and Analysis

Authors:

Liping Zhang

ICPP '22: Proceedings of the 51st International Conference on Parallel Processing

Article No.: 47, Pages 1 - 11

https://doi.org/10.1145/3545008.3545026

Published: 13 January 2023

Abstract

Understanding the microarchitectural resource characteristics of datacenter jobs has become increasingly critical for guaranteeing job performance while improving resource utilization. Prior work has studied the resource characteristics of datacenter jobs at the OS level, but little of it reveals detailed characteristics at the microarchitecture level, owing to the lack of related open traces. In this paper, we provide a new open trace, AMTrace (Alibaba Microarchitecture Trace), which is profiled from 8,577 high-end physical hosts in Alibaba's datacenter using a hardware/software co-design monitoring method. AMTrace provides the microarchitectural metrics of 9.8 × 10⁵ Linux containers at "Per-Container-Per-Logic CPU" granularity. Unlike existing open traces, AMTrace offers a new perspective for analyzing the microarchitectural resource characteristics of datacenter jobs. Based on AMTrace, we first reveal the uneven resource usage of jobs across multiple logical CPUs. Then, we analyze the impact of CPU and memory bandwidth contention on job performance. Finally, we analyze job performance under different CPU provisioning modes from a microarchitectural perspective. These analyses lead to constructive insights for datacenter resource management and optimization. Furthermore, we discuss possible research opportunities based on AMTrace, and we believe that AMTrace will inspire more research on microarchitecture and resource management.

Published In

ICPP '22: Proceedings of the 51st International Conference on Parallel Processing

August 2022

976 pages

ISBN: 9781450397339

DOI: 10.1145/3545008

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Key RD Project of Guangdong Province

Conference

ICPP '22

ICPP '22: 51st International Conference on Parallel Processing

August 29 - September 1, 2022

Bordeaux, France

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
