Lowering the volatility: a practical cache allocation prediction and stability-oriented co-runner scheduling algorithms

Wang, Fei; Gao, Xiaofeng; Chen, Guihai

doi:10.1007/s11227-016-1645-7

Published: 05 February 2016

The Journal of Supercomputing, volume 72, pages 1126–1151 (2016)

Abstract

Accurate, quantitative analysis of cache behavior in a chip multiprocessor (CMP) machine has long been a challenge. So far there has been no practical way to predict the cache allocation, i.e., the allocated cache size, of a running program. For many applications, especially those that interact heavily with users, cache allocation should be estimated with high accuracy, because its variation is closely related to the stability of system performance, which is important to the efficient operation of servers and has a great influence on user experience. To this end, this paper proposes an accurate prediction model for the allocation of the last level cache (LLC) among co-runners. With a precise cache allocation predicted, we further implement a performance-stability-oriented co-runner scheduling algorithm that aims to maximize the number of co-runners running in a performance-stable state and to minimize the performance variation of the unstable ones. We demonstrate that the proposed prediction algorithm is highly accurate, with an average error of 5.7%, and that the co-runner scheduling algorithm can find the optimal solution under the specified target with a time complexity of O(n).

Acknowledgments

This work was supported by the Huawei Innovation Research Program (HIPRO, Grant Number YB2015080028).

Author information

Authors and Affiliations

Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
Fei Wang, Xiaofeng Gao & Guihai Chen

Corresponding author

Correspondence to Xiaofeng Gao.

About this article

Cite this article

Wang, F., Gao, X. & Chen, G. Lowering the volatility: a practical cache allocation prediction and stability-oriented co-runner scheduling algorithms. J Supercomput 72, 1126–1151 (2016). https://doi.org/10.1007/s11227-016-1645-7

Published: 05 February 2016
Issue Date: March 2016
DOI: https://doi.org/10.1007/s11227-016-1645-7
