research-article

Survey of scheduling techniques for addressing shared resources in multicore processors

Authors:

Sergey Zhuravlev,

Juan Carlos Saez,

Sergey Blagodurov,

Alexandra Fedorova,

Manuel Prieto

ACM Computing Surveys (CSUR), Volume 45, Issue 1

Article No. 4, pages 1-28

https://doi.org/10.1145/2379776.2379780

Published: 07 December 2012

Abstract

Chip multicore processors (CMPs) have emerged as the dominant architecture for modern computing platforms and will most likely remain dominant for the foreseeable future. Like any architecture, CMPs bring their own set of challenges. Chief among them is shared resource contention, which arises because CMP cores are not independent processors: they share resources such as the last-level cache (LLC). Shared resource contention can severely and unpredictably degrade the performance of the threads running on a CMP. Conversely, CMPs offer tremendous opportunities for multithreaded applications, which can take advantage of simultaneous thread execution as well as fast inter-thread data sharing. Many solutions have been proposed to mitigate the negative aspects of CMPs and exploit the positive ones. This survey focuses on the subset of these solutions that rely exclusively on OS thread-level scheduling to achieve their goals. These solutions are particularly attractive because they require no changes to hardware and minimal or no changes to the OS. The OS scheduler has expanded well beyond its original role of time-multiplexing threads on a single core into a complex and effective resource manager. This article surveys a broad body of recent work that explores the diverse new roles the OS scheduler can successfully take on.
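As a minimal sketch of what contention-aware thread placement can look like, the Python below assumes that each thread has already been profiled for its LLC misses per 1,000 instructions (used here as a stand-in for memory intensity; the profiling step is out of scope) and that cores come in pairs sharing one LLC. It pairs the most memory-intensive remaining thread with the least intensive one, so that two cache-hungry threads never compete for the same cache. The function name `pair_threads_by_intensity`, the metric, and the miss-rate values are hypothetical illustrations, not an algorithm or measurements taken from the survey.

```python
from typing import Dict, List, Tuple


def pair_threads_by_intensity(miss_rates: Dict[str, float]) -> List[Tuple[str, ...]]:
    """Group threads into pairs that will share one LLC.

    miss_rates maps a thread name to a hypothetical LLC-misses-per-1,000-
    instructions figure. The most memory-intensive remaining thread is
    paired with the least intensive remaining one. With an odd number of
    threads, the median thread gets a cache domain to itself.
    """
    ordered = sorted(miss_rates, key=lambda t: miss_rates[t])  # ascending intensity
    groups: List[Tuple[str, ...]] = []
    lo, hi = 0, len(ordered) - 1
    while lo < hi:
        # Heaviest remaining thread shares a cache with the lightest one.
        groups.append((ordered[hi], ordered[lo]))
        lo += 1
        hi -= 1
    if lo == hi:
        groups.append((ordered[lo],))
    return groups


if __name__ == "__main__":
    # Made-up miss rates, for illustration only.
    rates = {"mem_heavy_1": 30.0, "mem_heavy_2": 25.0, "cpu_bound_1": 0.2, "cpu_bound_2": 0.1}
    for domain, threads in enumerate(pair_threads_by_intensity(rates)):
        print(f"cache domain {domain}: {threads}")
```

Pairing extremes is only one simple heuristic. A real scheduler would also have to re-evaluate the mapping as thread behaviour changes and then bind threads to the chosen cores (on Linux, for example, through `sched_setaffinity`).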

Index Terms

Survey of scheduling techniques for addressing shared resources in multicore processors
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Scheduling

Reviews

Reviewer: Edel M Sherratt

The new scheduling challenges posed by chip multicore architectures have opened new research directions in operating system scheduling. This survey paper provides an excellent introduction to this fascinating topic for the reader who has some knowledge of scheduling, but who may lack specialist knowledge of chip multicore architectures. Chip multiprocessors share resources like caches and memory controllers. Conventional schedulers, which treat each core as an independent processor, give rise to poor performance on multicore processors due to contention between threads for shared resources. Thread-level schedulers mitigate this problem. Two approaches to thread-level scheduling form the main body of the survey. Contention-aware scheduling is used to map threads to cores, so as to avoid unnecessary competition for shared resources by balancing memory-intensive with compute-intensive threads. Cooperative resource scheduling focuses on threads that share resources within a single multithreaded application. While this approach is hampered by the difficulty of monitoring how threads share data, it is likely to become more important with the emergence of parallel algorithms. The survey concludes with a review of scheduling in Linux and Solaris, and an informative discussion leading to a summary of the ideal scheduler.

Online Computing Reviews Service
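To illustrate the cooperative idea the review describes (placing threads that share data close together), here is a minimal greedy sketch. It assumes a hypothetical symmetric matrix `sharing[i][j]` counting cache lines accessed by both thread i and thread j; obtaining such data is precisely the monitoring difficulty the review mentions. Each cache domain is seeded with the unplaced thread that shares the most overall, then filled with the unplaced threads that share most with the domain's current members. The function name and the greedy strategy are illustrative assumptions, not a specific technique from the survey.

```python
from typing import List


def cluster_by_sharing(sharing: List[List[int]], domain_size: int) -> List[List[int]]:
    """Greedily group threads that share data into the same cache domain.

    sharing[i][j] is a hypothetical count of cache lines touched by both
    thread i and thread j (symmetric, zero diagonal). Each cache domain
    holds at most domain_size threads. Ties go to the lowest thread index.
    """
    if domain_size < 1:
        raise ValueError("domain_size must be at least 1")
    unplaced = set(range(len(sharing)))
    domains: List[List[int]] = []
    while unplaced:
        # Seed a new domain with the thread that shares the most in total.
        seed = max(sorted(unplaced), key=lambda t: sum(sharing[t]))
        domain = [seed]
        unplaced.remove(seed)
        while len(domain) < domain_size and unplaced:
            # Add the thread that shares the most with the current members.
            best = max(sorted(unplaced), key=lambda t: sum(sharing[t][m] for m in domain))
            domain.append(best)
            unplaced.remove(best)
        domains.append(domain)
    return domains


if __name__ == "__main__":
    # Made-up sharing counts: threads 0 and 2 share heavily, as do 1 and 3.
    sharing = [
        [0, 1, 9, 0],
        [1, 0, 0, 8],
        [9, 0, 0, 1],
        [0, 8, 1, 0],
    ]
    print(cluster_by_sharing(sharing, domain_size=2))
```

A greedy pass like this is cheap but not guaranteed to find the best grouping; it only shows the shape of the decision a sharing-aware scheduler has to make.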

Published In

ACM Computing Surveys Volume 45, Issue 1

November 2012

455 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/2379776

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 December 2012

Accepted: 01 July 2011

Revised: 01 May 2011

Received: 01 February 2011

Published in CSUR Volume 45, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

Spanish government's research
Ingenio 2010 Consolider

Bibliometrics

Total citations: 99
Total downloads: 2,799
Downloads (last 12 months): 145
Downloads (last 6 weeks): 12

Reflects downloads up to 30 Jan 2025

Cited By

Kumar A, Katkam R, Chaudhary P, Naik P, Vutukuru M (2024). AppSteer: Framework for Improving Multicore Scalability of Network Functions via Application-aware Packet Steering. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 18-27. Online publication date: 6-May-2024.
https://doi.org/10.1109/CCGrid59990.2024.00012
Schryen G (2024). Speedup and efficiency of computational parallelization: A unifying approach and asymptotic analysis. Journal of Parallel and Distributed Computing 187, 104835. Online publication date: May-2024.
https://doi.org/10.1016/j.jpdc.2023.104835
Mururu G, Khan S, Chatterjee B, Chen C, Porter C, Gavrilovska A, Pande S (2023). Beacons: An End-to-End Compiler Framework for Predicting and Utilizing Dynamic Loop Characteristics. Proceedings of the ACM on Programming Languages 7:OOPSLA2, 173-203. Online publication date: 16-Oct-2023.
https://dl.acm.org/doi/10.1145/3622803
Ma T, Chen S, Wu Y, Deng E, Song Z, Chen Q, Guo M, Aamodt T, Jerger N, Swift M (2023). Efficient Scheduler Live Update for Linux Kernel with Modularization. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 194-207. Online publication date: 25-Mar-2023.
https://dl.acm.org/doi/10.1145/3582016.3582054
Bilbao C, Saez J, Prieto-Matias M (2023). Divide&Content: A Fair OS-Level Resource Manager for Contention Balancing on NUMA Multicores. IEEE Transactions on Parallel and Distributed Systems 34:11, 2928-2945. Online publication date: Nov-2023.
https://doi.org/10.1109/TPDS.2023.3309999
Staffolani A, Darvariu V, Bellavista P, Musolesi M (2023). RLQ: Workload Allocation With Reinforcement Learning in Distributed Queues. IEEE Transactions on Parallel and Distributed Systems 34:3, 856-868. Online publication date: 1-Mar-2023.
https://doi.org/10.1109/TPDS.2022.3231981
Kundan S, Marinakis T, Anagnostopoulos I, Kagaris D (2022). A Pressure-Aware Policy for Contention Minimization on Multicore Systems. ACM Transactions on Architecture and Code Optimization 19:3, 1-26. Online publication date: 25-May-2022.
https://dl.acm.org/doi/10.1145/3524616
Azhar M, Pericàs M, Stenström P (2022). Task-RM: A Resource Manager for Energy Reduction in Task-Parallel Applications under Quality of Service Constraints. ACM Transactions on Architecture and Code Optimization 19:1, 1-26. Online publication date: 23-Jan-2022.
https://dl.acm.org/doi/10.1145/3494537
Spantidi O, Marinakis T, Anagnostopoulos I (2022). Fair Scheduling Through Collaborative Filtering on Multicore Systems. 2022 IEEE International Symposium on Circuits and Systems (ISCAS), 1551-1555. Online publication date: 28-May-2022.
https://doi.org/10.1109/ISCAS48785.2022.9937409
Kirkpatrick R, Brown C, Janjic V (2022). COMPROF and COMPLACE: Shared-Memory Communication Profiling and Automated Thread Placement via Dynamic Binary Instrumentation. 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC), 236-245. Online publication date: Dec-2022.
https://doi.org/10.1109/HiPC56025.2022.00040