Survey of scheduling techniques for addressing shared resources in multicore processors

Published: 07 December 2012

Abstract

Chip multicore processors (CMPs) have emerged as the dominant architecture choice for modern computing platforms and will most likely continue to be dominant well into the foreseeable future. As with any system, CMPs offer a unique set of challenges. Chief among them is the shared resource contention that results because CMP cores are not independent processors but rather share common resources, such as the last-level cache (LLC). Shared resource contention can lead to severe and unpredictable performance impact on the threads running on the CMP. Conversely, CMPs offer tremendous opportunities for multithreaded applications, which can take advantage of simultaneous thread execution as well as fast inter-thread data sharing. Many solutions have been proposed to deal with the negative aspects of CMPs and take advantage of the positive. This survey focuses on the subset of these solutions that exclusively make use of OS thread-level scheduling to achieve their goals. These solutions are particularly attractive as they require no changes to hardware and minimal or no changes to the OS. The OS scheduler has expanded well beyond its original role of time-multiplexing threads on a single core into a complex and effective resource manager. This article surveys a multitude of new and exciting work that explores the diverse new roles the OS scheduler can successfully take on.

Reviews

Edel M Sherratt

The new scheduling challenges posed by chip multicore architectures have opened new research directions in operating system scheduling. This survey paper provides an excellent introduction to this fascinating topic for the reader who has some knowledge of scheduling, but who may lack specialist knowledge of chip multicore architectures. Chip multiprocessors share resources like caches and memory controllers. Conventional schedulers, which treat each core as an independent processor, give rise to poor performance on multicore processors due to contention between threads for shared resources. Thread-level schedulers mitigate this problem. Two approaches to thread-level scheduling form the main body of the survey. Contention-aware scheduling is used to map threads to cores, so as to avoid unnecessary competition for shared resources by balancing memory-intensive with compute-intensive threads. Cooperative resource scheduling focuses on threads that share resources within a single multithreaded application. While this approach is hampered by the difficulty of monitoring how threads share data, it is likely to become more important with the emergence of parallel algorithms. The survey concludes with a review of scheduling in Linux and Solaris, and an informative discussion leading to a summary of the ideal scheduler.

Online Computing Reviews Service
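
The contention-aware idea summarized in the review (map threads to cores so that memory-intensive and compute-intensive threads are balanced across shared resources) can be illustrated with a small sketch. This is not an algorithm taken from the survey; it is a simple greedy heuristic written for illustration. The `Thread` type, the `misses_per_kilo_instr` metric, and the notion of a "domain" (a group of cores sharing one last-level cache) are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thread:
    name: str
    # Hypothetical memory-intensity metric, e.g. LLC misses per 1000 instructions.
    misses_per_kilo_instr: float


def place_threads(threads, num_domains, cores_per_domain):
    """Greedily spread memory-intensive threads across shared-cache domains.

    Threads are visited from most to least memory-intensive; each one goes to
    the domain that still has a free core and the lowest accumulated intensity.
    Ties are broken in favour of the lowest domain index.
    """
    if len(threads) > num_domains * cores_per_domain:
        raise ValueError("more threads than available cores")

    domains = [[] for _ in range(num_domains)]
    load = [0.0] * num_domains  # accumulated intensity per domain

    ordered = sorted(threads, key=lambda t: t.misses_per_kilo_instr, reverse=True)
    for thread in ordered:
        candidates = [d for d in range(num_domains)
                      if len(domains[d]) < cores_per_domain]
        target = min(candidates, key=lambda d: load[d])
        domains[target].append(thread)
        load[target] += thread.misses_per_kilo_instr
    return domains


if __name__ == "__main__":
    workload = [Thread("A", 30.0), Thread("B", 25.0),
                Thread("C", 2.0), Thread("D", 1.0)]
    for index, members in enumerate(place_threads(workload, 2, 2)):
        print(index, [t.name for t in members])
```

With two memory-intensive threads (A, B) and two compute-intensive threads (C, D) on two dual-core domains, the sketch places one thread of each kind in each domain. A real scheduler would also have to measure intensity online and apply the placement, for example through CPU affinity, which this sketch leaves out.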

Published In

ACM Computing Surveys, Volume 45, Issue 1
November 2012
455 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/2379776
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 December 2012
Accepted: 01 July 2011
Revised: 01 May 2011
Received: 01 February 2011
Published in CSUR Volume 45, Issue 1

Author Tags

  1. Survey
  2. cooperative resource sharing
  3. power-aware scheduling
  4. shared resource contention
  5. thermal effects
  6. thread level scheduling

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Spanish government's research
  • Ingenio 2010 Consolider

Article Metrics

  • Downloads (Last 12 months): 145
  • Downloads (Last 6 weeks): 12
Reflects downloads up to 30 Jan 2025

Cited By

  • (2024) AppSteer: Framework for Improving Multicore Scalability of Network Functions via Application-aware Packet Steering. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 18-27. DOI: 10.1109/CCGrid59990.2024.00012. Online publication date: 6-May-2024
  • (2024) Speedup and efficiency of computational parallelization: A unifying approach and asymptotic analysis. Journal of Parallel and Distributed Computing 187, 104835. DOI: 10.1016/j.jpdc.2023.104835. Online publication date: May-2024
  • (2023) Beacons: An End-to-End Compiler Framework for Predicting and Utilizing Dynamic Loop Characteristics. Proceedings of the ACM on Programming Languages 7:OOPSLA2, 173-203. DOI: 10.1145/3622803. Online publication date: 16-Oct-2023
  • (2023) Efficient Scheduler Live Update for Linux Kernel with Modularization. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 194-207. DOI: 10.1145/3582016.3582054. Online publication date: 25-Mar-2023
  • (2023) Divide&Content: A Fair OS-Level Resource Manager for Contention Balancing on NUMA Multicores. IEEE Transactions on Parallel and Distributed Systems 34:11, 2928-2945. DOI: 10.1109/TPDS.2023.3309999. Online publication date: Nov-2023
  • (2023) RLQ: Workload Allocation With Reinforcement Learning in Distributed Queues. IEEE Transactions on Parallel and Distributed Systems 34:3, 856-868. DOI: 10.1109/TPDS.2022.3231981. Online publication date: 1-Mar-2023
  • (2022) A Pressure-Aware Policy for Contention Minimization on Multicore Systems. ACM Transactions on Architecture and Code Optimization 19:3, 1-26. DOI: 10.1145/3524616. Online publication date: 25-May-2022
  • (2022) Task-RM: A Resource Manager for Energy Reduction in Task-Parallel Applications under Quality of Service Constraints. ACM Transactions on Architecture and Code Optimization 19:1, 1-26. DOI: 10.1145/3494537. Online publication date: 23-Jan-2022
  • (2022) Fair Scheduling Through Collaborative Filtering on Multicore Systems. 2022 IEEE International Symposium on Circuits and Systems (ISCAS), 1551-1555. DOI: 10.1109/ISCAS48785.2022.9937409. Online publication date: 28-May-2022
  • (2022) COMPROF and COMPLACE: Shared-Memory Communication Profiling and Automated Thread Placement via Dynamic Binary Instrumentation. 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC), 236-245. DOI: 10.1109/HiPC56025.2022.00040. Online publication date: Dec-2022
