ABSTRACT
In this paper, we study process scheduling for future multicore processors. Hundreds or even thousands of cores are expected to be integrated on a single chip, known as a Chip Multiprocessor (CMP). However, operating system process scheduling, one of the most important design issues for CMP systems, has not been well addressed. We define a model for future CMPs and, based on it, propose a scheduling algorithm that reduces on-chip communication latencies and improves performance. We analyze the impact of memory access and inter-process communication (IPC) on scheduling, and we explore six typical core allocation strategies. Results show that a strategy that balances IPC and memory access outperforms the other strategies: it reduces misses per thousand instructions and cache hit latencies by up to 25.97% and 13.11%, respectively.
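The abstract does not spell out how IPC and memory access are weighed against each other. As a purely hypothetical illustration, a "balanced" core allocation can be sketched as choosing the free core that minimizes a weighted sum of (a) hop distance to the cores running communicating processes and (b) hop distance to the nearest memory controller. The 2D mesh, Manhattan (XY-routing) hop count, corner memory controllers, the weight `alpha`, and the names `hops`, `allocation_cost`, and `pick_core` are all assumptions made for this sketch; it is not the authors' algorithm.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical sketch: all names, the mesh layout, and the weighting are
# illustrative assumptions, not taken from the paper.
Core = Tuple[int, int]  # (x, y) coordinates of a core on the mesh


def hops(a: Core, b: Core) -> int:
    """Manhattan distance, i.e. hop count under XY routing on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def allocation_cost(core: Core, partners: Sequence[Core],
                    mem_ctrls: Sequence[Core], alpha: float) -> float:
    """Weighted cost of running a process on `core`.

    partners:  cores hosting processes this one exchanges IPC with
    mem_ctrls: positions of on-chip memory controllers
    alpha:     1.0 considers IPC only, 0.0 considers memory access only
    """
    ipc = sum(hops(core, p) for p in partners)   # total IPC hop distance
    mem = min(hops(core, m) for m in mem_ctrls)  # hops to nearest controller
    return alpha * ipc + (1.0 - alpha) * mem


def pick_core(free_cores: Iterable[Core], partners: Sequence[Core],
              mem_ctrls: Sequence[Core], alpha: float = 0.5) -> Core:
    """Return the free core with the lowest cost; ties go to the smallest (x, y)."""
    return min(free_cores,
               key=lambda c: (allocation_cost(c, partners, mem_ctrls, alpha), c))


# Example: 8x8 mesh, memory controllers at the four corners (assumed),
# one communicating process already running on core (3, 3).
N = 8
corners = [(0, 0), (0, N - 1), (N - 1, 0), (N - 1, N - 1)]
free = [(x, y) for x in range(N) for y in range(N) if (x, y) != (3, 3)]

print(pick_core(free, [(3, 3)], corners, alpha=1.0))  # IPC only -> (2, 3)
print(pick_core(free, [(3, 3)], corners, alpha=0.0))  # memory only -> (0, 0)
```

The two extremes place the process next to its communication partner or next to a memory controller; intermediate values of `alpha` trade the two terms off. Which weighting, if any, corresponds to the balanced strategy evaluated in the paper is not stated in the abstract.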