Abstract
Future heterogeneous single-ISA multicore processors will have an edge in potential performance per watt over comparable homogeneous processors. To fully tap into that potential, the OS scheduler needs to be heterogeneity-aware, so that it can match jobs to cores according to the characteristics of both. We propose a Heterogeneity-Aware Signature-Supported (HASS) scheduling algorithm that performs this matching using per-thread architectural signatures, which are compact summaries of threads' architectural properties collected offline. The resulting algorithm does not rely on dynamic profiling and is comparatively simple and scalable. We implemented HASS in OpenSolaris and achieved average workload speedups of up to 13%, matching the best static assignment, which is achievable only by an oracle. We also implemented a previously proposed dynamic IPC-driven algorithm that relies on online profiling. We found that the complexity, load imbalance, and associated performance degradation resulting from dynamic profiling are significant challenges to using this algorithm successfully. As a result, it failed to deliver the expected performance gains and did not outperform HASS.
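To make the idea of signature-based matching concrete, the following is a minimal sketch and not the HASS implementation itself. It assumes, purely for illustration, that each thread's offline signature has already been reduced to a predicted run time on a "fast" and on a "slow" core type, and that threads are then assigned statically and greedily. The function name `assign_threads`, the signature layout, and the two core types are all hypothetical.

```python
from typing import Dict


def assign_threads(signatures: Dict[str, Dict[str, float]],
                   fast_cores: int,
                   slow_cores: int) -> Dict[str, str]:
    """Statically map each thread to a core type using offline signatures.

    `signatures` maps a thread name to its predicted run time on each core
    type, e.g. {"fast": 1.0, "slow": 2.0}. This layout is an assumption made
    for the sketch, not the signature format used by the paper.
    """
    if len(signatures) > fast_cores + slow_cores:
        raise ValueError("more threads than cores in this one-thread-per-core sketch")

    def predicted_speedup(thread: str) -> float:
        # How much faster the thread is expected to run on a fast core.
        sig = signatures[thread]
        return sig["slow"] / sig["fast"]

    # Threads that benefit most from a fast core are placed first.
    ranked = sorted(signatures, key=predicted_speedup, reverse=True)

    assignment: Dict[str, str] = {}
    for position, thread in enumerate(ranked):
        assignment[thread] = "fast" if position < fast_cores else "slow"
    return assignment


if __name__ == "__main__":
    example = {
        "compute_bound": {"fast": 1.0, "slow": 2.0},   # large gain from a fast core
        "memory_bound": {"fast": 1.0, "slow": 1.1},    # little gain from a fast core
    }
    print(assign_threads(example, fast_cores=1, slow_cores=1))
```

Because the decision uses only data gathered before the run, a scheme of this shape needs no online sampling of threads on different core types, which is the property the abstract contrasts with the dynamic IPC-driven approach.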
Index Terms
- HASS: a scheduler for heterogeneous multicore systems