DOI: 10.1145/1362622.1362694
research-article

Efficient operating system scheduling for performance-asymmetric multi-core architectures

Published: 10 November 2007

ABSTRACT

Recent research advocates asymmetric multi-core architectures, where cores in the same processor can have different performance. These architectures support single-threaded performance and multithreaded throughput at lower costs (e.g., die size and power). However, they also pose unique challenges to operating systems, which traditionally assume homogeneous hardware. This paper presents AMPS, an operating system scheduler that efficiently supports both SMP- and NUMA-style performance-asymmetric architectures. AMPS contains three components: asymmetry-aware load balancing, faster-core-first scheduling, and NUMA-aware migration. We have implemented AMPS in Linux kernel 2.6.16 and used CPU clock modulation to emulate performance asymmetry on an SMP system and a NUMA system. For various workloads, we show that AMPS achieves a median speedup of 1.16 with a maximum of 1.44 over stock Linux on the SMP system, and a median of 1.07 with a maximum of 2.61 on the NUMA system. Our results also show that AMPS improves fairness and repeatability of application performance measurements.
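The abstract names asymmetry-aware load balancing and faster-core-first scheduling but does not describe their mechanics. The following is a minimal illustrative sketch of the general ideas, not the AMPS implementation: it assumes each core has a known relative speed, measures load as run-queue length divided by speed, places new threads on the fastest idle core, and migrates threads until no move would lower the highest scaled load. The names `Core`, `scaled_load`, `place_thread`, and `balance` are hypothetical, and NUMA-aware migration is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    core_id: int
    speed: float  # assumed relative performance (1.0 = fastest core)
    run_queue: list = field(default_factory=list)

    def scaled_load(self) -> float:
        # Run-queue length normalized by core speed, so a slow core
        # counts as "more loaded" than a fast core with the same queue.
        return len(self.run_queue) / self.speed


def place_thread(cores, thread):
    """Prefer the fastest idle core; otherwise use the lowest scaled load."""
    idle = [c for c in cores if not c.run_queue]
    if idle:
        target = max(idle, key=lambda c: c.speed)
    else:
        target = min(cores, key=lambda c: c.scaled_load())
    target.run_queue.append(thread)
    return target


def balance(cores):
    """Migrate threads while a move lowers the highest scaled load."""
    moves = 0
    while True:
        busiest = max(cores, key=lambda c: c.scaled_load())
        idlest = min(cores, key=lambda c: c.scaled_load())
        if busiest is idlest or not busiest.run_queue:
            return moves
        # Scaled load the destination would have after receiving one thread.
        after_dst = (len(idlest.run_queue) + 1) / idlest.speed
        if after_dst >= busiest.scaled_load():
            return moves
        idlest.run_queue.append(busiest.run_queue.pop())
        moves += 1


if __name__ == "__main__":
    cores = [Core(0, speed=1.0), Core(1, speed=0.5)]
    cores[1].run_queue.extend(f"t{i}" for i in range(6))
    balance(cores)
    print([len(c.run_queue) for c in cores])
```

With one full-speed core and one half-speed core, six threads that start on the slow core end up split in proportion to core speed under this simplified model.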


            Published in
            SC '07: Proceedings of the 2007 ACM/IEEE conference on Supercomputing
            November 2007
            723 pages
            ISBN: 9781595937643
            DOI: 10.1145/1362622

            Copyright © 2007 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Acceptance Rates

            SC '07 paper acceptance rate: 54 of 268 submissions, 20%. Overall acceptance rate: 1,516 of 6,373 submissions, 24%.
