research-article

Efficient operating system scheduling for performance-asymmetric multi-core architectures

Authors:
Tong Li, Intel Corporation
Dan Baumberger, Intel Corporation
David A. Koufaty, Intel Corporation
Scott Hahn, Intel Corporation

SC '07: Proceedings of the 2007 ACM/IEEE conference on Supercomputing, November 2007, Article No.: 53, Pages 1–11
https://doi.org/10.1145/1362622.1362694

Published: 10 November 2007

ABSTRACT

Recent research advocates asymmetric multi-core architectures, where cores in the same processor can have different performance. These architectures support single-threaded performance and multithreaded throughput at lower costs (e.g., die size and power). However, they also pose unique challenges to operating systems, which traditionally assume homogeneous hardware. This paper presents AMPS, an operating system scheduler that efficiently supports both SMP- and NUMA-style performance-asymmetric architectures. AMPS contains three components: asymmetry-aware load balancing, faster-core-first scheduling, and NUMA-aware migration. We have implemented AMPS in Linux kernel 2.6.16 and used CPU clock modulation to emulate performance asymmetry on an SMP system and a NUMA system. For various workloads, we show that AMPS achieves a median speedup of 1.16 with a maximum of 1.44 over stock Linux on the SMP system, and a median of 1.07 with a maximum of 2.61 on the NUMA system. Our results also show that AMPS improves fairness and repeatability of application performance measurements.
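
As a rough illustration of the first component named above, the following sketch shows one way an asymmetry-aware load balancer could weigh run-queue length against core speed. The abstract does not describe the algorithm, so everything here is an assumption made for illustration: the `scaled_load` metric, the greedy placement in `assign_threads`, the tie-break toward faster cores, and the relative core speeds. It is not the AMPS implementation.

```python
from typing import List


def scaled_load(run_queue_len: int, relative_speed: float) -> float:
    """Hypothetical metric: runnable threads per unit of computing power."""
    return run_queue_len / relative_speed


def assign_threads(num_threads: int, core_speeds: List[float]) -> List[int]:
    """Greedily place threads so that scaled load stays even across cores.

    Each thread goes to the core whose scaled load would be lowest after
    the placement. Ties are broken in favor of the faster core (an
    assumption loosely inspired by the "faster-core-first" idea).
    Returns the number of threads assigned to each core.
    """
    queues = [0] * len(core_speeds)
    for _ in range(num_threads):
        best = min(
            range(len(core_speeds)),
            key=lambda c: (
                scaled_load(queues[c] + 1, core_speeds[c]),
                -core_speeds[c],
            ),
        )
        queues[best] += 1
    return queues


if __name__ == "__main__":
    # One core twice as fast as the other three (illustrative values).
    print(assign_threads(5, [2.0, 1.0, 1.0, 1.0]))
```

With one core twice as fast as the other three, `assign_threads(5, [2.0, 1.0, 1.0, 1.0])` returns `[2, 1, 1, 1]`: the fast core receives twice as many threads, so each thread sees roughly the same share of computing power. A symmetric balancer that equalizes raw queue lengths would not make this distinction.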

Published in
SC '07: Proceedings of the 2007 ACM/IEEE conference on Supercomputing
November 2007
723 pages
ISBN:9781595937643
DOI:10.1145/1362622
General Chair:
Becky Verastegui
Oak Ridge National Laboratory
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
SC '07 Paper Acceptance Rate: 54 of 268 submissions, 20%
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%
Article Metrics
- 186 Total Citations
- 2,671 Total Downloads
- 68 Downloads (Last 12 months)
- 7 Downloads (Last 6 weeks)