research-article

Analysis and approximation of optimal co-scheduling on chip multiprocessors

Authors:
Yunlian Jiang, College of William and Mary, Williamsburg, VA, USA
Xipeng Shen, College of William and Mary, Williamsburg, VA, USA
Jie Chen, Thomas Jefferson National Accelerator Facility, Newport News, VA, USA
Rahul Tripathi, University of South Florida, Tampa, FL, USA

PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 2008, Pages 220–229
https://doi.org/10.1145/1454115.1454146

Published: 25 October 2008

ABSTRACT

Cache sharing among processors is important for chip multiprocessors because it reduces inter-thread latency, but it also causes cache contention that can degrade program performance considerably. Recent studies have shown that job co-scheduling can effectively alleviate the contention, but how to find optimal co-schedules efficiently remains an open question. Answering it is critical for determining the potential of a co-scheduling system. This paper presents a theoretical analysis of the complexity of co-scheduling and proves its NP-completeness. For the special case of two sharers per chip, we propose an algorithm that finds optimal co-schedules in polynomial time. For more complex cases, we design and evaluate a sequence of approximation algorithms, among which the hierarchical matching algorithm produces near-optimal schedules and scales well. This study facilitates the evaluation of co-scheduling systems and offers techniques directly usable in proactive job co-scheduling.
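
To make the two-sharers-per-chip case concrete, the sketch below treats co-scheduling as pairing jobs so that the summed co-run degradation is as small as possible, which amounts to a minimum-weight perfect matching over the jobs ("perfect matching" appears among this page's author tags). Several things here are assumptions rather than the paper's method: the degradation matrix `cost`, the function name `best_pairing`, and the exhaustive bitmask search, which runs in exponential time and only stands in for a polynomial-time matching algorithm on a handful of jobs.

```python
from functools import lru_cache


def best_pairing(cost):
    """Return (total_cost, pairs) minimizing the sum of cost[i][j] over pairs.

    cost is a hypothetical symmetric matrix: cost[i][j] is the combined
    slowdown when jobs i and j share one cache. This is an exhaustive
    search with memoization, exponential in the number of jobs.
    """
    n = len(cost)
    if n % 2:
        raise ValueError("need an even number of jobs to fill two-core chips")

    @lru_cache(maxsize=None)
    def solve(used):
        # 'used' is a bitmask of jobs that already have a partner.
        if used == (1 << n) - 1:
            return 0.0, ()
        # Always pair the lowest-numbered unassigned job next, so each
        # pairing is enumerated exactly once.
        i = next(k for k in range(n) if not used & (1 << k))
        best = (float("inf"), ())
        for j in range(i + 1, n):
            if used & (1 << j):
                continue
            sub_cost, sub_pairs = solve(used | (1 << i) | (1 << j))
            total = cost[i][j] + sub_cost
            if total < best[0]:
                best = (total, ((i, j),) + sub_pairs)
        return best

    total, pairs = solve(0)
    return total, list(pairs)


# Four jobs, two dual-core chips (made-up degradation values).
cost = [
    [0, 5, 1, 4],
    [5, 0, 3, 2],
    [1, 3, 0, 6],
    [4, 2, 6, 0],
]
print(best_pairing(cost))  # (3.0, [(0, 2), (1, 3)])
```

For larger job sets, a library routine for minimum-weight perfect matching on general graphs would be the natural replacement for `solve`; the enumeration is kept here only because it is short and dependency-free.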

Index Terms

Software and its engineering
  Software organization and properties
    Contextual software domains
      Operating systems
        Process management
        Multiprocessing / multiprogramming / multitasking
        Scheduling


Published in
PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques
October 2008
328 pages
ISBN:9781605582825
DOI:10.1145/1454115
General Chair: Andreas Moshovos, University of Toronto, Canada
Program Chairs: David Tarditi, Microsoft, USA; Kunle Olukotun, Stanford University, USA
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
CMP scheduling
cache contention
co-scheduling
perfect matching
Acceptance Rates
Overall Acceptance Rate: 121 of 471 submissions, 26%