DOI: 10.1145/1454115.1454146
research-article

Analysis and approximation of optimal co-scheduling on chip multiprocessors

Published: 25 October 2008

ABSTRACT

Cache sharing among processors is important for chip multiprocessors because it reduces inter-thread latency, but it also introduces cache contention, which can degrade program performance considerably. Recent studies have shown that job co-scheduling can effectively alleviate the contention, but how to find optimal co-schedules efficiently remains an open question. Answering that question is critical for determining the potential of a co-scheduling system. This paper presents a theoretical analysis of the complexity of co-scheduling and proves that the problem is NP-complete. For the special case of two sharers per chip, we propose an algorithm that finds the optimal co-schedules in polynomial time. For more complex cases, we design and evaluate a sequence of approximation algorithms, among which the hierarchical matching algorithm produces near-optimal schedules and scales well. This study facilitates the evaluation of co-scheduling systems and offers techniques that are directly usable in proactive job co-scheduling.
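
To make the two-sharer case concrete, the sketch below assumes (the abstract does not state this) that every pair of co-running jobs has a known degradation cost and that the cost of a schedule is the sum over its pairs. Under that assumption, choosing a co-schedule for chips with two sharers amounts to picking a minimum-weight perfect matching over the jobs, a problem for which polynomial-time algorithms are known. The code does not implement such an algorithm; it enumerates every pairing by brute force, which is only practical for a handful of jobs. The job names, the cost numbers, and the functions `pairings` and `best_pairing` are hypothetical and chosen for illustration only.

```python
from typing import Dict, FrozenSet, Iterator, List, Tuple

Pair = Tuple[str, str]

# Hypothetical co-run degradation (percent slowdown) for each unordered
# pair of jobs. These are made-up numbers, not measurements from the paper.
JOBS: List[str] = ["a", "b", "c", "d"]
COST: Dict[FrozenSet[str], int] = {
    frozenset(("a", "b")): 90,
    frozenset(("a", "c")): 20,
    frozenset(("a", "d")): 50,
    frozenset(("b", "c")): 40,
    frozenset(("b", "d")): 10,
    frozenset(("c", "d")): 80,
}


def pairings(jobs: List[str]) -> Iterator[List[Pair]]:
    """Yield every way to split an even-sized job list into unordered pairs."""
    if not jobs:
        yield []
        return
    first, rest = jobs[0], jobs[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def schedule_cost(schedule: List[Pair], cost: Dict[FrozenSet[str], int]) -> int:
    """Sum the pairwise degradation of a schedule (assumed cost model)."""
    return sum(cost[frozenset(pair)] for pair in schedule)


def best_pairing(
    jobs: List[str], cost: Dict[FrozenSet[str], int]
) -> Tuple[List[Pair], int]:
    """Exhaustively search all pairings and return the cheapest one."""
    best = min(pairings(jobs), key=lambda s: schedule_cost(s, cost))
    return best, schedule_cost(best, cost)


if __name__ == "__main__":
    schedule, total = best_pairing(JOBS, COST)
    print(schedule, total)
```

With n jobs there are (n-1)(n-3)...(3)(1) distinct pairings, so exhaustive search stops being feasible quickly. That growth is one reason a polynomial-time method for the two-sharer case is valuable as a baseline.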

  • Published in

    PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques
    October 2008
    328 pages
    ISBN:9781605582825
    DOI:10.1145/1454115

    Copyright © 2008 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    Overall Acceptance Rate: 121 of 471 submissions, 26%
