DOI: 10.1145/1854273.1854306
research-article

On mitigating memory bandwidth contention through bandwidth-aware scheduling

Published: 11 September 2010

ABSTRACT

Shared-memory multiprocessors dominate all platforms, from high-end systems to desktop computers. On such platforms, the interconnect between the processors and main memory is well known to be a major bottleneck. Bandwidth-aware job scheduling is an effective and relatively easy-to-implement way to relieve bandwidth contention. Previous policies recognized that bandwidth saturation hurts the throughput of parallel jobs, so they scheduled jobs such that the total bandwidth requirement equals the system peak bandwidth. However, we found that fine-grained bandwidth contention still occurs within a scheduling quantum because a program's memory access intensity fluctuates irregularly, an effect that previous policies mostly ignore.
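The intra-quantum effect described above can be illustrated with a small numeric sketch. The traces, the peak value, and the function name below are hypothetical and are not taken from the paper; they only show how two jobs whose quantum-average demands add up to exactly the peak bandwidth can still exceed the peak in some fine-grained intervals.

```python
def saturated_fraction(trace_a, trace_b, peak):
    """Fraction of fine-grained intervals whose combined demand exceeds peak."""
    if len(trace_a) != len(trace_b):
        raise ValueError("traces must cover the same number of intervals")
    over = sum(1 for a, b in zip(trace_a, trace_b) if a + b > peak)
    return over / len(trace_a)


# Hypothetical per-interval bandwidth demands within one scheduling quantum
# (arbitrary units). Each job averages 5.0, so the pair averages exactly PEAK.
PEAK = 10.0
job_a = [2.0, 8.0, 2.0, 8.0]
job_b = [8.0, 2.0, 2.0, 8.0]

mean_total = sum(job_a) / len(job_a) + sum(job_b) / len(job_b)
frac = saturated_fraction(job_a, job_b, PEAK)

print(f"average combined demand: {mean_total}")
print(f"fraction of intervals above peak: {frac}")
```

A policy that looks only at the quantum averages would treat this pairing as a perfect fit, even though the combined demand exceeds the peak in one of the four intervals.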

In this paper, we quantify the impact of bandwidth contention on overall performance. We found that concurrent jobs can achieve higher memory bandwidth utilization, but only at the expense of super-linear performance degradation. Based on this observation, we propose a new workload scheduling policy. Its basic idea is that interference due to bandwidth contention is minimized when bandwidth utilization is maintained at the level of the workload's average bandwidth requirement. Our evaluation uses both SPEC 2006 and NPB workloads. Results on randomly generated workloads show that our policy improves system throughput by 4.1% on average over the native OS scheduler, with improvements of up to 11.7% observed.
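The abstract does not give the scheduling algorithm itself, so the following is only a minimal sketch of the stated idea under loud assumptions: each job has a single known bandwidth demand, each job runs for exactly one quantum, and the target for a quantum is taken to be the mean per-job demand multiplied by the number of cores. The function names, the exhaustive search, and the example numbers are all hypothetical.

```python
from itertools import combinations


def workload_average_target(demands, cores):
    """Assumed target: mean per-job demand times the number of cores."""
    return cores * sum(demands.values()) / len(demands)


def pick_jobs(demands, k, target):
    """Return the k jobs whose total demand is closest to target.

    Exhaustive search, so only suitable for small illustrative workloads.
    Ties go to the first combination in lexicographic order of job names.
    """
    return min(
        combinations(sorted(demands), k),
        key=lambda combo: abs(sum(demands[j] for j in combo) - target),
    )


def schedule(demands, cores, target):
    """Assign every job to one quantum, filling quanta one at a time."""
    remaining = dict(demands)
    rounds = []
    while remaining:
        chosen = pick_jobs(remaining, min(cores, len(remaining)), target)
        rounds.append(chosen)
        for job in chosen:
            del remaining[job]
    return rounds


# Hypothetical demands (arbitrary units) on a 2-core machine with peak 13.0.
demands = {"a": 1.0, "b": 2.0, "c": 6.0, "d": 7.0}
avg_rounds = schedule(demands, 2, workload_average_target(demands, 2))
peak_rounds = schedule(demands, 2, 13.0)

print("average-based target:", avg_rounds)
print("peak-based target:   ", peak_rounds)
```

In this toy workload, aiming at the workload average yields two quanta with equal total demand (8.0 each), whereas aiming at the peak packs the two heaviest jobs together (13.0) and leaves a lightly loaded quantum (3.0). Whether this matches the paper's actual policy cannot be confirmed from the abstract alone.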


  • Published in

    PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques
    September 2010
    596 pages
    ISBN: 9781450301787
    DOI: 10.1145/1854273

    Copyright © 2010 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Acceptance Rates

    Overall Acceptance Rate: 85 of 263 submissions, 32%

