On mitigating memory bandwidth contention through bandwidth-aware scheduling

Authors:

Pen-Chung Yew

PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques

Pages 237 - 248

https://doi.org/10.1145/1854273.1854306

Published: 11 September 2010

Abstract

Shared-memory multiprocessors now dominate platforms from high-end systems to desktop computers. On such platforms, it is well known that the interconnect between the processors and main memory has become a major bottleneck. Bandwidth-aware job scheduling is an effective and relatively easy-to-implement way to relieve bandwidth contention. Previous policies recognized that bandwidth saturation hurts the throughput of parallel jobs, so they scheduled jobs such that the total bandwidth requirement equaled the system peak bandwidth. However, we found that fine-grained bandwidth contention still occurs within a scheduling quantum because of irregular fluctuations in a program's memory access intensity, an effect that previous policies mostly ignore.

In this paper, we quantify the impact of bandwidth contention on overall performance. We find that concurrent jobs can achieve higher memory bandwidth utilization, but only at the expense of super-linear performance degradation. Based on this observation, we propose a new workload scheduling policy. Its basic idea is that interference due to bandwidth contention is minimized when bandwidth utilization is maintained at the level of the workload's average bandwidth requirement. Our evaluation uses both SPEC 2006 and NPB workloads. Results on randomly generated workloads show that our policy improves system throughput by 4.1% on average over the native OS scheduler, with improvements of up to 11.7%.

Index Terms

On mitigating memory bandwidth contention through bandwidth-aware scheduling
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Scheduling

Published In

PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques

September 2010

596 pages

ISBN:9781450301787

DOI:10.1145/1854273

General Chair: Valentina Salapura, IBM TJ Watson Research Center

Program Chairs: Michael Gschwind, IBM Systems & Technology Group; Jens Knoop, Technische Universität Wien

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IFIP WG 10.3: IFIP working group 10.3 on concurrent systems
IEEE CS TCPP: IEEE-CS technical committee on parallel processing
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE CS TCAA: IEEE CS technical committee on architectural acoustics

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

PACT '10

PACT '10: International Conference on Parallel Architectures and Compilation Techniques

September 11 - 15, 2010

Vienna, Austria

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Bibliometrics

Article Metrics

Total Citations: 54
Total Downloads: 959
Downloads (Last 12 months): 52
Downloads (Last 6 weeks): 6

Reflects downloads up to 25 Feb 2025
