DOI: 10.1145/1854273.1854307
research-article

AKULA: a toolset for experimenting and developing thread placement algorithms on multicore systems

Published: 11 September 2010

Abstract

Multicore processors have become commonplace in both desktops and servers. A serious challenge with multicore processors is that cores share on- and off-chip resources such as caches, memory buses, and memory controllers. Competition for these shared resources between threads running on different cores can result in severe and unpredictable performance degradation. Previous work has shown that the OS scheduler can be made shared-resource-aware and can greatly reduce the negative effects of resource contention. The search space of potential scheduling algorithms is huge, considering the diversity of available multicore architectures, an almost infinite set of potential workloads, and a variety of conflicting performance goals. We believe the two biggest obstacles to developing new scheduling algorithms are the difficulty of implementation and the duration of testing. We address both of these challenges with our toolset AKULA, which we introduce in this paper. AKULA provides an API that allows developers to implement and debug scheduling algorithms easily and quickly without the need to modify the kernel or use system calls. It also provides a rapid evaluation module, based on a novel evaluation technique also introduced in this paper, which allows a newly created scheduling algorithm to be tested on a wide variety of workloads in just a fraction of the time that testing on real hardware would take. In addition, AKULA facilitates running scheduling algorithms created with its API on real machines without the need for additional modifications. We use AKULA to develop and evaluate a variety of contention-aware scheduling algorithms. We use the rapid evaluation module to test our algorithms on thousands of workloads and assess their scalability to futuristic massively multicore machines.
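
To make the idea of a shared-resource-aware scheduler concrete, here is a minimal sketch of one possible placement heuristic. It is not AKULA's API and not an algorithm taken from the paper. The function name `place_threads`, the notion of a "domain" (a group of cores sharing a cache), and the per-thread miss rates are all assumptions made for illustration. The heuristic places the most memory-intensive threads first, each into the shared-cache domain with the lowest accumulated miss rate, so that heavy threads end up spread across domains.

```python
from typing import Dict, List


def place_threads(miss_rates: Dict[str, float],
                  num_domains: int,
                  cores_per_domain: int) -> List[List[str]]:
    """Assign threads to shared-cache domains, balancing total miss rate.

    Hypothetical helper for illustration only; the miss rates are assumed
    to have been measured or estimated elsewhere.
    """
    if len(miss_rates) > num_domains * cores_per_domain:
        raise ValueError("more threads than cores")

    domains: List[List[str]] = [[] for _ in range(num_domains)]
    load = [0.0] * num_domains  # accumulated miss rate per domain

    # Place the most memory-intensive threads first (ties broken by name).
    for name in sorted(miss_rates, key=lambda t: (-miss_rates[t], t)):
        # Only domains that still have a free core are candidates.
        candidates = [d for d in range(num_domains)
                      if len(domains[d]) < cores_per_domain]
        # Pick the least-loaded candidate (ties broken by domain index).
        target = min(candidates, key=lambda d: (load[d], d))
        domains[target].append(name)
        load[target] += miss_rates[name]
    return domains


# Made-up miss rates: two memory-bound threads and two compute-bound ones.
example = {"mem_a": 30.0, "mem_b": 25.0, "cpu_a": 1.0, "cpu_b": 0.5}
placement = place_threads(example, num_domains=2, cores_per_domain=2)
```

With these made-up inputs, the two memory-bound threads land in different domains and each is paired with a compute-bound thread. This is the kind of separation a contention-aware policy aims for.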

Published In

PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques
September 2010
596 pages
ISBN:9781450301787
DOI:10.1145/1854273

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. contention-aware scheduling
  2. multicore simulation

Qualifiers

  • Research-article

Conference

PACT '10
Sponsor:
  • IFIP WG 10.3
  • IEEE CS TCPP
  • SIGARCH
  • IEEE CS TCAA

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Cited By

  • (2021) Joint security and performance improvement in multilevel shared caches. IET Information Security 15:4 (297-308). DOI: 10.1049/ise2.12023. Online publication date: 14-Apr-2021
  • (2019) RaceR: A Thread Mapping Algorithm for Race Reduction in Multi-Level Shared Caches. 2019 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) (228-232). DOI: 10.1109/EMPDP.2019.8671576. Online publication date: Feb-2019
  • (2017) PoIiCym. Proceedings of the 28th International Symposium on Rapid System Prototyping: Shortening the Path from Specification to Prototype (23-29). DOI: 10.1145/3130265.3130321. Online publication date: 19-Oct-2017
  • (2016) Thread-Aware Adaptive Prefetcher on Multicore Systems. ACM Transactions on Architecture and Code Optimization 13:1 (1-25). DOI: 10.1145/2890505. Online publication date: 28-Mar-2016
  • (2016) PMCTrack: Delivering Performance Monitoring Counter Support to the OS Scheduler. The Computer Journal 60:1 (60-85). DOI: 10.1093/comjnl/bxw065. Online publication date: 8-Sep-2016
  • (2014) Low-Overhead Network-on-Chip Support for Location-Oblivious Task Placement. IEEE Transactions on Computers 63:6 (1487-1500). DOI: 10.1109/TC.2012.241. Online publication date: 1-Jun-2014
  • (2014) Cache Coherence Method for Improving Multi-threaded Applications on Multicore Systems. Proceedings of the 2014 6th International Conference on Multimedia, Computer Graphics and Broadcasting (47-50). DOI: 10.1109/MulGraB.2014.18. Online publication date: 20-Dec-2014
  • (2014) A Thread-Aware Adaptive Data Prefetcher. 2014 IEEE 32nd International Conference on Computer Design (ICCD) (278-285). DOI: 10.1109/ICCD.2014.6974694. Online publication date: Oct-2014
  • (2013) A flexible simulation framework for multicore schedulers. Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (355-360). DOI: 10.1145/2486092.2486140. Online publication date: 19-May-2013
  • (2013) Scheduling optimization in multicore multithreaded microprocessors through dynamic modeling. Proceedings of the ACM International Conference on Computing Frontiers (1-10). DOI: 10.1145/2482767.2482774. Online publication date: 14-May-2013
