DOI: 10.1145/2442516.2442562
poster

Work-stealing with configurable scheduling strategies

Published: 23 February 2013

ABSTRACT

Work-stealing systems are typically oblivious to the nature of the tasks they schedule. They neither know nor take into account how long a task will take to execute or how many subtasks it will spawn. Moreover, task execution order is typically determined by the underlying task storage data structure and cannot be changed. Task-parallel executions can therefore be optimized by giving the scheduling system information about specific tasks and their preferred execution order.
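To make the point about fixed execution order concrete, here is a minimal single-threaded sketch of the task deque commonly used in work-stealing schedulers. It is an illustration only, not the system described in this abstract: the class name `WorkerDeque` and its methods are hypothetical, and all synchronization is omitted. The owner takes the newest task (LIFO) while a thief takes the oldest (FIFO), so both orders follow from the data structure rather than from anything known about the tasks.

```python
from collections import deque


class WorkerDeque:
    """Hypothetical single-threaded model of one worker's task deque.

    A real scheduler would need a concurrent deque; this sketch only
    shows which task each operation selects.
    """

    def __init__(self):
        self._tasks = deque()

    def push(self, task):
        # The owner adds newly spawned tasks at the back.
        self._tasks.append(task)

    def pop_local(self):
        # The owner executes the most recently spawned task (LIFO).
        return self._tasks.pop() if self._tasks else None

    def steal(self):
        # A thief takes the oldest task from the front (FIFO).
        return self._tasks.popleft() if self._tasks else None


worker = WorkerDeque()
for name in ["a", "b", "c", "d"]:
    worker.push(name)

local_first = worker.pop_local()   # newest task
stolen_first = worker.steal()      # oldest task
```

Neither call can be told that, say, task "b" is the long-running one; the order depends solely on insertion order.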

We investigate generalizations of work-stealing and introduce a framework enabling applications to dynamically provide hints on the nature of specific tasks using scheduling strategies. Strategies can be used to independently control both local task execution and steal order. Strategies allow optimizations on specific tasks, in contrast to more conventional scheduling policies that are typically global in scope. Strategies are composable and allow different, specific scheduling choices for different parts of an application simultaneously. We have implemented a work-stealing system based on our strategy framework. A series of benchmarks demonstrates beneficial effects that can be achieved with scheduling strategies.
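The idea of per-task strategies that control local execution order and steal order independently can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' framework or its API: `Strategy`, `Task`, `StrategyTaskStorage`, and `search_strategy` are hypothetical names, the storage is single-threaded, and selection uses a linear scan for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Strategy:
    """Hypothetical per-task hint; a higher value means 'prefer sooner'."""
    local_priority: float = 0.0
    steal_priority: float = 0.0


@dataclass
class Task:
    name: str
    strategy: Strategy


class StrategyTaskStorage:
    """Single-threaded model of one worker's task storage (assumption:
    no synchronization, linear scan instead of a priority structure)."""

    def __init__(self):
        self._tasks: List[Task] = []

    def push(self, task: Task) -> None:
        self._tasks.append(task)

    def _take(self, key: Callable[[Task], float]) -> Optional[Task]:
        if not self._tasks:
            return None
        best = max(self._tasks, key=key)
        self._tasks.remove(best)
        return best

    def pop_local(self) -> Optional[Task]:
        # Local execution order follows the strategy's local priority.
        return self._take(lambda t: t.strategy.local_priority)

    def steal(self) -> Optional[Task]:
        # Steal order follows a separate, independent priority.
        return self._take(lambda t: t.strategy.steal_priority)


def search_strategy(lower_bound: float, estimated_work: float) -> Strategy:
    # Hypothetical strategy for a search-style workload: run the most
    # promising node locally, let thieves take the largest subproblem.
    return Strategy(local_priority=-lower_bound, steal_priority=estimated_work)


storage = StrategyTaskStorage()
storage.push(Task("n1", search_strategy(lower_bound=10, estimated_work=5)))
storage.push(Task("n2", search_strategy(lower_bound=7, estimated_work=2)))
storage.push(Task("n3", search_strategy(lower_bound=9, estimated_work=50)))

local_task = storage.pop_local()   # smallest lower bound
stolen_task = storage.steal()      # largest estimated work
```

Because the hint travels with each task, another part of the same application could push tasks built by a different strategy function into the same storage, which is one simple reading of the composability the abstract describes.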

Published in

PPoPP '13: Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
February 2013
332 pages
ISBN: 9781450319225
DOI: 10.1145/2442516

ACM SIGPLAN Notices, Volume 48, Issue 8
PPoPP '13
August 2013
309 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2517327

Copyright © 2013 Authors

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 230 of 1,014 submissions, 23%
