research-article

Performance driven multi-objective distributed scheduling for parallel computations

Authors: Ankur Narang, Abhinav Srivastava, Naga Praveen Kumar Katta, Rudrapatna K. Shyamasundar

ACM SIGOPS Operating Systems Review, Volume 45, Issue 2

Pages 14 - 27

https://doi.org/10.1145/2007183.2007186

Published: 18 July 2011

Abstract

With the advent of many-core architectures and the strong need for Petascale (and Exascale) performance in scientific domains and industry analytics, efficient scheduling of parallel computations for higher productivity and performance has become very important. Further, moving massive amounts of data (Terabytes to Petabytes) is very expensive, which necessitates affinity driven computations. Therefore, distributed scheduling of parallel computations on multiple places needs to optimize multiple performance objectives: follow affinity maximally and ensure efficient space, time and message complexity. Simultaneous consideration of these objectives makes distributed scheduling a particularly challenging problem. In addition, parallel computations have data-dependent execution patterns, which require online scheduling to effectively optimize the computation orchestration as it unfolds.

This paper presents an online algorithm for affinity driven distributed scheduling of multi-place parallel computations. To optimize multiple performance objectives simultaneously, our algorithm uses a low time and message complexity mechanism for ensuring affinity and a randomized work-stealing mechanism within places for load balancing. We provide a theoretical analysis of the expected and probabilistic lower and upper bounds on the time and message complexity of this algorithm. On multi-core clusters such as Blue Gene/P (MPP architecture) and an Intel multi-core cluster, we demonstrate performance close to that of custom MPI+Pthreads code. Further, strong, weak and data (increasing input data size) scalability have been demonstrated on multi-core clusters. Using well-known benchmarks, we demonstrate a 16% to 30% performance gain compared to Cilk [6] on a multi-core Intel Xeon 5570 (NUMA) architecture. Detailed experimental analysis illustrates efficient space (main memory) utilization as well. To the best of our knowledge, this is the first time a multi-objective affinity driven distributed scheduling algorithm has been designed, theoretically analyzed and experimentally evaluated in a multi-place setup for multi-core cluster architectures.
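The two mechanisms named in the abstract, placing work according to affinity and balancing load by randomized work stealing within a place, can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the paper's algorithm: the `Place` class, the `run` function, the one-step tasks, and the choice to seed every task on worker 0 of its place are all hypothetical and chosen for illustration.

```python
import random
from collections import deque


class Place:
    """One place: a group of workers, each owning a double-ended task queue."""

    def __init__(self, place_id, num_workers):
        self.place_id = place_id
        self.deques = [deque() for _ in range(num_workers)]


def run(tasks, num_places=2, workers_per_place=2, seed=0):
    """Run (name, affinity_place) tasks; return {name: (place, worker)}.

    Hypothetical toy model: every task takes one step and spawns nothing.
    """
    rng = random.Random(seed)
    places = [Place(p, workers_per_place) for p in range(num_places)]

    # Affinity: each task is enqueued at the place it is annotated with.
    # For illustration, all tasks of a place start on its worker 0, so the
    # other workers of that place must steal to obtain work.
    for name, affinity in tasks:
        places[affinity].deques[0].append(name)

    executed = {}
    remaining = len(tasks)
    while remaining:
        for place in places:
            for w, own in enumerate(place.deques):
                if own:
                    task = own.pop()  # owner takes from the bottom of its deque
                else:
                    # Randomized work stealing, restricted to the same place.
                    victims = [v for v in range(len(place.deques)) if v != w]
                    if not victims:
                        continue  # single-worker place: nobody to steal from
                    victim = place.deques[rng.choice(victims)]
                    if not victim:
                        continue  # failed steal attempt
                    task = victim.popleft()  # thief takes from the top
                executed[task] = (place.place_id, w)
                remaining -= 1
    return executed
```

In this sketch a task never leaves the place it has affinity for, so load balancing costs no cross-place data movement; whether and how work may migrate between places is a design decision of the actual algorithm that this toy model does not attempt to capture.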

References

[1]

Umut A. Acar, Guy E. Blelloch, and Robert D. Blumofe. The data locality of work stealing. In SPAA, pages 1--12, New York, NY, USA, December 2000.

[2]

S. Agarwal, R. Barik, D. Bonachea, V. Sarkar, R. K. Shyamasundar, and K. Yelick. Deadlock-free scheduling of X10 computations with bounded resources. In SPAA, pages 229--240, San Diego, CA, USA, December 2007.

[3]

Eric Allen, David Chase, Victor Luchangco, Jan-Willem Maessen, Sukyoung Ryu, Guy L. Steele Jr., and Sam Tobin-Hochstadt. The Fortress language specification version 0.618. Technical report, Sun Microsystems, April 2005.

[4]

Nimar S. Arora, Robert D. Blumofe, and C. Greg Plaxton. Thread scheduling for multiprogrammed multiprocessors. In SPAA, pages 119--129, Puerto Vallarta, Mexico, 1998.

[5]

P. Berenbrink, T. Friedetzky, and L. A. Goldberg. A natural work-stealing algorithm is stable. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science (FOCS), pages 178--187, 2001.

[6]

Robert D. Blumofe and Charles E. Leiserson. Scheduling multithreaded computations by work stealing. J. ACM, 46(5):720--748, 1999.

[7]

Robert D. Blumofe and Philip A. Lisiecki. Adaptive and reliable parallel computing on networks of workstations. In USENIX Annual Technical Conference, Anaheim, California, 1997.

[8]

Bradford L. Chamberlain, David Callahan, and Hans P. Zima. Parallel Programmability and the Chapel Language. International Journal of High Performance Computing Applications, 21(3):291--312, August 2007.

[9]

Philippe Charles, Christopher Donawa, Kemal Ebcioglu, Christian Grothoff, Allan Kielstra, Christoph von Praun, Vijay Saraswat, and Vivek Sarkar. X10: An object-oriented approach to non-uniform cluster computing. In OOPSLA 2005 Onward! Track, 2005.

[10]

Exascale Study Group and Peter Kogge et al. Exascale computing study: Technology challenges in achieving exascale systems. Technical report, September 2008.

[11]

A. Narang, A. Srivastava, Naga P.K. Katta, and R. K. Shyamasundar. Affinity driven distributed scheduling algorithm for parallel computations. In ICDCN, Bangalore, India, January 2011.

[12]

Marc Tchiboukdjian, Nicolas Gast, Denis Trystram, Jean-Louis Roch, and Julien Bernard. A tighter analysis of work stealing. In ISAAC (2), pages 291--302, 2010.

[13]

Katherine Yelick and Dan Bonachea et al. Productivity and performance using partitioned global address space languages. In PASCO '07: Proceedings of the 2007 International Workshop on Parallel Symbolic Computation, pages 24--32, New York, NY, USA, 2007. ACM.

Cited By

Narang A, Srivastava A, Shyamasundar R (2013). High Performance Adaptive Distributed Scheduling Algorithm. Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pages 1725-1734. doi:10.1109/IPDPSW.2013.232. Online publication date: 20-May-2013
https://dl.acm.org/doi/10.1109/IPDPSW.2013.232

Index Terms

Performance driven multi-objective distributed scheduling for parallel computations
1. General and reference
  1. Cross-computing tools and techniques
    1. Performance
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
    2. Extra-functional properties
      1. Software performance

Information

Published In

ACM SIGOPS Operating Systems Review, Volume 45, Issue 2

July 2011

58 pages

ISSN: 0163-5980

DOI: 10.1145/2007183

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Bibliometrics

Total Citations: 1
Total Downloads: 219
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 0

Reflects downloads up to 28 Feb 2025
