research-article

Optimized approach based on time prediction and space chunking for polyhedron programs parallelization on multicores

Authors:

Yosr Slama

HP3C: Proceedings of the 2nd International Conference on High Performance Compilation, Computing and Communications

Pages 22 - 26

https://doi.org/10.1145/3195612.3195615

Published: 15 March 2018

Abstract

A central challenge in parallel computing is balancing the load across cores so as to minimize the overall execution time of the parallel program, called the makespan. Because the performance of some parallel architectures, such as multicores, may vary during program execution, an effective mapping should accommodate this unpredictable variation to avoid degrading the makespan. A static mapping or load balancing method may be ineffective when the state of the target machine changes during program execution. Thread affinity has emerged as an important technique for improving program performance and stability.
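As a rough illustration of thread affinity, the sketch below restricts the calling process to a single CPU using Python's standard `os.sched_setaffinity`, which is only available on some platforms (notably Linux). The helper name `pin_current_process` is hypothetical and is not taken from the paper, which does not state how pinning is implemented.

```python
import os


def pin_current_process(cpu: int) -> bool:
    """Try to restrict the calling process to one CPU; return True on success.

    Hypothetical helper for illustration only, not the paper's mechanism.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False  # affinity calls are not exposed on this platform
    try:
        if cpu not in os.sched_getaffinity(0):
            return False  # the requested CPU is not in the allowed set
        os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    except OSError:
        return False  # the operating system refused the request
    return True
```

Once pinned, the scheduler no longer migrates the work to another core, so timings measured for it can be attributed to that core.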

In this context, we propose a predictive approach that adapts parallel nested loops to processor performance by chunking iterations at runtime. The approach is based on thread pinning, space chunking, and runtime performance detection. Starting from a parallel program, we define a first set of loop nest iterations, called a chunk. This first chunk is run using an initial mapping that assumes homogeneous cores. A performance assessment then corrects the mapping by predicting the future state of the cores. The new mapping is applied to the next chunk for further evaluation and prediction, and so on. The process stops when the program has run to completion or when chunking is judged to be no longer effective.
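The chunk-by-chunk loop described above can be sketched as follows. This is a minimal simulation under stated assumptions, not the paper's implementation: the names `proportional_split`, `adaptive_chunked_run`, and `run_on_core` are hypothetical, the predictor is a simple exponential smoothing chosen only for illustration, cores are visited one after another rather than in parallel, and the "chunking no longer effective" stopping rule is omitted because the abstract does not specify it.

```python
from typing import Callable, List


def proportional_split(n_iters: int, speeds: List[float]) -> List[int]:
    """Split n_iters among cores in proportion to their predicted speeds."""
    total = sum(speeds)
    exact = [n_iters * s / total for s in speeds]
    shares = [int(x) for x in exact]
    # Hand out iterations lost to rounding, largest fractional part first.
    leftover = n_iters - sum(shares)
    order = sorted(range(len(speeds)),
                   key=lambda c: exact[c] - shares[c], reverse=True)
    for c in order[:leftover]:
        shares[c] += 1
    return shares


def adaptive_chunked_run(total_iters: int, chunk_size: int,
                         run_on_core: Callable[[int, int], float],
                         n_cores: int, alpha: float = 0.5) -> List[List[int]]:
    """Run iterations chunk by chunk, remapping after each chunk.

    run_on_core(core, n) is an assumed callback that executes n iterations
    on the given core and returns the elapsed time (must be positive).
    Returns the per-chunk mapping (iterations given to each core).
    """
    predicted = [1.0] * n_cores  # initial mapping assumes homogeneous cores
    first_chunk = True
    history = []
    done = 0
    while done < total_iters:
        chunk = min(chunk_size, total_iters - done)
        shares = proportional_split(chunk, predicted)
        history.append(shares)
        for core, n in enumerate(shares):
            if n == 0:
                continue  # nothing measured, keep the previous prediction
            observed = n / run_on_core(core, n)  # iterations per time unit
            if first_chunk:
                predicted[core] = observed
            else:
                # Assumed predictor: blend new observation with old estimate.
                predicted[core] = alpha * observed + (1 - alpha) * predicted[core]
        first_chunk = False
        done += chunk
    return history
```

For example, with two simulated cores where the second is four times faster, `adaptive_chunked_run(80, 40, lambda core, n: n / [1.0, 4.0][core], 2)` maps the first chunk evenly as `[20, 20]` and the second as `[8, 32]`.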

Index Terms

1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel algorithms
      1. Shared memory algorithms

Published In

HP3C: Proceedings of the 2nd International Conference on High Performance Compilation, Computing and Communications

March 2018

123 pages

ISBN:9781450363372

DOI:10.1145/3195612

Conference Chair:
Steven Guan
Xi'an Jiaotong-Liverpool University, China

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

HP3C 2018

HP3C 2018: 2018 2nd International Conference on High Performance Compilation, Computing and Communications

March 15 - 17, 2018

Hong Kong, Hong Kong
