
The Basic Building Blocks of Parallel Tasks

Published: 08 February 2015

Abstract

Discovering parallelization opportunities in sequential programs can greatly reduce the time and effort required to parallelize an application. Identifying and analyzing code that contains little to no internal parallelism can also help expose potential parallelism. This paper presents a technique for identifying a block of code, called a Computational Unit (CU), that performs a unit of work in a program. A CU aids the discovery of potential parallelism in a sequential program by acting as a basic building block for tasks. CUs are combined with dynamic analysis information to identify tasks whose internal code is tightly coupled; this in turn reveals which tasks are weakly dependent or independent. Independent tasks can be run in parallel, while dependent tasks can be analyzed to check whether their dependences can be resolved. To evaluate the technique, several benchmark applications are parallelized using the identified tasks and the resulting speedups are reported. In addition, the identified tasks are compared against existing parallel implementations of the respective applications.
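The core idea of the abstract can be sketched with a toy example (not from the paper; the function names `cu_sum` and `cu_max` are hypothetical): two blocks of a sequential program that each perform a self-contained unit of work and share no data dependences can be treated as independent tasks and executed in parallel.

```python
# Hypothetical sketch of two independent "computational units" (CUs):
# each reads shared input but writes only its own result, so the tasks
# built from them carry no mutual dependence and may run concurrently.
from concurrent.futures import ThreadPoolExecutor

def cu_sum(data):
    # CU 1: a unit of work that depends only on `data`
    return sum(data)

def cu_max(data):
    # CU 2: another unit of work, independent of CU 1's result
    return max(data)

data = list(range(10))
with ThreadPoolExecutor() as pool:
    # Tasks formed from independent CUs can be submitted in parallel.
    f1 = pool.submit(cu_sum, data)
    f2 = pool.submit(cu_max, data)
    results = (f1.result(), f2.result())
```

Had `cu_max` instead consumed the result of `cu_sum`, the two tasks would be dependent, and one would have to check whether that dependence can be resolved before running them concurrently.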



Published In

COSMIC '15: Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores
February 2015
74 pages
ISBN:9781450333160
DOI:10.1145/2723772
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited



Cited By

  • (2018) Advances in Engineering Software for Multicore Systems. Dependability Engineering. DOI: 10.5772/intechopen.72784. Online publication date: 6-Jun-2018.
  • (2018) Dissecting sequential programs for parallelization — An approach based on computational units. Concurrency and Computation: Practice and Experience, 31(5). DOI: 10.1002/cpe.4770. Online publication date: 29-Jun-2018.
  • (2017) Brief Announcement. Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, pages 363-365. DOI: 10.1145/3087556.3087592. Online publication date: 24-Jul-2017.
  • (2016) Automatic Parallel Pattern Detection in the Algorithm Structure Design Space. 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 43-52. DOI: 10.1109/IPDPS.2016.60. Online publication date: May 2016.
  • (2015) Fast Data-Dependence Profiling by Skipping Repeatedly Executed Memory Operations. Algorithms and Architectures for Parallel Processing, pages 583-596. DOI: 10.1007/978-3-319-27140-8_40. Online publication date: 16-Dec-2015.
