
The Basic Building Blocks of Parallel Tasks

Published: 08 February 2015

Abstract

Discovering parallelization opportunities in sequential programs can greatly reduce the time and effort required to parallelize an application. Identifying and analyzing code that contains little to no internal parallelism can also help expose potential parallelism. This paper presents a technique for identifying a block of code, called a Computational Unit (CU), that performs a unit of work in a program. A CU aids the discovery of potential parallelism in a sequential program by acting as a basic building block for tasks. CUs are combined with dynamic analysis information to identify tasks whose internal code is tightly coupled; this in turn reveals which tasks are weakly dependent or independent. Independent tasks can be run in parallel, while dependent tasks can be analyzed to check whether their dependences can be resolved. To evaluate the technique, several benchmark applications are parallelized using the identified tasks and the resulting speedups are reported. In addition, the identified tasks are compared against existing parallel implementations of the respective applications.
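The core idea of the abstract can be sketched with a toy example (not from the paper; the function names `cu_sum` and `cu_max` are hypothetical): two blocks of a sequential program that each perform a self-contained unit of work and share no data dependences can be treated as independent tasks and executed in parallel.

```python
# Hypothetical sketch of two independent "computational units" (CUs):
# each reads shared input but writes only its own result, so the tasks
# built from them carry no mutual dependence and may run concurrently.
from concurrent.futures import ThreadPoolExecutor

def cu_sum(data):
    # CU 1: a unit of work that depends only on `data`
    return sum(data)

def cu_max(data):
    # CU 2: another unit of work, independent of CU 1's result
    return max(data)

data = list(range(10))
with ThreadPoolExecutor() as pool:
    # Tasks formed from independent CUs can be submitted in parallel.
    f1 = pool.submit(cu_sum, data)
    f2 = pool.submit(cu_max, data)
    results = (f1.result(), f2.result())
```

Had `cu_max` instead consumed the result of `cu_sum`, the two tasks would be dependent, and one would have to check whether that dependence can be resolved before running them concurrently.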



Published In

COSMIC '15: Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores
February 2015
74 pages
ISBN:9781450333160
DOI:10.1145/2723772
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited



Cited By

  • (2018) Advances in Engineering Software for Multicore Systems. Dependability Engineering. DOI: 10.5772/intechopen.72784. Online publication date: 6-Jun-2018.
  • (2018) Dissecting sequential programs for parallelization — An approach based on computational units. Concurrency and Computation: Practice and Experience, 31(5). DOI: 10.1002/cpe.4770. Online publication date: 29-Jun-2018.
  • (2017) Brief Announcement. Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, pages 363-365. DOI: 10.1145/3087556.3087592. Online publication date: 24-Jul-2017.
  • (2016) Automatic Parallel Pattern Detection in the Algorithm Structure Design Space. 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 43-52. DOI: 10.1109/IPDPS.2016.60. Online publication date: May 2016.
  • (2015) Fast Data-Dependence Profiling by Skipping Repeatedly Executed Memory Operations. Algorithms and Architectures for Parallel Processing, pages 583-596. DOI: 10.1007/978-3-319-27140-8_40. Online publication date: 16-Dec-2015.
