research-article

General data structure expansion for multi-threading

Authors:

Zhiyuan Li

PLDI '13: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 243 - 252

https://doi.org/10.1145/2491956.2462182

Published: 16 June 2013

Abstract

Among techniques for parallelizing sequential codes, privatization is a common and significant transformation performed by both compilers and runtime parallelizing systems. Without privatization, repetitive updates to the same data structures often introduce spurious data dependencies that hide the inherent parallelism. Unfortunately, it remains a significant challenge to compilers to automatically privatize dynamic and recursive data structures which appear frequently in real applications written in languages such as C/C++. This is because such languages lack a naming mechanism to define the address range of a pointer-based data structure, in contrast to arrays with explicitly declared bounds. In this paper we present a novel solution to this difficult problem by expanding general data structures such that memory accesses issued from different threads to contentious data structures are directed to different data fields. Based on compile-time type checking and a data dependence graph, this aggressive extension to the traditional scalar and array expansion isolates the address ranges among different threads, without struggling with privatization based on thread-private stacks, such that the targeted loop can be effectively parallelized. With this method fully implemented in GCC, experiments are conducted on a set of programs from well-known benchmark suites such as Mibench, MediaBench II and SPECint. Results show that the new approach can lead to a high speedup when executing the transformed code on multiple cores.
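
As a rough illustration of the idea of directing accesses from different threads to different data fields, the following C sketch expands one contended field of a linked-list node into one slot per thread. It is only a hand-written sketch under stated assumptions, not the transformation the paper implements in GCC: the names `node`, `node_x`, `scratch`, `body`, and `NUM_THREADS` are hypothetical, the thread count is assumed fixed at compile time, and the thread ID is passed explicitly instead of coming from a threading runtime.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_THREADS 4   /* assumption: thread count fixed for this sketch */

/* Original shape: every loop iteration rewrites `scratch` in the same
 * nodes, so iterations conflict on it even though no value actually
 * flows from one iteration to the next. */
struct node {
    int value;
    int scratch;              /* per-iteration temporary */
    struct node *next;
};

/* Expanded shape (hypothetical): the contended field gets one copy per
 * thread, while the read-only field and the links stay shared. */
struct node_x {
    int value;
    int scratch[NUM_THREADS];
    struct node_x *next;
};

/* Body of loop iteration `i`, executed by thread `tid`.
 * All writes and reads of the temporary go to this thread's own slot. */
static long body(struct node_x *head, int i, int tid)
{
    assert(tid >= 0 && tid < NUM_THREADS);
    long sum = 0;
    for (struct node_x *p = head; p != NULL; p = p->next)
        p->scratch[tid] = p->value * i;
    for (struct node_x *p = head; p != NULL; p = p->next)
        sum += p->scratch[tid];
    return sum;
}
```

Because iterations run with different `tid` values touch disjoint `scratch` slots, they could be handed to different threads without interfering through the temporary. A per-thread slot array like this can cause false sharing between cores; the sketch ignores that cost.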

Cited By

Balaji V, Tirumala D, Lucia B (2017). POSTER. ACM SIGPLAN Notices 52:8 (431-432). Online publication date: 26-Jan-2017
https://dl.acm.org/doi/10.1145/3155284.3019030
Li P, Hu X, Chen D, Brock J, Luo H, Zhang E, Ding C (2017). LD. ACM Transactions on Architecture and Code Optimization 14:1 (1-25). Online publication date: 21-Mar-2017
https://dl.acm.org/doi/10.1145/3046678
Balaji V, Tirumala D, Lucia B, Sarkar V, Rauchwerger L (2017). POSTER. Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (431-432). Online publication date: 26-Jan-2017
https://dl.acm.org/doi/10.1145/3018743.3019030

Index Terms

General data structure expansion for multi-threading
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

PLDI '13: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation

June 2013

546 pages

ISBN:9781450320146

DOI:10.1145/2491956

General Chair: Hans-J. Boehm (HP Labs)
Program Chair: Cormac Flanagan (University of California, Santa Cruz)

ACM SIGPLAN Notices Volume 48, Issue 6
PLDI '13
June 2013
515 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2499370

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

PLDI '13

Sponsor:

SIGPLAN

PLDI '13: ACM SIGPLAN Conference on Programming Language Design and Implementation

June 16 - 19, 2013

Seattle, Washington, USA

Acceptance Rates

PLDI '13 Paper Acceptance Rate 46 of 267 submissions, 17%;

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Bibliometrics & Citations

Article Metrics

Total Citations: 7
Total Downloads: 529
Downloads (Last 12 months): 13
Downloads (Last 6 weeks): 0

Reflects downloads up to 25 Feb 2025

Citations

Oh T, Beard S, Johnson N, Popovych S, August D (2017). A Generalized Framework for Automatic Scripting Language Parallelization. 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) (356-369). Online publication date: Sep-2017
https://doi.org/10.1109/PACT.2017.28
Yu H, Li G, Shu L (2016). $\mathrm{DS}_{\mathrm{spirit}}$. The Journal of Supercomputing 72:2 (770-788). Online publication date: 1-Feb-2016
https://dl.acm.org/doi/10.1007/s11227-015-1612-8
Amsaad F, Al-Eidi S, Darwish O (2023). Comparative Analysis of Sequential and Parallel Computing for Object Detection Using Deep Learning Model. 2023 24th International Arab Conference on Information Technology (ACIT) (1-5). Online publication date: 6-Dec-2023
https://doi.org/10.1109/ACIT58888.2023.10453894
Lu X, Chen L, Li Z (2018). Performance Evaluation and Enhancement of Process-Based Parallel Loop Execution. International Journal of Parallel Programming 45:1 (185-198). Online publication date: 28-Dec-2018
https://dl.acm.org/doi/10.1007/s10766-015-0394-1
