Performance Evaluation and Enhancement of Process-Based Parallel Loop Execution

Lu, Xingjing; Chen, Long; Li, Zhiyuan

doi:10.1007/s10766-015-0394-1

Published: 06 October 2015

International Journal of Parallel Programming, Volume 45, pages 185–198 (2017)

Abstract

Parallel programming is known to be difficult and error-prone. Thread-based parallel execution is particularly difficult because programs tend to contain errors such as incorrect operation ordering and atomicity violations. Worse yet, because many such erroneous behaviors are non-deterministic, the programmer is often unable to reproduce the exact event sequence that causes the program failure, which makes diagnosis difficult. In contrast, process-based parallel execution avoids unintended data sharing, thanks to the isolated address spaces among processes. This isolation greatly simplifies the run-time program states, making it easier to reproduce and diagnose an error. Nonetheless, parallel loop execution on multicore systems has been dominated by parallel threads and thread-based language extensions and tools. This seems to be due to the long-held common wisdom that process-based parallel execution incurs much higher overhead. This paper reports experimental results that show the competitiveness of process-based parallel loop execution. Several benchmark programs using process-based parallel execution achieved speedups ranging from 6.73 to 20.24 on a 32-core machine.
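The isolation property described in the abstract can be illustrated with a minimal sketch. It is not the runtime evaluated in the paper: the function name `parallel_sum_of_squares`, the list `scratch`, the cyclic split of iterations, and the use of pipes to return partial results are all assumptions chosen for illustration, and it requires a POSIX system where `os.fork` is available.

```python
import os

# Hypothetical module-level state: each child appends to its own private
# copy, so the parent's list is never modified.
scratch = []

def parallel_sum_of_squares(n, workers=4):
    """Sum i*i for i in range(n), one forked process per worker."""
    children = []
    for w in range(workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child process: works on a copy of the parent's address space.
            os.close(read_fd)
            scratch.append(w)  # visible only inside this child
            # Cyclic distribution: worker w takes iterations w, w+workers, ...
            partial = sum(i * i for i in range(w, n, workers))
            os.write(write_fd, str(partial).encode())
            os.close(write_fd)
            os._exit(0)  # leave without running the parent's cleanup code
        # Parent process: keep only the read end of this worker's pipe.
        os.close(write_fd)
        children.append((pid, read_fd))

    total = 0
    for pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as pipe_in:
            total += int(pipe_in.read())  # explicit, intended communication
        os.waitpid(pid, 0)
    return total
```

Calling `parallel_sum_of_squares(1000, 4)` returns the same value as the sequential loop, while `scratch` in the parent stays empty: the only data that crosses process boundaries is what each child writes to its pipe.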

Acknowledgments

Our thanks go to Lei Liu and Shuangde Fang for their suggestions on earlier versions of the paper. This work is supported in part by the National High Technology Research and Development Program of China (2012AA010902), the National Natural Science Foundation of China (Grant 61432018), the Innovation Research Group of NSFC (61221062), and the National Science Foundation (CNS-0915414).

Author information

Authors and Affiliations

SKL Computer Architecture, Institute of Computing Technology, CAS, Beijing, China
Xingjing Lu
Huawei Technologies Co, Ltd, Shenzhen, China
Long Chen
Department of Computer Science, Purdue University, West Lafayette, IN, USA
Zhiyuan Li

Corresponding author

Correspondence to Xingjing Lu.

About this article

Cite this article

Lu, X., Chen, L. & Li, Z. Performance Evaluation and Enhancement of Process-Based Parallel Loop Execution. Int J Parallel Prog 45, 185–198 (2017). https://doi.org/10.1007/s10766-015-0394-1

Received: 30 March 2015
Accepted: 20 May 2015
Published: 06 October 2015
Issue Date: February 2017
DOI: https://doi.org/10.1007/s10766-015-0394-1
