Abstract
Parallel programming is known to be difficult and error-prone. Thread-based parallel execution is particularly troublesome because programs tend to contain errors such as incorrect operation ordering and atomicity violations. Worse yet, because many such erroneous behaviors are non-deterministic, the programmer is often unable to reproduce the exact event sequence that causes the program to fail, which makes diagnosis difficult. In contrast, process-based parallel execution avoids unintended data sharing, thanks to the isolated address spaces of the processes. This greatly simplifies the run-time program state and makes it easier to reproduce and diagnose an error. Nonetheless, parallel loop execution on multicore machines has been dominated by parallel threads and by thread-based language extensions and tools. This seems to be due to the long-held common wisdom that process-based parallel execution incurs much higher overhead. This paper reports experimental results that show the competitiveness of process-based parallel loop execution. Several benchmark programs executed with process-based parallelism achieved speedups ranging from 6.73 to 20.24 on a 32-core machine.
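To make the idea of process-based parallel loop execution concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes a POSIX system, splits a loop into contiguous chunks, forks one worker process per chunk, and returns each partial result to the parent through a pipe. The function name `parallel_sum_of_squares`, the parameter `num_procs`, and the loop body are hypothetical and chosen only for illustration.

```python
import os


def parallel_sum_of_squares(n, num_procs=4):
    """Sum i*i for i in range(n) using forked worker processes (POSIX only)."""
    chunk = (n + num_procs - 1) // num_procs  # iterations per worker, rounded up
    workers = []
    for p in range(num_procs):
        lo, hi = p * chunk, min((p + 1) * chunk, n)
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: runs in its own copy of the address space, so nothing it
            # modifies here is visible to the parent or to sibling workers.
            os.close(read_fd)
            status = 1
            try:
                partial = sum(i * i for i in range(lo, hi))
                os.write(write_fd, str(partial).encode())
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's code path
        # Parent: keep only the read end and go on to fork the next worker.
        os.close(write_fd)
        workers.append((pid, read_fd))

    total = 0
    for pid, read_fd in workers:
        with os.fdopen(read_fd, "rb") as pipe_in:
            data = pipe_in.read()  # returns once the child closes its end
        _, status = os.waitpid(pid, 0)
        if status != 0:
            raise RuntimeError(f"worker {pid} failed")
        total += int(data)
    return total


if __name__ == "__main__":
    print(parallel_sum_of_squares(1000))
```

Because the only channel between a worker and the parent is the explicit pipe, any sharing of data has to be written out deliberately. The price is the cost of process creation and of copying results back, which is the overhead the paper sets out to measure.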
Acknowledgments
Our thanks go to Lei Liu and Shuangde Fang for their suggestions on earlier versions of the paper. This work is supported in part by the National High Technology Research and Development Program of China (2012AA010902), the National Natural Science Foundation of China under Grant 61432018, the Innovation Research Group of NSFC (61221062), and the National Science Foundation (CNS-0915414).
Cite this article
Lu, X., Chen, L. & Li, Z. Performance Evaluation and Enhancement of Process-Based Parallel Loop Execution. Int J Parallel Prog 45, 185–198 (2017). https://doi.org/10.1007/s10766-015-0394-1
DOI: https://doi.org/10.1007/s10766-015-0394-1