Performance Evaluation and Enhancement of Process-Based Parallel Loop Execution

International Journal of Parallel Programming

Abstract

Parallel programming is known to be difficult and error-prone. Thread-based parallel execution is particularly difficult because programs tend to contain errors such as incorrect operation ordering and atomicity violations. Worse yet, many such erroneous behaviors are non-deterministic, so the programmer is often unable to reproduce the exact event sequence that causes the program failure, which makes diagnosis difficult. In contrast, process-based parallel execution avoids unintended data sharing because each process has an isolated address space. This greatly simplifies the run-time program state and makes errors easier to reproduce and diagnose. Nonetheless, parallel loop execution on multicore systems has been dominated by parallel threads and by thread-based language extensions and tools. This seems to be due to a long-held belief that process-based parallel execution incurs much higher overhead. This paper reports experimental results that show the competitiveness of process-based parallel loop execution. Several benchmark programs using process-based parallel execution achieved speedups ranging from 6.73 to 20.24 on a 32-core machine.
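As a rough illustration of the idea, and not the implementation evaluated in the paper, the sketch below runs loop iterations in forked processes on a POSIX system. The names `parallel_for`, `body`, `n_procs`, and `demo` are hypothetical and chosen only for this example. It shows the isolation property described in the abstract: a write made inside the loop body affects only the child's private copy of memory, so results have to be returned explicitly (here through pipes).

```python
import os
import pickle


def parallel_for(body, n_iters, n_procs=4):
    """Run body(i) for i in range(n_iters) in forked child processes.

    Hypothetical helper for illustration only. Each child works on a
    private copy-on-write address space, so side effects of body() are
    not visible to the parent or to sibling processes.
    """
    children = []
    for rank in range(n_procs):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:  # child process
            status = 1
            try:
                os.close(read_fd)
                # Cyclic distribution: child `rank` takes iterations
                # rank, rank + n_procs, rank + 2 * n_procs, ...
                chunk = [(i, body(i)) for i in range(rank, n_iters, n_procs)]
                with os.fdopen(write_fd, "wb") as out:
                    pickle.dump(chunk, out)
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's code
        os.close(write_fd)
        children.append((pid, read_fd))

    # Parent: collect each child's results, then reap the child.
    results = [None] * n_iters
    for pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as src:
            for i, value in pickle.load(src):
                results[i] = value
        os.waitpid(pid, 0)
    return results


def demo():
    seen = []  # lives in the parent; children append to private copies

    def body(i):
        seen.append(i)  # invisible to the parent after the child exits
        return i * i

    squares = parallel_for(body, 8, n_procs=4)
    return squares, seen
```

Calling `demo()` returns the squares computed by the children together with the parent's `seen` list, which stays empty because every `append` happened in a child's own address space. With threads, the same appends would all land in one shared list, which is the kind of unintended sharing the abstract refers to.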

Acknowledgments

Our thanks go to Lei Liu and Shuangde Fang for their suggestions on earlier versions of the paper. This work is supported in part by the National High Technology Research and Development Program of China (2012AA010902), the National Natural Science Foundation of China under Grant 61432018, the Innovation Research Group of NSFC (61221062), and the National Science Foundation (CNS-0915414).

Corresponding author

Correspondence to Xingjing Lu.

About this article

Cite this article

Lu, X., Chen, L. & Li, Z. Performance Evaluation and Enhancement of Process-Based Parallel Loop Execution. Int J Parallel Prog 45, 185–198 (2017). https://doi.org/10.1007/s10766-015-0394-1

  • DOI: https://doi.org/10.1007/s10766-015-0394-1
