Auto-CFD-NOW: A pre-compiler for effectively parallelizing CFD applications on networks of workstations

Published in: The Journal of Supercomputing

Abstract

Computational Fluid Dynamics (CFD) applications place heavy demands on parallel computing resources. Many such applications have been moved from expensive massively parallel processing (MPP) machines to cost-effective Networks of Workstations (NOW). Auto-CFD-NOW is a pre-compiler that transforms sequential Fortran CFD programs into efficient message-passing parallel programs that run on NOW. Our work makes three contributions. First, the pre-compiler is highly automatic and requires only a minimal number of user directives for parallelization. Second, we apply a dependency analysis technique tailored to CFD applications, called analysis after partitioning, and we propose a mirror-image decomposition technique to parallelize self-dependent field loops that are hard to parallelize with existing methods. Third, whereas traditional communication optimizations focus on eliminating redundant synchronizations, we have developed an optimization scheme that combines all the non-redundant synchronizations in CFD programs to further reduce communication overhead. Auto-CFD-NOW has been implemented on networks of workstations and has been used successfully to parallelize structured CFD application programs automatically. Our experiments show its effectiveness and scalability for parallelizing large CFD applications.
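
To make the sequential-to-message-passing transformation concrete, here is a minimal sketch of the general idea of domain partitioning with ghost-cell (halo) exchange on a structured grid. It is an illustration under stated assumptions, not the code Auto-CFD-NOW generates: the abstract does not give the pre-compiler's algorithms, so the 1D Jacobi-style field loop, the function names (`jacobi_sequential`, `split_blocks`, `jacobi_partitioned`), and the simulation of message passing by list copies inside one process are all hypothetical choices made for this example.

```python
def jacobi_sequential(u, sweeps):
    """Reference field loop: each interior point becomes the mean of its neighbours."""
    u = list(u)
    for _ in range(sweeps):
        new = u[:]
        for i in range(1, len(u) - 1):
            new[i] = 0.5 * (u[i - 1] + u[i + 1])
        u = new
    return u


def split_blocks(n, parts):
    """Return (start, stop) index ranges that tile range(n) as evenly as possible."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def jacobi_partitioned(u, sweeps, parts):
    """Same computation on `parts` sub-domains (requires len(u) >= parts).

    Message passing is only simulated here: a 'message' is a copy between
    two blocks, and the counter records how many such copies are made.
    """
    n = len(u)
    ranges = split_blocks(n, parts)
    # Each block holds its owned cells plus one ghost cell on each side.
    blocks = [[0.0] + list(u[a:b]) + [0.0] for a, b in ranges]
    messages = 0
    for _ in range(sweeps):
        # Halo exchange: neighbouring blocks swap their boundary values.
        for p in range(parts - 1):
            blocks[p][-1] = blocks[p + 1][1]   # right ghost of p <- first owned cell of p+1
            blocks[p + 1][0] = blocks[p][-2]   # left ghost of p+1 <- last owned cell of p
            messages += 2
        # Local update: every block works only on its own data.
        for p, (a, _b) in enumerate(ranges):
            old = blocks[p]
            new = old[:]
            for k in range(1, len(old) - 1):
                g = a + k - 1                  # global index of local cell k
                if 0 < g < n - 1:              # physical boundary values stay fixed
                    new[k] = 0.5 * (old[k - 1] + old[k + 1])
            blocks[p] = new
    result = []
    for blk in blocks:
        result.extend(blk[1:-1])               # drop ghost cells when gathering
    return result, messages


if __name__ == "__main__":
    field = [float(i * i % 7) for i in range(12)]
    par, msgs = jacobi_partitioned(field, sweeps=5, parts=3)
    print(par == jacobi_sequential(field, 5), msgs)
```

The message counter shows why communication volume matters on a network of workstations: every sweep costs two messages per internal block boundary, so any scheme that merges exchanges for several fields or loops into fewer messages reduces the per-sweep overhead.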

Additional information

This work is supported in part by the China National Aerospace Science Foundation, and by the U.S. National Science Foundation under grants CCR-9812187, CCR-0098055, CCF-0325760, CCF-0514078, and CNS-0549006.

Cite this article

Xiao, L., Zhang, X., Kuang, Z. et al. Auto-CFD-NOW: A pre-compiler for effectively parallelizing CFD applications on networks of workstations. J Supercomput 38, 189–217 (2006). https://doi.org/10.1007/s11227-006-8324-z

  • DOI: https://doi.org/10.1007/s11227-006-8324-z
