Auto-CFD-NOW: A pre-compiler for effectively parallelizing CFD applications on networks of workstations

Xiao, Li; Zhang, Xiaodong; Kuang, Zhengqian; Feng, Baiming; Kang, Jichang

doi:10.1007/s11227-006-8324-z

Published: November 2006

Volume 38, pages 189–217, (2006)

The Journal of Supercomputing

Li Xiao¹, Xiaodong Zhang², Zhengqian Kuang³, Baiming Feng⁴ & Jichang Kang³

Abstract

Computational Fluid Dynamics (CFD) applications place heavy demands on parallel computing. Many such applications have moved from expensive massively parallel processing (MPP) machines to cost-effective Networks of Workstations (NOW). Auto-CFD-NOW is a pre-compiler that transforms sequential Fortran CFD programs into efficient message-passing parallel programs that run on NOW. Our work makes three contributions. First, the pre-compiler is highly automatic and requires a minimal number of user directives for parallelization. Second, we apply a dependency analysis technique for CFD applications, called analysis after partitioning, and we propose a mirror-image decomposition technique to parallelize self-dependent field loops that are hard to parallelize with existing methods. Third, whereas traditional communication optimizations focus on eliminating redundant synchronizations, we have developed an optimization scheme that combines all the non-redundant synchronizations in CFD programs to further reduce communication overhead. Auto-CFD-NOW has been implemented on networks of workstations and has been used successfully to parallelize structured CFD application programs automatically. Our experiments show its effectiveness and scalability for parallelizing large CFD applications.
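
The abstract does not give the details of the synchronization-combining scheme, so the following is only a generic sketch of the underlying idea, not the Auto-CFD-NOW algorithm. It assumes a 1-D field split into contiguous blocks, simulates message passing in a single process, and counts messages when the boundary values of several field arrays are sent separately versus packed into one message per neighbour. All function names (`sequential_step`, `partition`, `exchange`, `parallel_step`) are hypothetical.

```python
# Illustrative sketch only: message passing is simulated in-process and the
# helper names are hypothetical; this is not the Auto-CFD-NOW implementation.

def sequential_step(u):
    """One Jacobi-style smoothing step on a 1-D field with fixed end values."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1])
    return new


def partition(u, parts):
    """Split u into `parts` contiguous blocks (assumes parts <= len(u))."""
    n = len(u)
    bounds = [n * p // parts for p in range(parts + 1)]
    return [u[bounds[p]:bounds[p + 1]] for p in range(parts)]


def exchange(blocks_per_field, combine):
    """Fill ghost cells for every field and count the messages sent.

    combine=True packs the boundary values of all fields into one message
    per neighbour and direction; combine=False sends one message per field.
    """
    parts = len(blocks_per_field[0])
    ghosts = [[(None, None)] * parts for _ in blocks_per_field]
    messages = 0
    for p in range(parts):
        for q in (p - 1, p + 1):
            if not 0 <= q < parts:
                continue
            # Boundary values that worker q sends to worker p, one per field.
            payload = [blocks[q][-1] if q < p else blocks[q][0]
                       for blocks in blocks_per_field]
            messages += 1 if combine else len(payload)
            for f, value in enumerate(payload):
                left, right = ghosts[f][p]
                ghosts[f][p] = (value, right) if q < p else (left, value)
    return ghosts, messages


def parallel_step(fields, parts, combine):
    """Apply sequential_step block by block after a ghost-cell exchange."""
    blocks_per_field = [partition(u, parts) for u in fields]
    ghosts, messages = exchange(blocks_per_field, combine)
    results = []
    for f, blocks in enumerate(blocks_per_field):
        out = []
        for p, block in enumerate(blocks):
            left, right = ghosts[f][p]
            padded = ([left] if left is not None else []) + block
            padded += [right] if right is not None else []
            stepped = sequential_step(padded)
            start = 1 if left is not None else 0
            out.extend(stepped[start:start + len(block)])
        results.append(out)
    return results, messages


if __name__ == "__main__":
    u = [float(i * i) for i in range(16)]
    v = [float(3 * i) for i in range(16)]
    _, separate = parallel_step([u, v], 4, combine=False)
    _, combined = parallel_step([u, v], 4, combine=True)
    print("messages per step:", separate, "separate vs", combined, "combined")
```

Both variants compute the same result as the sequential step; only the number of messages differs, which matters on a network of workstations where per-message latency is high.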

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Michigan State University, U.S.A.
Li Xiao
Department of Computer Science and Engineering, Ohio State University, U.S.A.
Xiaodong Zhang
Department of Computer Science and Engineering, Northwestern Polytechnical University, P.R. China
Zhengqian Kuang & Jichang Kang
Department of Computer Science, Northwestern Normal University, P.R. China
Baiming Feng

Additional information

This work is supported in part by the China National Aerospace Science Foundation, and by the U.S. National Science Foundation under grants CCR-9812187, CCR-0098055, CCF-0325760, CCF-0514078, and CNS-0549006.

About this article

Cite this article

Xiao, L., Zhang, X., Kuang, Z. et al. Auto-CFD-NOW: A pre-compiler for effectively parallelizing CFD applications on networks of workstations. J Supercomput 38, 189–217 (2006). https://doi.org/10.1007/s11227-006-8324-z

Issue Date: November 2006
DOI: https://doi.org/10.1007/s11227-006-8324-z
