Scalable heterogeneous parallelism for atmospheric modeling and simulation

The Journal of Supercomputing

Abstract

Heterogeneous multicore chipsets with many levels of parallelism are becoming increasingly common in high-performance computing systems. Using that parallelism effectively is a central challenge for a new generation of large-scale scientific computing applications. This study examines methods for improving the performance of two-dimensional and three-dimensional atmospheric constituent transport simulation on the Cell Broadband Engine Architecture (CBEA). A function offloading approach is used in a 2D transport module, and a vector stream processing approach is used in a 3D transport module. Two methods for transferring noncontiguous data between main memory and accelerator local storage are compared. By leveraging the heterogeneous parallelism of the CBEA, the 3D transport module on a single PowerXCell 8i chip achieves performance comparable to two nodes of an IBM BlueGene/P or to eight Intel Xeon cores. Module performance is reported for two CBEA systems, an IBM BlueGene/P, and an eight-core shared-memory Intel Xeon workstation.

Author information

Corresponding author

Correspondence to John C. Linford.

About this article

Cite this article

Linford, J.C., Sandu, A. Scalable heterogeneous parallelism for atmospheric modeling and simulation. J Supercomput 56, 300–327 (2011). https://doi.org/10.1007/s11227-010-0380-8

  • DOI: https://doi.org/10.1007/s11227-010-0380-8
