Optimizing Sweep3D for Graphic Processor Unit

Gong, Chunye; Liu, Jie; Gong, Zhenghu; Qin, Jin; Xie, Jing

doi:10.1007/978-3-642-13119-6_36

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6081)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

As a powerful and flexible processor, the Graphics Processing Unit (GPU) offers great capability for many high-performance computing applications. Sweep3D, which deterministically simulates single-group, time-independent discrete ordinates (S_n) neutron transport on a 3D Cartesian geometry, represents the key part of a real ASCI application. The wavefront process used for parallel computation in Sweep3D limits the number of concurrent threads on the GPU. In this paper, we present multi-dimensional optimization methods for Sweep3D that can be implemented efficiently on the fine-grained parallel architecture of the GPU. Our results show that the overall performance of Sweep3D on a CPU-GPU hybrid platform can be improved up to 2.25 times compared to the CPU-based implementation.
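
To illustrate why a wavefront sweep limits concurrency, the following minimal sketch counts how many grid cells can be processed at the same time. It is not the authors' implementation. It assumes a sweep that starts at the corner (0, 0, 0) of an nx × ny × nz grid, with each cell depending on its three upstream neighbours, so that only cells on the same diagonal plane i + j + k = d are independent. The function name `wavefront_sizes` is hypothetical.

```python
from collections import Counter
from itertools import product


def wavefront_sizes(nx, ny, nz):
    """Return the number of cells on each diagonal plane i + j + k = d.

    Assumption: the sweep starts at corner (0, 0, 0) and every cell needs
    its upstream neighbours in i, j and k, so cells sharing the same d
    have no dependencies on each other and could run concurrently.
    """
    counts = Counter(
        i + j + k for i, j, k in product(range(nx), range(ny), range(nz))
    )
    # Planes are numbered d = 0 .. nx + ny + nz - 3.
    return [counts[d] for d in range(nx + ny + nz - 2)]


if __name__ == "__main__":
    sizes = wavefront_sizes(4, 4, 4)
    print("cells per wavefront step:", sizes)
    print("peak concurrency:", max(sizes), "of", sum(sizes), "cells")
```

For a 4 × 4 × 4 grid, at most 12 of the 64 cells lie on one plane, and the first and last steps contain a single cell each. Under these assumptions the available parallelism is far smaller than the grid size, which is the kind of limitation the abstract refers to.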

Author information

Authors and Affiliations

Department of Computer Sciences, National University of Defense Technology, 410073, Changsha, China
Chunye Gong, Jie Liu, Zhenghu Gong, Jin Qin & Jing Xie

Editor information

Editors and Affiliations

Department of Computer Science and Information Engineering, Chung Hua University, 300, Hsinchu, Taiwan, China
Ching-Hsien Hsu
Department of Computer Science, St. Francis Xavier University, B2G 2W5, Antigonish, NS, Canada
Laurence T. Yang
Department of Computer Science and Engineering, Seoul National University of Technology, 172 Gongreund 2-dong, Nowon-gou, 139-742, Seoul, Korea
Jong Hyuk Park
Division of Computer Engineering, Mokwon University, 302-729, Daejeon, Korea
Sang-Soo Yeo

About this paper

Cite this paper

Gong, C., Liu, J., Gong, Z., Qin, J., Xie, J. (2010). Optimizing Sweep3D for Graphic Processor Unit. In: Hsu, CH., Yang, L.T., Park, J.H., Yeo, SS. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2010. Lecture Notes in Computer Science, vol 6081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13119-6_36

DOI: https://doi.org/10.1007/978-3-642-13119-6_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13118-9
Online ISBN: 978-3-642-13119-6
eBook Packages: Computer Science (R0)
