Abstract
As a powerful and flexible processor, the Graphics Processing Unit (GPU) offers substantial capability for many high-performance computing applications. Sweep3D, which deterministically simulates single-group, time-independent discrete ordinates (S_n) neutron transport on a 3D Cartesian geometry, represents the core of a real ASCI application. The wavefront process used for parallel computation in Sweep3D limits the number of threads that can run concurrently on the GPU. In this paper, we present multi-dimensional optimization methods for Sweep3D that can be efficiently implemented on the fine-grained parallel architecture of the GPU. Our results show that overall Sweep3D performance on a CPU-GPU hybrid platform improves by up to 2.25 times compared with the CPU-based implementation.
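The wavefront constraint can be illustrated with a minimal Python sketch. It assumes a sweep from a single corner (0, 0, 0) of an nx by ny by nz grid in which each cell needs the already-computed values of its three upwind neighbours. The per-cell formula in `cell_update` is a hypothetical placeholder, not Sweep3D's actual discrete-ordinates update, and all function names and grid sizes are illustrative. Cells on the same diagonal plane i + j + k = d do not depend on each other, so each plane is one step of parallel work.

```python
from collections import defaultdict


def wavefronts(nx, ny, nz):
    """Group cells (i, j, k) by diagonal plane d = i + j + k.

    For a sweep starting at corner (0, 0, 0), every upwind neighbour of a
    cell on plane d lies on plane d - 1, so cells within a plane are independent.
    """
    planes = defaultdict(list)
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                planes[i + j + k].append((i, j, k))
    # Planes run from d = 0 to d = nx + ny + nz - 3.
    return [planes[d] for d in range(nx + ny + nz - 2)]


def cell_update(src, up_x, up_y, up_z, a=0.5):
    # Hypothetical stand-in for the per-cell transport update: combines the
    # local source with the three incoming (upwind) values.
    return (src + a * (up_x + up_y + up_z)) / (1.0 + 3.0 * a)


def upwind_update(psi, src, i, j, k, bc):
    # Missing neighbours (outside the grid) take the boundary value bc.
    return cell_update(
        src[(i, j, k)],
        psi.get((i - 1, j, k), bc),
        psi.get((i, j - 1, k), bc),
        psi.get((i, j, k - 1), bc),
    )


def sweep_sequential(src, nx, ny, nz, bc=0.0):
    # Lexicographic order: each upwind neighbour is visited before the cell.
    psi = {}
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                psi[(i, j, k)] = upwind_update(psi, src, i, j, k, bc)
    return psi


def sweep_wavefront(src, nx, ny, nz, bc=0.0):
    psi = {}
    for plane in wavefronts(nx, ny, nz):
        # Every cell in this plane reads only the previous plane, so the cells
        # could be updated concurrently (e.g. one GPU thread each). Here they
        # are computed in a plain loop, then committed together.
        updates = {(i, j, k): upwind_update(psi, src, i, j, k, bc)
                   for (i, j, k) in plane}
        psi.update(updates)
    return psi


nx, ny, nz = 4, 3, 5
src = {(i, j, k): float(i + 2 * j + 3 * k)
       for i in range(nx) for j in range(ny) for k in range(nz)}
print([len(p) for p in wavefronts(nx, ny, nz)])  # [1, 3, 6, 9, 11, 11, 9, 6, 3, 1]
seq = sweep_sequential(src, nx, ny, nz)
wav = sweep_wavefront(src, nx, ny, nz)
print(all(abs(seq[c] - wav[c]) < 1e-12 for c in seq))  # True
```

For this 4 by 3 by 5 grid, the 60 cells fall into 10 planes, and at most 11 of them can be updated at once. The first and last steps each expose a single cell. This small and uneven parallelism at the start and end of every sweep is the kind of limit on concurrent GPU threads that the abstract refers to.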
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gong, C., Liu, J., Gong, Z., Qin, J., Xie, J. (2010). Optimizing Sweep3D for Graphic Processor Unit. In: Hsu, CH., Yang, L.T., Park, J.H., Yeo, SS. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2010. Lecture Notes in Computer Science, vol 6081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13119-6_36
DOI: https://doi.org/10.1007/978-3-642-13119-6_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13118-9
Online ISBN: 978-3-642-13119-6
eBook Packages: Computer Science, Computer Science (R0)