Discrete dipole approximation simulations on GPUs using OpenCL—Application on cloud ice particles
Highlights
► We analyse the use of graphic cards in discrete dipole approximation calculations. ► We have ported core routine for matrix vector multiplication to OpenCL language. ► A speedup by factor of 5 (15) per iteration is achieved in double (single) precision. ► We compare the calculation speed of graphic card and CPU in several problem sizes.
Introduction
The global distribution and climatology of ice clouds represent one of the main uncertainties in climate analysis and prediction [1]. For their daily global observation, spaceborne microwave millimeter and submillimeter radiometers have been suggested [2] because of their ability to sense the interior of cloud. Many algorithms to retrieve geophysical properties of ice clouds assume a spherical shape of the single crystals when calculating their scattering properties. While this simplifying assumption is appropriate in the Rayleigh regime, i.e., for particles small compared to the wavelength (>3 mm or ν < 100 GHz), for higher frequencies the aspherical shape of the ice crystals more and more determines their scattering properties which in turn are required for the forward radiative transfer modeling of ice clouds. The main properties of the scattered radiation depend on the single scattering properties of every involved particle. There are no sufficient possibilities to measure directly the distributions of size, shape and orientation of the ice crystals in ice clouds and convective systems. However, a possible approach is to build a database of the scattering properties of different shapes and sizes of ice crystals based on forward scattering calculations [3]. There are several different methods to calculate the single scattering properties of ice crystals.
The T-matrix method [4] is an accurate method for calculating scattering properties of particles without distinctive concavity, so it can be used for many different crystal shapes. The FDTD method [5] is computationally expensive on particle sizes which exceed twenty times the wavelength. However, the most flexible method is the DDA method [6] suited for arbitrary shapes. It is an accurate method to solve the Maxwell equations for single particle scattering. But DDA is computationally expensive and has a high memory consumption especially for lager particles. The field of application of FDTD and DDA is similar but DDA is better suited for small refractive indices [7]. In this study we use the ADDA program package [8] because it is written in the C programming language where an interface to OpenCL is provided.
In view of the growing number of multi-core CPU, the growing numbers of cores per CPU and the reduction of clock rate it seems to be the right time to rethink the algorithmic approaches which have been developed for serial computers. A well suited application for parallel computers are for example linear filters applied to large datasets like images. That kind of task is nowadays often done by GPU with up to 1600 scalar processing cores, in contrast to CPU with up to 6 cores. On the other hand the instruction set of a CPU is a lot bigger than that of a GPU, i.e., a CPU can perform more complex mathematical operations in fewer cycles than a GPU. As a consequence, GPU appear attractive to apply mathematically simple function on large datasets, as it is the case for DDA where a system of linear equations is solved. However, FDTD was the first method implemented on GPU by using the graphics API and low level programming [9].
A unified standard framework for programming multi-core CPU, GPU and also other parallel computing devices is the OpenCL by the Khronos group in December 2008 [10]. The GPU vendor Nvidia released an example with their SDK for OpenCL which does an FDTD calculation [11]. In this report, the DDA method, as realized in the ADDA software package, is analysed for the possibility to implement it in OpenCL, and the possible speedup over the CPU implementation will be estimated.
Section snippets
Overview
The DDA approach for calculating single scattering properties of particles approximates the shape of an ice crystal by a number of small dipoles on a cubic lattice. An incident wave is scattered by all dipoles. Every dipole will scatter the incident wave into all directions, among them the directions of all other dipoles so that not only the incident wave arrives at one specific dipole but also the scattered waves from all other dipoles. Therefore the self consistent solution for the dipole
Overview over open computing language (OpenCL)
OpenCL is the name of a parallelization software framework. It is an API and a programming language at the same time. It is used to distribute computational work load over several platforms and devices. The best value of computational power per money and computational power per power consumption available nowadays are high end gaming GPU which are basically highly parallel computation units. OpenCL offers a way to program such a GPU. The OpenCL language is similar to the programming language C
Used hardware
All timing results were obtained on ordinary consumer computer hardware. The relevant details are listed in Table 2. The CPU is a dual core CPU, but for the timing of ADDA just one core is used since we compiled ADDA to sequential mode.
The Nvidia Geforce GTX 260 used in this study is a consumer gaming graphics card. Its fast SP capabillities (about 800 Giga FLOPS) makes it well suited for scientific calculations. However, the DP performance is just about 65 Giga FLOPS. It was one of the first
Splitting into slices
The algorithm of the matrix vector multiplication in ADDA is designed to be executed in parallel by multiple CPU by splitting the matrix into independent slices. We take a closer look at the possibility to use this approach for distributing the work onto the GPU cores. In this approach, there will be as many slices as the x-dimension of the crystal-grid has elements. For some GPU architectures this will result in a low occupancy of the cores when there are more cores than slices. But even with
Runtime
We decided to use the FFT implementation of Apple (Section 5.4) because it showed the best performance. The arithmetic routines were converted to OpenCL without GPU architecture dependent customizations by using the guideline of Section 3. Fig. 3 compares the runtime of our GPU implementations of the matrix–vector multiplication of ADDA with the original GPU implementation on a sphere in a grid of 128 × 128 × 128. We use a power-of-two grid to have comparable results since non-power-of-two grids
Application example
We demonstrate the functionality of the implementation by aplying it to a 6-branch-bullet-rosette-shaped crystals in a grid of 76 × 76 × 76 dipoles where the wavelength is one per crystal dimension. As a validation criterion Draine and Flatau pointed out that one needs at least 10|m| dipoles per wavelength to obtain an accurate result [6] with m the complex refractive index. In our in our example we fulfill this condition with 76 dipoles per wavelength. The crystal consists of 4746 non-void dipoles
Conclusions and outlook
In this study, the potential to use GPU for light scattering calculations with DDA has been investigated. The time consuming matrix vector multiplication of the DDA was reimplemented in OpenCL. The ADDA scattering program by Yurkin and Hoekstra was modified to use the OpenCL implementation of the matrix vector multiplication. The OpenCL programming example from Apple for FFT was modified to handle DP and used to calculate the FFT needed for the matrix vector multiplication. The other parts were
Acknowledgements
Fruitful discussions with Thomas Wriedt on scattering and the suggestions of Maxim Yurkin on the possibility to contribute GPU calculations to the DDA are gratefully acknowledged. We also want to thank the anonymous reviewers for their fruitful comments and suggestions.
Marcus Huntemann studies physics at the University of Bremen. He works at the Institute of Environmental Physics, University of Bremen. His interests lie in high performance computing and general-purpose computing on graphics processing units. The context of his work is the retrieval of geophysical parameters from satellite data.
References (40)
- et al.
T-matrix computations of light scattering by nonspherical particles: a review
J. Quant. Spectrosc. Radiat. Transfer
(1996) - et al.
The discrete dipole approximation for simulation of light scattering by particles much larger than the wavelength
J. Quant. Spectrosc. Radiat. Transfer
(2007) - et al.
The discrete dipole approximation: an overview and recent developments
J. Quant. Spectrosc. Radiat. Transfer
(2007) - et al.
Comparison between discrete dipole implementations and exact techniques
J. Quant. Spectrosc. Radiat. Transfer
(2007) - IPCC, 2007: Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment...
- et al.
The prospect for remote sensing of cirrus clouds with a submillimeter-wave spectrometer
J. Appl. Meteorol.
(1999) - et al.
Scattering database in the millimeter and submillimeter wave range of 100–1000 GHz for nonspherical ice particles
J. Geophys. Res.
(2009) - et al.
Finite-difference time domain method for light scattering by small ice crystals in three-dimensional space
J. Opt. Soc. Am. A
(1996) - et al.
Discrete-dipole approximation for scattering calculations
J. Opt. Soc. Am. A
(1994) - et al.
Systematic comparison of the discrete dipole approximation and the finite difference time domain method for large dielectric scatterers
Opt. Express
(2007)
User Manual for the Discrete Dipole Approximation Code ADDA v. 0. 79
Finite difference time domain (fdtd) simulations using graphics processors
Improvements in the discrete-dipole approximation method of computing scattering and absorption
Opt. Lett.
Light scattering by rectangular solids in the discrete-dipole approximation: a new algorithm exploiting the block-toeplitz structure
J. Opt. Soc. Am. A
Application of fast-Fourier-transform techniques to the discrete-dipole approximation
Opt. Lett.
Polarized Light
Cited by (12)
Discrete dipole approximation
2023, Light, Plasmonics and Particles“pyGDM” - new functionalities and major improvements to the python toolkit for nano-optics full-field simulations
2022, Computer Physics CommunicationsCitation Excerpt :From the technical side, we plan to implement a conjugate gradients solver to avoid full inversion of systems with large amounts of mesh-cells [3]. Furthermore, such iterative solver could be strongly accelerated using GPU-based fast Fourier transforms [80,81]. More flexible meshing is also a feature that we plan to implement in the future [44,20,45].
MATLAB package for discrete dipole approximation by graphics processing unit: Fast Fourier Transform and Biconjugate Gradient
2021, Journal of Quantitative Spectroscopy and Radiative TransferCitation Excerpt :However, in other electromagnetic scattering problems, the biconjugate gradient (BCG) technique has been used as an iterative method, which is three- to six- times faster than the corresponding CCG [25,26]. Several attempts have been made to increase the performance of the DDA [22–24,27–31]. One of the pioneering works was carried out by Draine and Flatau, who popularized the DDA by releasing an open-source package written in FORTRAN, named DDSCAT [22,27].
Graphics Processing Units and Open Computing Language for parallel computing
2014, Computers and Electrical EngineeringCitation Excerpt :Some of these work are for image processing, the very application GPUs were initially designed for (see for example, [13–19]). But more are for modeling and simulation in many diverse fields, including molecular dynamics [20–25], computational fluid dynamics [26–28], climate physics modeling [29,30], astrophysics and other physics studies [31–35], computer aided design [36], fashion design [37], automotive manufacturing [38], speech recognition [39], and social systems [40]. Of the many modeling studies, n-body simulations are seen in all fields of physics and engineering.
Light scattering theory and programs: Discussion of latest advances and open problems
2012, Journal of Quantitative Spectroscopy and Radiative TransferCitation Excerpt :A suitable parallelization software framework is OpenCL. This framework has recently been applied to write a parallel version of A-DDA [38]. The time consuming matrix vector multiplication of the A-DDA was reimplemented in OpenCL.
Marcus Huntemann studies physics at the University of Bremen. He works at the Institute of Environmental Physics, University of Bremen. His interests lie in high performance computing and general-purpose computing on graphics processing units. The context of his work is the retrieval of geophysical parameters from satellite data.
Georg Heygster received the diploma degree in 1976 in solid state physics and the PhD in 1979 in digital image processing, both in physics from the University of Göttingen, Germany. He served as a consultant at the Computer Center of the University of Bremen, Germany, from 1979 to 1988. Since then, after one year working for one year on the imaging mechanisms of scanning acoustic microscopes, he has been head of the group Geophysical Analysis of Satellite Images at the Institute of Environmental Physics, University of Bremen. His research activities include passive and active microwave remote sensing, especially of both surface and atmospheric parameters in the high latitudes, various aspects of the hydrological cycle, long-term trends and retrieval techniques. He was or still is Principal Investigator or co-Investigator of many research projects funded by EU, ESA, DFG and JAXA. These projects include the development of sensor soft- and hardware, conducting campaigns, the final data analysis from multi- and single-sensor data to geophysical parameters, and the interpretation and application of these results in many areas such as meteorology, climatology and oceanography.
Gang Hong received the B.S. degree in atmospheric sciences from Nanjing Institute of Meteorology, Nanjing, China in 1995 and the PhD degree in environmental physics and remote sensing from the University of Bremen, Germany in 2004. He is currently a Senior Research Scientist with Science Systems and Applications, Inc., Hampton, VA, and is a member of the NASA Langley Cloud and Radiation Group. His research is focused on the remote sensing of cloud and aerosol properties and investigating their radiative properties and climatologies. He has also worked on investigating the optical properties of nonspherical cloud ice crystals and aerosols.