Discrete dipole approximation simulations on GPUs using OpenCL—Application on cloud ice particles

doi:10.1016/j.jocs.2011.05.011

Journal of Computational Science

Volume 2, Issue 3, August 2011, Pages 262-271

https://doi.org/10.1016/j.jocs.2011.05.011 Get rights and content

Abstract

The global distribution and climatology of ice clouds are among the main uncertainties in climate modeling and prediction. In order to retrieve ice cloud properties from remote sensing measurements, the scattering properties of all cloud ice particle types must be known. The discrete dipole approximation (DDA) simulates scattering of radiation by arbitrarily shaped particles and is thus suitable for cloud ice crystals. The DDA models the particle as a collection of equal dipoles on a lattice, and is computationally much more expensive than approximations restricted to more regularly shaped particles. On a single computer the calculation for an ice particle of a specific size, for a given scattering plane at one specific wavelength can take several days. We have ported the core routines of the scattering suite “ADDA” to the open computing language (OpenCL), a framework for programming parallel devices like PC graphics cards (graphics processing units, GPUs) or multi-core CPUs. In a typical case we can achieve a speed-up on a GPU as compared to a CPU by a factor of 5 in double precision and a factor of 15 in single precision. Spreading the work load over multiple GPUs will allow calculating the scattering properties even of large cloud ice particles.

Highlights

► We analyse the use of graphic cards in discrete dipole approximation calculations. ► We have ported core routine for matrix vector multiplication to OpenCL language. ► A speedup by factor of 5 (15) per iteration is achieved in double (single) precision. ► We compare the calculation speed of graphic card and CPU in several problem sizes.

Introduction

The global distribution and climatology of ice clouds represent one of the main uncertainties in climate analysis and prediction [1]. For their daily global observation, spaceborne microwave millimeter and submillimeter radiometers have been suggested [2] because of their ability to sense the interior of cloud. Many algorithms to retrieve geophysical properties of ice clouds assume a spherical shape of the single crystals when calculating their scattering properties. While this simplifying assumption is appropriate in the Rayleigh regime, i.e., for particles small compared to the wavelength (>3 mm or ν < 100 GHz), for higher frequencies the aspherical shape of the ice crystals more and more determines their scattering properties which in turn are required for the forward radiative transfer modeling of ice clouds. The main properties of the scattered radiation depend on the single scattering properties of every involved particle. There are no sufficient possibilities to measure directly the distributions of size, shape and orientation of the ice crystals in ice clouds and convective systems. However, a possible approach is to build a database of the scattering properties of different shapes and sizes of ice crystals based on forward scattering calculations [3]. There are several different methods to calculate the single scattering properties of ice crystals.

The T-matrix method [4] is an accurate method for calculating scattering properties of particles without distinctive concavity, so it can be used for many different crystal shapes. The FDTD method [5] is computationally expensive on particle sizes which exceed twenty times the wavelength. However, the most flexible method is the DDA method [6] suited for arbitrary shapes. It is an accurate method to solve the Maxwell equations for single particle scattering. But DDA is computationally expensive and has a high memory consumption especially for lager particles. The field of application of FDTD and DDA is similar but DDA is better suited for small refractive indices [7]. In this study we use the ADDA program package [8] because it is written in the C programming language where an interface to OpenCL is provided.

In view of the growing number of multi-core CPU, the growing numbers of cores per CPU and the reduction of clock rate it seems to be the right time to rethink the algorithmic approaches which have been developed for serial computers. A well suited application for parallel computers are for example linear filters applied to large datasets like images. That kind of task is nowadays often done by GPU with up to 1600 scalar processing cores, in contrast to CPU with up to 6 cores. On the other hand the instruction set of a CPU is a lot bigger than that of a GPU, i.e., a CPU can perform more complex mathematical operations in fewer cycles than a GPU. As a consequence, GPU appear attractive to apply mathematically simple function on large datasets, as it is the case for DDA where a system of linear equations is solved. However, FDTD was the first method implemented on GPU by using the graphics API and low level programming [9].

A unified standard framework for programming multi-core CPU, GPU and also other parallel computing devices is the OpenCL by the Khronos group in December 2008 [10]. The GPU vendor Nvidia released an example with their SDK for OpenCL which does an FDTD calculation [11]. In this report, the DDA method, as realized in the ADDA software package, is analysed for the possibility to implement it in OpenCL, and the possible speedup over the CPU implementation will be estimated.

Section snippets

Overview

The DDA approach for calculating single scattering properties of particles approximates the shape of an ice crystal by a number of small dipoles on a cubic lattice. An incident wave is scattered by all dipoles. Every dipole will scatter the incident wave into all directions, among them the directions of all other dipoles so that not only the incident wave arrives at one specific dipole but also the scattered waves from all other dipoles. Therefore the self consistent solution for the dipole

Overview over open computing language (OpenCL)

OpenCL is the name of a parallelization software framework. It is an API and a programming language at the same time. It is used to distribute computational work load over several platforms and devices. The best value of computational power per money and computational power per power consumption available nowadays are high end gaming GPU which are basically highly parallel computation units. OpenCL offers a way to program such a GPU. The OpenCL language is similar to the programming language C

Used hardware

All timing results were obtained on ordinary consumer computer hardware. The relevant details are listed in Table 2. The CPU is a dual core CPU, but for the timing of ADDA just one core is used since we compiled ADDA to sequential mode.

The Nvidia Geforce GTX 260 used in this study is a consumer gaming graphics card. Its fast SP capabillities (about 800 Giga FLOPS) makes it well suited for scientific calculations. However, the DP performance is just about 65 Giga FLOPS. It was one of the first

Splitting into slices

The algorithm of the matrix vector multiplication in ADDA is designed to be executed in parallel by multiple CPU by splitting the matrix into independent slices. We take a closer look at the possibility to use this approach for distributing the work onto the GPU cores. In this approach, there will be as many slices as the x-dimension of the crystal-grid has elements. For some GPU architectures this will result in a low occupancy of the cores when there are more cores than slices. But even with

Runtime

We decided to use the FFT implementation of Apple (Section 5.4) because it showed the best performance. The arithmetic routines were converted to OpenCL without GPU architecture dependent customizations by using the guideline of Section 3. Fig. 3 compares the runtime of our GPU implementations of the matrix–vector multiplication of ADDA with the original GPU implementation on a sphere in a grid of 128 × 128 × 128. We use a power-of-two grid to have comparable results since non-power-of-two grids

Application example

We demonstrate the functionality of the implementation by aplying it to a 6-branch-bullet-rosette-shaped crystals in a grid of 76 × 76 × 76 dipoles where the wavelength is one per crystal dimension. As a validation criterion Draine and Flatau pointed out that one needs at least 10|m| dipoles per wavelength to obtain an accurate result [6] with m the complex refractive index. In our in our example we fulfill this condition with 76 dipoles per wavelength. The crystal consists of 4746 non-void dipoles

Conclusions and outlook

In this study, the potential to use GPU for light scattering calculations with DDA has been investigated. The time consuming matrix vector multiplication of the DDA was reimplemented in OpenCL. The ADDA scattering program by Yurkin and Hoekstra was modified to use the OpenCL implementation of the matrix vector multiplication. The OpenCL programming example from Apple for FFT was modified to handle DP and used to calculate the FFT needed for the matrix vector multiplication. The other parts were

Acknowledgements

Fruitful discussions with Thomas Wriedt on scattering and the suggestions of Maxim Yurkin on the possibility to contribute GPU calculations to the DDA are gratefully acknowledged. We also want to thank the anonymous reviewers for their fruitful comments and suggestions.

Marcus Huntemann studies physics at the University of Bremen. He works at the Institute of Environmental Physics, University of Bremen. His interests lie in high performance computing and general-purpose computing on graphics processing units. The context of his work is the retrieval of geophysical parameters from satellite data.

References (40)

M.I. Mishchenko et al.
T-matrix computations of light scattering by nonspherical particles: a review
J. Quant. Spectrosc. Radiat. Transfer
(1996)
M.A. Yurkin et al.
The discrete dipole approximation for simulation of light scattering by particles much larger than the wavelength
J. Quant. Spectrosc. Radiat. Transfer
(2007)
M.A. Yurkin et al.
The discrete dipole approximation: an overview and recent developments
J. Quant. Spectrosc. Radiat. Transfer
(2007)
A. Penttilä et al.
Comparison between discrete dipole implementations and exact techniques
J. Quant. Spectrosc. Radiat. Transfer
(2007)
IPCC, 2007: Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment...
K.F. Evans et al.
The prospect for remote sensing of cirrus clouds with a submillimeter-wave spectrometer
J. Appl. Meteorol.
(1999)
G. Hong et al.
Scattering database in the millimeter and submillimeter wave range of 100–1000 GHz for nonspherical ice particles
J. Geophys. Res.
(2009)
P. Yang et al.
Finite-difference time domain method for light scattering by small ice crystals in three-dimensional space
J. Opt. Soc. Am. A
(1996)
B.T. Draine et al.
Discrete-dipole approximation for scattering calculations
J. Opt. Soc. Am. A
(1994)
M.A. Yurkin et al.
Systematic comparison of the discrete dipole approximation and the finite difference time domain method for large dielectric scatterers
Opt. Express
(2007)

M.A. Yurkin et al.

User Manual for the Discrete Dipole Approximation Code ADDA v. 0. 79

(2009)

S. Adams et al.

Finite difference time domain (fdtd) simulations using graphics processors

P.J. Flatau

Improvements in the discrete-dipole approximation method of computing scattering and absorption

Opt. Lett.

(1997)

P.J. Flatau et al.

Light scattering by rectangular solids in the discrete-dipole approximation: a new algorithm exploiting the block-toeplitz structure

J. Opt. Soc. Am. A

(1990)

J.J. Goodman et al.

Application of fast-Fourier-transform techniques to the discrete-dipole approximation

Opt. Lett.

(1991)

E. Collett

Polarized Light

(1993)

Cited by (12)

Discrete dipole approximation
2023, Light, Plasmonics and Particles
This chapter is dedicated to the general volume-discretization method, known as the discrete dipole approximation (DDA). The main DDA equations are presented for local non-magnetic materials, including a non-absorbing host medium. This framework describes not only the conventional scattering problems but also excitation by arbitrary complex vector beams, as well as the calculation of emission (decay-rate) enhancement and electron energy-loss probability. Many available DDA formulations and important numerical issues are also discussed. Special attention is paid to practical aspects of the DDA simulations, including the comparison of the two most popular open-source codes: DDSCAT and ADDA.
“pyGDM” - new functionalities and major improvements to the python toolkit for nano-optics full-field simulations
2022, Computer Physics Communications
Citation Excerpt :
From the technical side, we plan to implement a conjugate gradients solver to avoid full inversion of systems with large amounts of mesh-cells [3]. Furthermore, such iterative solver could be strongly accelerated using GPU-based fast Fourier transforms [80,81]. More flexible meshing is also a feature that we plan to implement in the future [44,20,45].
pyGDM is a python toolkit for electro-dynamical simulations of individual nano-structures, based on the Green Dyadic Method (GDM). pyGDM uses the concept of a generalized propagator, which allows to solve cost-efficiently monochromatic problems with a large number of varying illumination conditions such as incident angle scans or focused beam raster-scan simulations. We provide an overview of new features added since the initial publication [Wiecha, Comput. Phys. Commun. 233 (2018) 167–192]. The updated version of pyGDM is implemented in pure python, removing the former dependency on fortran-based binaries. In the course of this re-write, the toolkit's internal architecture has been completely redesigned to offer a much wider range of possibilities to the user such as the choice of the dyadic Green's functions describing the environment. A new class of dyads allows to perform 2D simulations of infinitely long nanostructures. While the Green's dyads in pyGDM are based on a quasistatic description for interfaces, we also provide as new external python package “pyGDM2_retard” a module with retarded Green's tensors for an environment with two interfaces. We have furthermore added functionalities for simulations using fast-electron excitation, namely electron energy loss spectroscopy and cathodoluminescence. Along with several further new tools and improvements, the update includes also the possibility to calculate the magnetic field and the magnetic LDOS inside nanostructures, field-gradients in- and outside a nanoparticle, optical forces or the chirality of nearfields. All new functionalities remain compatible with the evolutionary optimization module of pyGDM for nano-photonics inverse design.
Program Title: pyGDM2
CPC Library link to program files: https://doi.org/10.17632/5zdrppdc3j.1
Licensing provisions: GPLv3
Programming language: python
Nature of problem: Full-field electrodynamical simulations of photonic nanostructures. This includes calculations of the optical extinction, scattering and absorption, as well as the near-field distribution or the interaction of quantum emitters with nanostructures as well as fast electron beam simulations. The toolkit includes a module for automated evolutionary optimization of nanostructure geometries to obtain a user-defined optical response.
Solution method:: The optical response of photonic nanostructures is calculated using field susceptibilities (“Green Dyadic Method”, GDM) via a volume discretization. The approach is formally similar to the coupled dipole approximation.
Additional comments including restrictions and unusual features: 2D and 3D nanostructures. On typical office PCs (8-16GB RAM) the discretization is limited to about 10000-15000 meshpoints, therefore it applies best to single, small nanostructures.
MATLAB package for discrete dipole approximation by graphics processing unit: Fast Fourier Transform and Biconjugate Gradient
2021, Journal of Quantitative Spectroscopy and Radiative Transfer
Citation Excerpt :
However, in other electromagnetic scattering problems, the biconjugate gradient (BCG) technique has been used as an iterative method, which is three- to six- times faster than the corresponding CCG [25,26]. Several attempts have been made to increase the performance of the DDA [22–24,27–31]. One of the pioneering works was carried out by Draine and Flatau, who popularized the DDA by releasing an open-source package written in FORTRAN, named DDSCAT [22,27].
A MATLAB package describing discrete dipole approximation (MPDDA) is introduced to calculate the optical properties of arbitrary shaped plasmonic nanoparticles (NPs). The polarizability function, induced dipoles, and dipole interaction matrix are discussed. To calculate the dipole moments, Fast Fourier Transform (FFT) and Biconjugate Gradient (BCG) were used to reduce the computational time and memory. To further accelerate the computational procedure, we used MATLAB functions and toolboxes supported by the graphics processing unit (GPU) and execute them on GPU by supplying a “gpuArray” argument. Using the GPU, the computation cost significantly decreased when compared to the CPU. We also simulated the optical properties of plasmonic NPs, such as extinction, absorption, and scattering efficiencies, electric field enhancement around monomeric and dimeric structures using this package. The accuracy and capabilities of the code have been confirmed by comparing with Mie theory (exact solution) and ADDA code (a DDA package developed in C).
Graphics Processing Units and Open Computing Language for parallel computing
2014, Computers and Electrical Engineering
Citation Excerpt :
Some of these work are for image processing, the very application GPUs were initially designed for (see for example, [13–19]). But more are for modeling and simulation in many diverse fields, including molecular dynamics [20–25], computational fluid dynamics [26–28], climate physics modeling [29,30], astrophysics and other physics studies [31–35], computer aided design [36], fashion design [37], automotive manufacturing [38], speech recognition [39], and social systems [40]. Of the many modeling studies, n-body simulations are seen in all fields of physics and engineering.
Graphics Processing Units (GPUs) have become increasingly powerful over the last decade. Programs taking advantage of this architecture can achieve large performance gains and almost all new solutions and initiatives in high performance computing are aimed in that direction. To write programs that can offload the computation onto the GPU and utilize its power, new technologies are needed. The recent introduction of Open Computing Language (OpenCL), a standard for cross-platform, parallel programming of modern processors, has made a step in the right direction. Code written with OpenCL can run on a wide variety of platforms, adapting to the underlying architecture. It is versatile yet easy to learn due to similarities with the C programming language. In this paper, we will review the current state of the art in the use of GPUs and OpenCL for parallel computations. We use an implementation of the n-body simulation to illustrate some important considerations in developing OpenCL programs.
Light scattering theory and programs: Discussion of latest advances and open problems
2012, Journal of Quantitative Spectroscopy and Radiative Transfer
Citation Excerpt :
A suitable parallelization software framework is OpenCL. This framework has recently been applied to write a parallel version of A-DDA [38]. The time consuming matrix vector multiplication of the A-DDA was reimplemented in OpenCL.
Generalizing the exact multipole expansion: Density of multipole modes in complex photonic nanostructures
2022, Nanophotonics

View all citing articles on Scopus

Georg Heygster received the diploma degree in 1976 in solid state physics and the PhD in 1979 in digital image processing, both in physics from the University of Göttingen, Germany. He served as a consultant at the Computer Center of the University of Bremen, Germany, from 1979 to 1988. Since then, after one year working for one year on the imaging mechanisms of scanning acoustic microscopes, he has been head of the group Geophysical Analysis of Satellite Images at the Institute of Environmental Physics, University of Bremen. His research activities include passive and active microwave remote sensing, especially of both surface and atmospheric parameters in the high latitudes, various aspects of the hydrological cycle, long-term trends and retrieval techniques. He was or still is Principal Investigator or co-Investigator of many research projects funded by EU, ESA, DFG and JAXA. These projects include the development of sensor soft- and hardware, conducting campaigns, the final data analysis from multi- and single-sensor data to geophysical parameters, and the interpretation and application of these results in many areas such as meteorology, climatology and oceanography.

Gang Hong received the B.S. degree in atmospheric sciences from Nanjing Institute of Meteorology, Nanjing, China in 1995 and the PhD degree in environmental physics and remote sensing from the University of Bremen, Germany in 2004. He is currently a Senior Research Scientist with Science Systems and Applications, Inc., Hampton, VA, and is a member of the NASA Langley Cloud and Radiation Group. His research is focused on the remote sensing of cloud and aerosol properties and investigating their radiative properties and climatologies. He has also worked on investigating the optical properties of nonspherical cloud ice crystals and aerosols.

View full text

Discrete dipole approximation simulations on GPUs using OpenCL—Application on cloud ice particles

Abstract

Highlights

Introduction

Section snippets

Overview

Overview over open computing language (OpenCL)

Used hardware

Splitting into slices

Runtime

Application example

Conclusions and outlook

Acknowledgements

J. Quant. Spectrosc. Radiat. Transfer

J. Quant. Spectrosc. Radiat. Transfer

J. Quant. Spectrosc. Radiat. Transfer

J. Quant. Spectrosc. Radiat. Transfer

The prospect for remote sensing of cirrus clouds with a submillimeter-wave spectrometer

J. Appl. Meteorol.

Scattering database in the millimeter and submillimeter wave range of 100–1000 GHz for nonspherical ice particles

J. Geophys. Res.

Finite-difference time domain method for light scattering by small ice crystals in three-dimensional space

J. Opt. Soc. Am. A

Discrete-dipole approximation for scattering calculations

J. Opt. Soc. Am. A

Systematic comparison of the discrete dipole approximation and the finite difference time domain method for large dielectric scatterers

Opt. Express

User Manual for the Discrete Dipole Approximation Code ADDA v. 0. 79

Finite difference time domain (fdtd) simulations using graphics processors

Improvements in the discrete-dipole approximation method of computing scattering and absorption

Opt. Lett.

Light scattering by rectangular solids in the discrete-dipole approximation: a new algorithm exploiting the block-toeplitz structure

J. Opt. Soc. Am. A

Application of fast-Fourier-transform techniques to the discrete-dipole approximation

Opt. Lett.

Polarized Light