Recursive filter based GPU algorithms in a Data Assimilation scenario

https://doi.org/10.1016/j.jocs.2021.101339

Abstract

The Data Assimilation process is generally used to estimate the best initial state of a system in order to improve the accuracy of future state predictions. This powerful technique has been widely applied in investigations of the atmosphere, the ocean, and the land surface. In this work, we deal with the Gaussian convolution operation, which is a central step of the Data Assimilation approach, as well as of several data analysis procedures. In particular, we consider the use of recursive filters to approximate the Gaussian convolution. In [1] we presented an accelerated first-order recursive filter to compute the Gaussian convolution kernel in a very fast way. Here we recall that theory and those results, and we provide a new GPU-parallel implementation based on the third-order recursive filter. Tests and experiments showing the benefits in terms of performance complete our work.

Introduction

Data Assimilation (DA) is a prediction-correction method for combining a physical model with observations. Assimilation techniques are used for atmospheric and ocean modeling. The most common approaches, such as Kalman filters and Bayesian techniques, are based on statistical interpolation [2]. Other techniques, for example variational methods such as 3D-Var and 4D-Var, are based on minimization approaches [3], [4], [5]. An almost complete overview of Data Assimilation methods can be found in [6].

Among the most important applications of DA is the Machine Learning (ML) field [7]. Machine Learning algorithms try to provide adequate forecasts for predicting and understanding a multitude of phenomena. In other words, these two areas of investigation stand at the same level in statistical physics problems [8], [9]. In fact, many developments in DA have been used in the ML field, producing alternative and innovative methods [10], [11].

In order to perform correct ML, a suitable classifier is used in the analysis phase of the ML process. Indeed, the variational approach to Data Assimilation, characterized by a cost-function minimization, is a good choice for that classification. Numerically, this means applying an iterative procedure that uses a covariance matrix defined by measuring the error between predictions and observed data. Here, we are interested in these numerical issues. In particular, since the error covariance matrix presents, in general, a Gaussian correlation structure, the Gaussian convolution plays a key role in such a problem. Furthermore, beyond its fundamental role in the Data Assimilation field, the convolution operation is a significant computational step in most big-data analysis problems.
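To make this concrete, consider a standard one-dimensional model of such a correlation structure (a common choice in the literature, not necessarily the exact one adopted here): a Gaussian correlation means that the error covariance matrix has entries

B_{ij} \propto \exp\!\left(-\frac{(i-j)^2}{2\sigma^2}\right),

so that applying B to a vector v amounts to a discrete Gaussian convolution,

(Bv)_i \propto \sum_{j} \exp\!\left(-\frac{(i-j)^2}{2\sigma^2}\right) v_j = (g \ast v)_i,

which is exactly the operation that the recursive filters discussed below are designed to approximate.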

Because of the need to process large amounts of data, parallel approaches and High-Performance Computing (HPC) architectures, such as multicore systems or Graphics Processing Units (GPUs), are mandatory [12], [13]. Moreover, parallel computing turns out to be very helpful when modelling a suitable domain decomposition (DD) of a specific algorithm [14]. Several papers deal with parallel Data Assimilation [15], [16], [17]; recently, ad-hoc combinations of parallel methods with space reduction techniques (e.g. AutoEncoders, Truncated SVD) have proved able to achieve appreciable results in terms of space and time [18]. In this paper, we limit our attention to the basic step represented by a parallel implementation of the Gaussian convolution.

In particular, we propose an accelerated procedure to approximate the Gaussian convolution which is based on recursive filters (RFs). Gaussian RFs have been designed to provide an accurate and efficient approximation of the Gaussian convolution [19], [20], [21], [22]. Since RFs are mainly useful when execution times are large, many parallel implementations have been presented (see the survey in [23]).

In [1] we presented a GPU implementation of the K-iterated first-order RF, together with a performance analysis in terms of execution times obtained by varying the parallel configuration and the iteration number K. Here we recall the theory and results given in [1], and we also consider the third-order recursive filter in order to study how much the computation can be sped up as the size of the problem increases.

In other words, we propose a novel parallel implementation of the third-order RF that exploits the computational power of GPUs, as we already did for the first-order RF in [1]. We chose this parallelization environment because GPUs are very useful for solving numerical problems in several application fields [24], [25] and for managing large input data.

The rest of the paper is organized as follows. Section 2 recalls the variational Data Assimilation problem and introduces the computational kernel of the Gaussian convolution. In Section 3, the use of recursive filters to approximate the discrete Gaussian convolution is described, together with issues related to the boundary conditions. In particular, we exhibit features of the first-order recursive filter and the third-order one. In Section 4 a detailed description of our GPU-parallel algorithms is provided: the domain decomposition strategy and the implementations, related to the first-order recursive filter and the third-order one, are presented. The experiments shown in Section 5 allow us to make interesting considerations about the performance of the algorithms. Finally, conclusions are drawn in Section 6.


Data Assimilation and Gaussian convolutions

In this section, we show how the Gaussian convolution is involved in a Data Assimilation scenario. In particular, let us consider a three-dimensional variational Data Assimilation problem [26]: the objective is to give the best estimate of x, called the analysis or state vector, once a prior estimate vector xb (the background), usually provided by a numerical forecasting model, and a vector y = H(x) + δy of observations, related to the nonlinear model H, are given. The unknown x solves a minimization problem for a suitable cost function.
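In the standard 3D-Var formulation (we use the common notation, e.g. as in [26]; B and R denote the background-error and observation-error covariance matrices), the analysis x_a is the minimizer of the cost function

J(x) = \frac{1}{2}\,(x - x_b)^T B^{-1} (x - x_b) + \frac{1}{2}\,(y - H(x))^T R^{-1} (y - H(x)), \qquad x_a = \arg\min_x J(x).

It is through the application of B, with its Gaussian correlation structure, that the Gaussian convolution enters the computation.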

Gaussian RFs

Gaussian recursive filters offer a good and fast approximation of the Gaussian convolution. In more detail, let us consider an input signal s^{(0)}, i.e.:

s^{(0)} = \{s^{(0)}_j\}_{j \in \mathbb{Z}} = (\ldots, s^{(0)}_{-2}, s^{(0)}_{-1}, s^{(0)}_{0}, s^{(0)}_{1}, s^{(0)}_{2}, \ldots), \qquad \text{with } s^{(0)}_j \in \mathbb{C}.

The input signal can be thought of as a complex-valued function whose domain is the set of integers or, more directly, as a sequence of complex numbers, i.e. an element of \mathbb{C}^{\mathbb{Z}}. Let g denote the zero-mean Gaussian with standard deviation \sigma > 0, and let

\delta_j = \begin{cases} 1 & \text{if } j = 0, \\ 0 & \text{if } j \neq 0, \end{cases}

represent the unit-sample, i.e. the signal which is 1 at j = 0 and 0 elsewhere.
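To make the two filters concrete, the following is a minimal sequential sketch of both schemes with zero boundary conditions (compilable with g++ or nvcc). The coefficient formulas are classical reconstructions, not taken from this paper: the K-iterated first-order tuning (E = K/σ², α = 1 + E − √(E² + 2E), β = 1 − α) follows the derivation of Purser et al., and the third-order coefficients are the Young–van Vliet ones; the filters actually used in [1] and here may be tuned differently.

```cpp
// Minimal sketch (not the authors' code): sequential reference versions of the
// K-iterated first-order RF and of a third-order RF, with zero boundary
// conditions. sigma is expressed in grid units.
#include <cmath>
#include <cstdio>
#include <vector>

// K-iterated first-order RF: E = K/sigma^2, alpha = 1 + E - sqrt(E^2 + 2E),
// beta = 1 - alpha (classical first-order tuning, cf. Purser et al.).
void first_order_rf(std::vector<double>& s, double sigma, int K) {
    const double E = K / (sigma * sigma);
    const double alpha = 1.0 + E - std::sqrt(E * E + 2.0 * E);
    const double beta = 1.0 - alpha;
    for (int k = 0; k < K; ++k) {             // K forward-backward sweeps
        double prev = 0.0;                    // zero left boundary
        for (size_t j = 0; j < s.size(); ++j) // forward (causal) pass
            prev = s[j] = beta * s[j] + alpha * prev;
        prev = 0.0;                           // zero right boundary
        for (size_t j = s.size(); j-- > 0; )  // backward (anti-causal) pass
            prev = s[j] = beta * s[j] + alpha * prev;
    }
}

// Third-order RF with Young-van Vliet (1995) coefficients (sigma >= 0.5).
void third_order_rf(std::vector<double>& s, double sigma) {
    const double q = (sigma >= 2.5)
        ? 0.98711 * sigma - 0.96330
        : 3.97156 - 4.14554 * std::sqrt(1.0 - 0.26891 * sigma);
    const double b0 = 1.57825 + 2.44413*q + 1.42810*q*q + 0.422205*q*q*q;
    const double b1 = (2.44413*q + 2.85619*q*q + 1.26661*q*q*q) / b0;
    const double b2 = -(1.42810*q*q + 1.26661*q*q*q) / b0;
    const double b3 = (0.422205*q*q*q) / b0;
    const double B  = 1.0 - (b1 + b2 + b3);
    const size_t n = s.size();
    double w1 = 0, w2 = 0, w3 = 0;            // zero left boundary
    for (size_t j = 0; j < n; ++j) {          // forward pass
        const double w = B * s[j] + b1*w1 + b2*w2 + b3*w3;
        w3 = w2; w2 = w1; w1 = s[j] = w;
    }
    double p1 = 0, p2 = 0, p3 = 0;            // zero right boundary
    for (size_t j = n; j-- > 0; ) {           // backward pass
        const double p = B * s[j] + b1*p1 + b2*p2 + b3*p3;
        p3 = p2; p2 = p1; p1 = s[j] = p;
    }
}

int main() {
    const double SQRT_2PI = 2.5066282746310002;
    std::vector<double> s(64, 0.0), t;
    s[32] = 1.0;                              // unit-sample input: output ~ g
    t = s;
    first_order_rf(s, 4.0, 5);
    third_order_rf(t, 4.0);
    printf("peaks: %f %f (exact %f)\n", s[32], t[32], 1.0 / (4.0 * SQRT_2PI));
    return 0;
}
```

Feeding the unit-sample signal to either routine returns the filter's impulse response, which should approximate the Gaussian g; the printed peaks can be compared against the exact value 1/(σ√(2π)).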

Parallel approach and GPU algorithms

In this section we describe our parallel algorithms, and the related strategies, to implement fast and accurate versions of both the K-iterated first-order Gaussian RF and the third-order one. Our parallel approach exploits the main features of the GPU environment. For both implementations, the main idea relies on several macro steps in order to obtain a reliable and performant computation. The whole process, common to both algorithms, can be partitioned into three steps.

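A minimal CUDA sketch of the decomposition idea follows. It is illustrative only (the kernel name, chunking scheme, and launch configuration are our assumptions, not the paper's code): the signal is split into chunks, one thread applies the forward and backward first-order sweeps to its own chunk, and each chunk is treated as an independent signal with zero boundary conditions, so internal chunk edges introduce a local approximation error that larger or overlapping chunks can reduce.

```cuda
// Illustrative CUDA sketch (not the paper's implementation): chunk-wise
// domain decomposition of the first-order RF. Every thread processes one
// chunk as an independent signal with zero boundary conditions.
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void rf1_chunks(double* s, int n, int chunk, double alpha, double beta) {
    const int c = blockIdx.x * blockDim.x + threadIdx.x; // one chunk per thread
    const int lo = c * chunk;
    if (lo >= n) return;
    const int hi = min(lo + chunk, n);
    double prev = 0.0;                        // zero left boundary of the chunk
    for (int j = lo; j < hi; ++j)             // forward pass
        prev = s[j] = beta * s[j] + alpha * prev;
    prev = 0.0;                               // zero right boundary of the chunk
    for (int j = hi - 1; j >= lo; --j)        // backward pass
        prev = s[j] = beta * s[j] + alpha * prev;
}

int main() {
    const int n = 1 << 20, chunk = 1024, K = 5;
    const double sigma = 4.0, E = K / (sigma * sigma);
    const double alpha = 1.0 + E - sqrt(E * E + 2.0 * E), beta = 1.0 - alpha;

    double* d_s;
    cudaMalloc(&d_s, n * sizeof(double));
    cudaMemset(d_s, 0, n * sizeof(double));   // toy input: all zeros

    const int nChunks = (n + chunk - 1) / chunk;
    const int threads = 128, blocks = (nChunks + threads - 1) / threads;
    for (int k = 0; k < K; ++k)               // K iterations of the sweep
        rf1_chunks<<<blocks, threads>>>(d_s, n, chunk, alpha, beta);
    cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(d_s);
    return 0;
}
```

Mapping one thread to one chunk keeps the recurrences strictly sequential inside a chunk, where they must be, while exposing as many independent chunks as possible to the GPU's massive thread parallelism.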

Experimental results

In this section, several experimental results highlight and confirm the reliability and efficiency of the proposed software. The technical specifications of the system on which the GPU-parallel algorithms have been implemented are the following:

  • two Intel Xeon E5-2609v3 CPUs with 6 cores each, 1.9 GHz, 32 GB of RAM, 4-channel memory with 51 GB/s bandwidth;

  • two NVIDIA GeForce GTX TITAN X GPUs, 3072 CUDA cores, 1 GHz core clock, 12 GB GDDR5, 336 GB/s memory bandwidth.

Thanks to the GPUs’ computational power, our algorithms can deal with very large input sizes.
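Execution times of this kind are customarily measured with CUDA events placed around the kernel launch; the following self-contained harness is our own illustrative sketch (the dummy kernel stands in for the RF kernels), not the paper's benchmarking code.

```cuda
// Illustrative timing harness: measuring kernel execution time with CUDA
// events, as is customary for measurements like those reported here.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy(float* x, int n) {      // stand-in for the RF kernel
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 0.5f;
}

int main() {
    const int n = 1 << 24;
    float* d_x;
    cudaMalloc(&d_x, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    dummy<<<(n + 255) / 256, 256>>>(d_x, n);  // kernel under test
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("elapsed: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_x);
    return 0;
}
```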

Conclusions

Recursive Filters (RFs) are a well-known way to approximate the Gaussian convolution, which can be very helpful in the Machine Learning field for improving such fundamental steps. In this paper, we presented two different GPU-parallel implementations: the first deals with the first-order Gaussian recursive filter, the second with the third-order case. In order to achieve a high degree of parallelism, both implementations are designed to exploit the well-known performance offered by the most recent GPU architectures.

Declaration of Competing Interest

The authors report no declarations of interest.


References (34)

  • B.K.W. Lahoz et al., Data Assimilation (2010)
  • P. De Luca et al., Distributed genomic compression in MapReduce paradigm
  • H.D. Abarbanel et al., Machine learning: deepest learning as statistical data assimilation problems, Neural Comput. (2018)
  • R.C. Gilbert et al., Machine learning methods for data assimilation, Comput. Intell. Archit. Complex Eng. Syst. (2010)
  • J. Brajard et al., Combining data assimilation and machine learning to emulate a dynamical model from sparse and noisy observations: a case study with the Lorenz 96 model (2020)
  • P. De Luca et al., Performance analysis of a multicore implementation for solving a two-dimensional inverse anomalous diffusion problem, International Conference on Numerical Computations: Theory and Algorithms (2019)
  • P. De Luca et al., Haptic data accelerated prediction via multicore implementation, Science and Information Conference (2020)

Pasquale De Luca received his bachelor's degree in Computer Science from the “Parthenope” University of Naples. He is enrolled in the M.Sc. program in Computer Science at the University of Salerno. His research interests lie in the area of Parallel Computing. He has attended several international conferences. His skills concern HPC, Cloud Computing, and the development of parallel algorithms, with in-depth knowledge of CUDA and many/multi-core programming.

Ardelio Galletti received the Ph.D. degree in mathematical sciences from the University of Naples Federico II. He is currently involved in several scientific projects concerning the development of mathematical software, parallel software, and new methods in numerical analysis and applied mathematics. He has taught Applied Mathematics and Parallel and Distributed Computing to master's degree students in applied computer science and numerical computing, Programming to master's degree students in mathematics, computer science, and engineering, and probability and statistics to students of the Ph.D. Program Environment, Resources and Sustainable Development. He teaches Mathematics, Statistics and Numerical Computing to bachelor's degree students in environmental science, biological science, computer science, and nautical and aeronautical science. He is currently an Associate Professor in numerical analysis at the University of Naples Parthenope, where he is also a member of the Ph.D. Program Environment, Resources and Sustainable Development. He has participated as an organizer and program committee member in several international symposia and workshops. He is the author of about seventy papers published in international conference proceedings, books, and journals. His scientific research interests include applied mathematics, numerical analysis, scientific computing, parallel and distributed computing, numerical approximation and interpolation via radial basis functions, barycentric coordinates, quadrature rules, and methods for the reconstruction of curves and surfaces, as well as inverse problems in image analysis, algorithms on parallel and distributed systems with applications in medicine and physics, and classification and user profiling via reputation systems.

Giulio Giunta is Full Professor of Scientific Computing at the Department of Science and Technology of the Parthenope University of Naples (Italy), Dean of the School of Science, Engineering and Health of the Parthenope University of Naples, and head of the research laboratory High Performance Scientific Computing Smart Lab, Parthenope University of Naples. He is a member of the Italian Society for Applied and Industrial Mathematics (SIMAI), the Society for Industrial and Applied Mathematics (SIAM), and the National Institute of Higher Mathematics (INdAM) – National Group of Scientific Computing.

Livia Marcellino received the degree in mathematics and the Ph.D. degree in computational science and informatics from the University of Naples Federico II, Italy. Since 2006, she has been an Assistant Professor in numerical analysis with the Department of Science and Technology, University of Naples Parthenope, Italy. Her research interests include scientific computing, numerical analysis, and parallel computing. Her research activities are mainly devoted to the analysis, design, and development of methods, algorithms, and software for the numerical solution of ill-posed inverse problems arising in the applied sciences.
