Recursive filter based GPU algorithms in a Data Assimilation scenario
Introduction
Data Assimilation (DA) is a prediction-correction method for combining a physical model with observations. Assimilation techniques are widely used in atmospheric and ocean modeling. The most common approaches, such as Kalman filters and Bayesian techniques, are based on statistical interpolation [2]. Other techniques, for example variational methods such as 3D-Var and 4D-Var, are based on minimization [3], [4], [5]. A comprehensive overview of Data Assimilation methods can be found in [6].
Machine Learning (ML) is among the most important application fields of DA [7]. Machine Learning algorithms aim to provide adequate forecasts for predicting and understanding a multitude of phenomena; in statistical physics problems, the two areas of investigation can be regarded as equivalent [8], [9]. Indeed, many developments in DA have been carried over to the ML field, producing alternative and innovative methods [10], [11].
To perform ML correctly, a suitable classifier is needed in the analysis phase of the ML process. The variational approach to Data Assimilation, characterized by the minimization of a cost function, is a good choice for such classification. Numerically, this means applying an iterative procedure that uses a covariance matrix defined by measuring the error between predictions and observed data. Here, we are interested in these numerical issues. In particular, since the error covariance matrix generally presents a Gaussian correlation structure, the Gaussian convolution plays a key role in this problem. Furthermore, beyond its fundamental role in the Data Assimilation field, the convolution operation is a significant computational kernel in most big-data analysis problems.
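As a concrete reference point for the computational kernel just mentioned, a direct discrete Gaussian convolution of a one-dimensional signal can be sketched as follows (a minimal NumPy sketch; the function names and the 3σ truncation radius are our own illustrative choices, not the paper's code):

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Samples of a zero-mean Gaussian, normalized to sum to 1."""
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

def gaussian_convolution(s, sigma, radius=None):
    """Direct discrete Gaussian convolution of a 1-D signal s."""
    if radius is None:
        radius = int(3 * sigma)  # truncate the kernel at 3 sigma
    g = gaussian_kernel(sigma, radius)
    return np.convolve(s, g, mode="same")

signal = np.zeros(21)
signal[10] = 1.0  # unit impulse: the output reproduces the kernel itself
smoothed = gaussian_convolution(signal, sigma=2.0)
```

The direct approach costs O(n·r) operations for a signal of length n and kernel radius r, which is exactly what the recursive filters discussed later avoid.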
Because of the need to process large amounts of data, parallel approaches and High-Performance Computing (HPC) architectures, such as multicore processors or Graphics Processing Units (GPUs), are mandatory [12], [13]. Moreover, parallel computing has proved very helpful in modelling a suitable domain decomposition (DD) of a specific algorithm [14]. Several papers deal with parallel Data Assimilation [15], [16], [17], and recently it has been shown that an ad-hoc combination of parallel methods with space reduction techniques (e.g. AutoEncoders, Truncated SVD) can achieve appreciable results in terms of both space and time [18]. In this paper, we limit our attention to the basic step represented by a parallel implementation of the Gaussian convolution.
In particular, we propose an accelerated procedure to approximate the Gaussian convolution based on recursive filters (RFs). Gaussian RFs have been designed to provide an accurate and efficient approximation of the Gaussian convolution [19], [20], [21], [22]. Since RFs are mainly useful when execution times are large, many parallel implementations have been presented (see the survey in [23]).
In [1] we presented a GPU implementation of the K-iterated first-order RF, together with a performance analysis in terms of execution times, varying the parallel configuration and the iteration number K. Here we recall the theory and results given in [1], and we also consider the third-order recursive filter in order to study how much the computation can be accelerated as the problem size increases.
In other words, we propose a novel parallel implementation of the third-order RF that exploits the computational power of GPUs, as we already did for the first-order RF in [1]. We chose this parallelization environment because GPUs are very useful for solving numerical problems in several application fields [24], [25] and for managing large input data.
The rest of the paper is organized as follows. Section 2 recalls the variational Data Assimilation problem and introduces the computational kernel of the Gaussian convolution. Section 3 describes the use of recursive filters to approximate the discrete Gaussian convolution, together with issues related to the boundary conditions; in particular, we present the features of the first-order and third-order recursive filters. Section 4 provides a detailed description of our GPU-parallel algorithms: the domain decomposition strategy and the implementations of the first-order and third-order recursive filters. The experiments shown in Section 5 allow us to make interesting observations about the performance of the algorithms. Finally, conclusions are drawn in Section 6.
Section snippets
Data Assimilation and Gaussian convolutions
In this section, we show how the Gaussian convolution is involved in a Data Assimilation scenario. In particular, let us consider a three-dimensional variational Data Assimilation problem [26]: the objective is to give the best estimate of x, called the analysis or state vector, once a prior estimate vector xb (the background), usually provided by a numerical forecasting model, and a vector of observations, related to the nonlinear model, are given. The unknown x solves the
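The snippet breaks off before the cost functional is displayed. In standard 3D-Var notation (a reconstruction from the variational DA literature cited above, not the paper's exact display), the analysis x minimizes

```latex
J(\mathbf{x}) \;=\;
\frac{1}{2}\,(\mathbf{x}-\mathbf{x}^{b})^{\mathrm{T}}\,\mathbf{B}^{-1}\,(\mathbf{x}-\mathbf{x}^{b})
\;+\;
\frac{1}{2}\,\bigl(\mathbf{y}-\mathcal{H}(\mathbf{x})\bigr)^{\mathrm{T}}\,\mathbf{R}^{-1}\,\bigl(\mathbf{y}-\mathcal{H}(\mathbf{x})\bigr)
```

where y is the observation vector, H the (nonlinear) observation operator, and B and R the background and observation error covariance matrices. The Gaussian correlation structure typically assumed for B is precisely why applying B (or its square root) reduces to Gaussian convolutions, the kernel studied in the rest of the paper.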
Gaussian RFs
Gaussian recursive filters offer a good and fast approximation of the Gaussian convolution. In more detail, let us consider an input signal s(0). The input signal can be thought of as a complex function whose domain is the set of integers or, more directly, as a sequence of complex numbers in C^Z. Let g denote the zero-mean Gaussian with standard deviation σ > 0, and let δ represent the unit-sample, i.e. a signal
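A serial sketch of the K-iterated first-order RF may help fix ideas. This is our own illustrative code, with zero boundary conditions and a variance-matching choice of the filter coefficient (each forward or backward sweep has a geometric impulse response of variance α/(1−α)², so 2K sweeps match σ² when 2Kα/(1−α)² = σ²); the paper and the cited RF literature discuss more refined coefficient and boundary treatments:

```python
import numpy as np

def rf_coefficient(sigma, K):
    """Pole alpha of the first-order filter, chosen so that K
    forward-backward sweeps match the Gaussian variance sigma^2:
    solve 2*K*alpha/(1-alpha)^2 = sigma^2 for alpha in (0, 1)."""
    E = sigma**2 / (2.0 * K)
    return (2*E + 1 - np.sqrt(4*E + 1)) / (2*E)

def first_order_rf(s, sigma, K=1):
    """K-iterated first-order Gaussian recursive filter
    (zero boundary conditions; a minimal serial sketch)."""
    alpha = rf_coefficient(sigma, K)
    beta = 1.0 - alpha
    w = np.asarray(s, dtype=float).copy()
    n = w.size
    for _ in range(K):
        # forward sweep: p[i] = beta*w[i] + alpha*p[i-1]
        p = np.empty(n)
        prev = 0.0
        for i in range(n):
            prev = beta * w[i] + alpha * prev
            p[i] = prev
        # backward sweep: w[i] = beta*p[i] + alpha*w[i+1]
        nxt = 0.0
        for i in range(n - 1, -1, -1):
            nxt = beta * p[i] + alpha * nxt
            w[i] = nxt
    return w
```

Each sweep costs O(n) regardless of σ, which is the appeal of RFs over the direct convolution; iterating K times sharpens the approximation of the Gaussian shape.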
Parallel approach and GPU algorithms
In this section we describe our parallel algorithms, and the related strategies, for implementing fast and accurate versions of both the K-iterated first-order Gaussian RF and the third-order one. Our parallel approach exploits the main features of the GPU environment. For both implementations, the main idea relies on several macro steps in order to obtain a reliable and efficient computation. The whole process, common to both algorithms, can be partitioned into three steps.
In the
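To illustrate the flavor of the domain decomposition, the following sketch treats each row of a 2D field as an independent smoothing task, mimicking a one-task-per-row mapping onto parallel workers. This is our own CPU emulation under that assumed mapping, not the paper's CUDA code, and `smooth_row` uses a fixed illustrative coefficient rather than the calibrated one:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def smooth_row(row, alpha=0.4):
    """One forward-backward first-order sweep on a single row
    (zero boundary conditions; illustrative coefficient)."""
    beta = 1.0 - alpha
    n = row.size
    p = np.empty(n)
    prev = 0.0
    for i in range(n):          # forward sweep
        prev = beta * row[i] + alpha * prev
        p[i] = prev
    out = np.empty(n)
    nxt = 0.0
    for i in range(n - 1, -1, -1):  # backward sweep
        nxt = beta * p[i] + alpha * nxt
        out[i] = nxt
    return out

def smooth_field(field, alpha=0.4, workers=4):
    """Domain decomposition across rows: every row is an
    independent task, so the rows can be filtered concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as ex:
        rows = list(ex.map(lambda r: smooth_row(r, alpha), field))
    return np.vstack(rows)
```

Because the recursion is sequential only along each row, decomposing across rows exposes parallelism without any inter-task communication; the concurrent result matches the serial row-by-row one exactly.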
Experimental results
In this section, several experimental results highlight and confirm the reliability and efficiency of the proposed software. The technical specifications of the machine on which the GPU-parallel algorithms have been implemented are the following:
- two Intel Xeon E5-2609v3 CPUs, 6 cores each, 1.9 GHz, 32 GB of RAM, 4 channels, 51 GB/s memory bandwidth
- two NVIDIA GeForce GTX TITAN X GPUs, 3072 CUDA cores, 1 GHz core clock, 12 GB GDDR5, 336 GB/s memory bandwidth.
Conclusions
Recursive Filters (RFs) are a well-known way to approximate the Gaussian convolution, which can be very helpful in the Machine Learning field for improving such fundamental steps. In this paper, we presented two different GPU-parallel implementations: the first deals with the first-order Gaussian recursive filter, the second with the third-order case. In order to achieve a high degree of parallelism, both implementations are designed to exploit the well-known performance offered by the most recent
Declaration of Competing Interest
The authors report no declarations of interest.
References (34)
- et al., A reduced adjoint approach to variational data assimilation, Comput. Methods Appl. Mech. Eng. (2013)
- et al., Attention-based convolutional autoencoders for 3D-variational data assimilation, Comput. Methods Appl. Mech. Eng. (2020)
- et al., Optimal reduced space for variational data assimilation, J. Comput. Phys. (2019)
- et al., A time-parallel approach to strong-constraint four-dimensional variational data assimilation, J. Comput. Phys. (2016)
- et al., Data assimilation in meteorology and oceanography (1991)
- et al., Recursive implementation of the Gaussian filter, Signal Process. (1995)
- et al., Accelerated Gaussian convolution in a Data Assimilation scenario, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12142 LNCS (2020)
- et al., Data assimilation in the geosciences: an overview of methods, issues, and perspectives, WIREs Climate Change (2018)
- et al., Four-dimensional variational data assimilation using the adjoint of a multilevel primitive-equation model, Q. J. R. Meteorol. Soc. (1991)
- et al., Model-reduced variational data assimilation, Mon. Weather Rev. (2006)
- Data Assimilation
- Distributed genomic compression in MapReduce paradigm
- Machine learning: deepest learning as statistical data assimilation problems, Neural Comput.
- Machine learning methods for data assimilation, Comput. Intell. Archit. Complex Eng. Syst.
- Combining Data Assimilation and Machine Learning to Emulate a Dynamical Model From Sparse and Noisy Observations: A Case Study With the Lorenz 96 Model
- Performance analysis of a multicore implementation for solving a two-dimensional inverse anomalous diffusion problem, International Conference on Numerical Computations: Theory and Algorithms
- Haptic data accelerated prediction via multicore implementation, Science and Information Conference
Cited by (14)
- Parallel self-avoiding walks for a low-autocorrelation binary sequences problem, Journal of Computational Science (2024)
- 20 years of computational science: Selected papers from 2020 International Conference on Computational Science, Journal of Computational Science (2021)
- Land Data Assimilation: Harmonizing Theory and Data in Land Surface Process Studies, Reviews of Geophysics (2024)
- Parallel Optimization for Large-Scale Ocean Data Assimilation, Jisuanji Yanjiu yu Fazhan/Computer Research and Development (2023)
Pasquale De Luca received his bachelor's degree in Computer Science at the “Parthenope” University of Naples. He is enrolled in the M.Sc. degree programme in Computer Science at the University of Salerno. His research interests lie in the area of Parallel Computing. He has attended several international conferences. His skills concern HPC, Cloud Computing and the development of parallel algorithms, with in-depth knowledge of CUDA and many/multi-core programming.
Ardelio Galletti received the Ph.D. degree in mathematical sciences from the University of Naples Federico II. He is currently involved in several scientific projects concerning the development of mathematical software, parallel software, and new methods in numerical analysis and applied mathematics. He has taught Applied Mathematics and Parallel and Distributed Computing to master's degree students in applied computer science and numerical computing, Programming to master's degree students in mathematics, computer science and engineering, and Probability and Statistics to students of the Ph.D. Program Environment, Resources and Sustainable Development. He teaches Mathematics, Statistics and Numerical Computing to bachelor's degree students in environmental science, biological science, computer science, and nautical and aeronautical science. He is currently an Associate Professor in numerical analysis at the University of Naples Parthenope, where he is also a member of the Ph.D. Program Environment, Resources and Sustainable Development. He has participated as an organizer and program committee member in several international symposia and workshops. He is the author of about seventy papers published in international conference proceedings, books, and journals. His scientific research interests include applied mathematics, numerical analysis, scientific computing, parallel and distributed computing, numerical approximation and interpolation via radial basis functions, barycentric coordinates, quadrature rules, methods for the reconstruction of curves and surfaces, inverse problems in image analysis, algorithms on parallel and distributed systems with applications in medicine and physics, and classification and user profiling via reputation systems.
Giulio Giunta is Full Professor of Scientific Computing at the Department of Science and Technology of the Parthenope University of Naples (Italy), Dean of the School of Science, Engineering and Health of the Parthenope University of Naples, and head of the research laboratory High Performance Scientific Computing Smart Lab. He is a member of the Italian Society for Applied and Industrial Mathematics (SIMAI), the Society for Industrial and Applied Mathematics (SIAM), and the National Institute of Higher Mathematics (INdAM) – National Group of Scientific Computing.
Livia Marcellino received her degree in mathematics and her Ph.D. degree in computational science and informatics from the University of Naples Federico II, Italy. Since 2006, she has been an Assistant Professor in numerical analysis at the Department of Science and Technology, University of Naples Parthenope, Italy. Her research interests include scientific computing, numerical analysis, and parallel computing. Her research activities are mainly devoted to the analysis, design and development of methods, algorithms, and software for the numerical solution of ill-posed inverse problems arising in the applied sciences.