1 Introduction

The simulation of realistic combustion phenomena quickly becomes computationally expensive due to, e.g., inherent multi-scale characteristics, complex fluid flow, or detailed chemistry with many chemical species and reactions. Simulation codes for chemically reacting flows (e.g., turbulent flames) solve stiff systems of partial differential equations (PDE) on large grids, making the use of efficient computational strategies necessary. Here, algorithmic differentiation (AD, [3]) can help to resolve some of the computational challenges associated with chemically reacting flows. AD enables, for instance, sensitivity studies [1], or efficient optimization algorithms for key parameters [11] of these mechanisms.

In this work, we apply AD to the Universal Laminar Flame (ULF) solver [13], a C++ framework for solving generic laminar flame configurations. The code computes and tabulates the thermochemical state of these generic flames which is then parametrized by few control variables, i.e., the mixture fraction and reaction progress variable. The look-up tables obtained with ULF are used in 3D CFD simulations of chemically reacting flows, together with common chemical reduction techniques such as the flamelet concept [10]. The code is also used in other scenarios, e.g., for detailed flame structure analyses or model development, and is a key tool for studying the characteristics of chemically reacting flows.

The objective of this study is to introduce AD to ULF by using operator overloading. We focus on the technical details of this modification and how it impacts overall code performance. As an exemplary application, AD is used to compute exact Jacobians which are required by the numerical solver. The derivatives generated by AD are accurate up to machine precision, often at a lower computational cost w.r.t. finite differentiation (FD).

2 Two Canonical Models from the ULF Framework

We study the impact of AD using two canonical problems in the field of chemically reacting flows, i.e., a constant-pressure, homogeneous reactor (HR) and a freely-propagating premixed flame (FPF). The FPF is a 1D, stationary flame which burns towards a premixed stream of fresh reactants (e.g., a methane-air mixture), such that the fluid flow velocity and the burning velocity of the flame compensate each other. This yields a two-point boundary value problem described by a differential-algebraic equation (DAE) set, cf. [7]. We obtain the HR problem if all terms except the transient and the chemical source terms are omitted, resulting in an ordinary differential equation (ODE) set.

The stiff equation set is solved with an implicit time-stepping procedure based on backward-differencing formulas (BDF) with internal Newton iterations, i.e., the solvers BzzDAE [2] and CVODE [4] are used for the FPF and the HR, respectively. The discretization for the FPF is done on a 1D grid with n points. The HR is, on the other hand, solved on a single grid point. Each grid point i has a state \(X_i\) that is dependent on the temperature \(T_i\) and mass fractions \(Y_{i,j}\) for species \(j = 1...k\).

The internal Newton iteration of each solver requires a Jacobian. ULF uses a callback mechanism for user-defined Jacobians and computes them either numerically, or analytically for certain mechanisms. The Jacobian of the equation set \(f: \mathbb {R}^N \rightarrow \mathbb {R}^N\) is typically a block tridiagonal matrix of dimension \(\mathbb {R}^{N \times N}\) with \(N= n(k+1)\). The main diagonal is the state \(X_i\) (Jacobian of the chemical source terms), the lower and upper diagonal describe the influence of the neighboring grid points, e.g., \(i - 1\) and \(i + 1\), determined by the transport terms.

3 Algorithmic Differentiation

AD [3] describes the semantic augmentation of scientific codes in order to compute derivatives. In the context of AD, any code is assumed to be a set of mathematical operations (e.g., +) and functions (e.g., sin) which have known analytical derivatives. Thus, the chain rule of differentiation can be applied to each statement, resulting in the propagation of derivatives through the code.

Two modes exist for AD, the forward mode (FM) and the reverse mode (RM). The FM applies the chain rule at each computational stage and propagates the derivatives of intermediate variables w.r.t. input variables along the program flow. The RM, on the other hand, propagates adjoints - derivatives of the final result w.r.t. intermediate variables - in reverse order through the program flow. This requires temporary storage, called tape, but computes gradients efficiently.

The semantic augmentation for AD is either done by a source transformation or by operator overloading. Due to the inherent complexity of the C++ language, there is no comprehensive source transformation tool and, thus, operator overloading is typically used in complex C++ simulation software. With operator overloading, the built-in floating point type T is re-declared to a user-defined AD type \(\tilde{T}\). It stores the original value of T, called primal value, and overloads all required operations to perform additional derivative computations.

4 Introducing the AD Type to ULF

Introducing AD to a code typically requires 1. a type change to the AD type, 2. integration with external software packages, and 3. addition of code to initialize (seed) the AD type, compute, and extract the derivatives [9]. For brevity, we do not show the seeding routines. The FM seeding is analogous to the perturbation required for the existing FD implementation to compute the Jacobian. The RM, on the other hand, requires taping of the function and seeding of the respective function output before evaluating the tape to assemble the Jacobian.

We apply the operator overloading AD tool CoDiPack. CoDiPack uses advanced metaprogramming techniques to minimize the overhead and has been successfully used for large scale (CFD) simulations [5].

4.1 Fundamentals of Enabling Operator Overloading in ULF

We define the alias ulfScalar in a central header which, in the current design, is either set to the built-in double type or an AD type at compile time, see Fig. 1.

Fig. 1.
figure 1

RealForward and RealReverse are the standard CoDiPack AD types for the FM and RM, respectively. The AD types have an API for accessing the derivatives etc.

The initial design of ULF did not account for user-defined types (e.g., an AD type). This leads to compile time errors after a type change as built-in and user-defined types are treated differently [6]. Hence, we refactored several components in ULF, i.e., eliminating implicit conversions of the AD data types to some other type, handling of explicit type casts and re-designing the ULF data structures.

Generic Data Structures. The fundamental numeric data structure for computations in ULF is the field class. It stores, e.g., the temperature and concentration of the different chemical species. In accordance to [9], we refactored and templated this class in order to support generic scalar types, specifying two parameters, 1. the underlying vector representation type, and 2. the corresponding, underlying scalar type, see Fig. 2. We extended this principle to every other data structure in ULF, having the scalar alias as the basic type dependency.

Fig. 2.
figure 2

The class fieldTmpl represents the base class for fields and provides all operations. They are generically defined in the fieldDataTmpl. Finally, the ULF framework defines field as an alias for the data structure based on the ulfScalar alias.

Special Code Regions: External Libraries and Diagnostics. The external solver libraries and, more generically, external API calls (e.g., assert and printing statements) in ULF, do not need to be differentiated. We denote them as special regions, as they require the built-in double type. ULF, for instance, previously interfaced with external solvers by copying between the internal data representation and the different data structures of the solvers. With AD, this requires an explicit value extraction of the primal value (cf. Sect. 3). To that end, we introduced a conversion function, using template specialization, see Fig. 3. This design reduces the undue code maintenance burden of introducing user-defined types to ULF. If, e.g., a multi-precision type is required, only an additional specialization of the conversion function in a single header is required.

Fig. 3.
figure 3

The functions value and recast are used for value extraction and type casting, respectively. The former makes use of the struct in the detail namespace which is used to specialize for user-defined types, e.g., the AD type (not shown). If built-in doubles are used, the value is simply returned (as shown).

5 Evaluation

We evaluate the ULF AD implementation based on two reaction mechanisms, shown in Table 1. For brevity, we focus on the RM.

Table 1. Mechanisms for the model evaluations.

Timings are the median of a series of multiple runs for each mechanism. The standard deviation was less than 3% w.r.t. the median for each benchmark. All computations were conducted on a compute node of the Lichtenberg high-performance computer of TU Darmstadt, with two Intel Xeon Processor E5-2680 v3 at a fixed frequency of 2.5 GHz with 64 GB RAM.

5.1 Homogeneous Reactor

Figure 4 shows the absolute timings of the solver phase for FD and the RM, respectively. The substantial speedups observed for the RM mainly result from the cheap computation of the Jacobian J. One evaluation of J is 20 to 25 times faster, depending on the mechanism. With FD, each time J is computed, the HR model function F is executed multiple times. In contrast, with the RM, F is executed once to generate a trace on the tape. The tape is then evaluated multiple times to generate J, avoiding costly evaluations of F. Memory overheads are negligible for the HR as the RM adds about 5 MB to 7 MB memory use.

Fig. 4.
figure 4

For each mechanism, the top bar shows FD and the bottom bar shows AD RM measurements. The time of the solve process is divided into three phases: \(T_F\) and \(T_J\) are the cumulated time for the function and Jacobian evaluations, respectively. \(T_\text {P}\) is the time spent in CVODE.

5.2 Freely Propagating Flame

The mechanisms are solved on three different mesh resolutions on a 1D domain. As shown in Table 2, with AD, a single evaluation of F has a higher cost compared to FD. Here, the overhead ratio between FD and AD appears to be approximately constant for the different mesh configurations of both mechanisms. The evaluation of J is initially cheaper with AD but with rising mesh sizes the speedup vanishes. The total execution time with AD is slightly faster on average as the solver requires fewer steps (i.e., less invocations of F and J) for a solution.

The memory overhead is mostly constant for each mesh configuration as the tape only stores the adjoints of F to generate the Jacobian. We observe an additional main memory usage of about 100 MB to 430 MB and 180 MB to 800 MB memory from the smallest to largest mesh setup for USCMech and Aramco, respectively. We partly attribute the small variations of memory overhead to the accuracy of the measurement mechanism and due to the tape of CoDiPack which stores the data in chunks of a fixed size. Hence, the total tape memory consumption might be slightly higher than required.

Table 2. Timings for the FPF. F and J are times for a single evaluation, #F and #J are the number of invocations of each respective routine. \(\text {M}_{\text {AD}}/\text {M}_{\text {FD}}\) is the memory ratio. Note: #F includes the evaluations required for computing J.

6 Conclusion and Future Work

We presented the introduction of AD to ULF by using the operator overloading AD tool CoDiPack. In particular, we first introduced a global alias for the basic scalar type in ULF, which is set to the AD type at compile time. All other data structures were rewritten to use templates and are, subsequently, based on this alias. To interface with external APIs, template-based functions for casting and value extraction were introduced, and specialized for the AD types.

The HR model is solved on a single grid point, without transport properties. We observe substantial speedups due to the cheap computation of Jacobians with AD compared to FD. For the FPF, the underlying DAE solver typically requires less steps with AD but the evaluation of the model function is more expensive by a mostly fixed offset compared to FD.

In the future, experimentation with the ODE/DAE solver parameter settings to exploit the improved precision seems worthwhile. More importantly, the newly gained ability to calculate arbitrary derivatives up to machine precision in ULF enables us to conduct sensitivity studies not only limited to the chemical reaction rates, and model optimization experiments. In particular, advanced combustion studies such as uncertainty quantification, reconstruction of low-dimensional intrinsic manifolds or combustion regime identification become accessible.