1 Introduction

The presentation of clear and appealing depictions of flows and the mathematical definition and efficient extraction of features are two important aims of flow visualization. The common goal of most methods is to present an image that provides the viewer with a large amount of information in a compressed and easily comprehensible way. To obtain such an image, a visualization method has to focus on specific aspects of a field. For instance, topological methods [8, 32] and vector field clustering [18, 19] directly look at the streamline behavior within a flow field. Other methods use more specific features such as vortices [10, 22, 28] or flow convergence and separation [12]. In this context, the increasing availability of computational power and memory makes it possible to run complex flow simulations on high-resolution grids, which enables a more precise investigation of flow features. However, sharing the obtained simulation results via networks becomes more expensive in terms of transmission times, and their evaluation becomes more expensive in terms of computational power.

That is why various approaches have been presented that focus on the reduction of dataset sizes. First, Lodha et al. [17] introduced a topology-preserving compression algorithm based on the clustering of Telea and Wijk [29]. As the algorithm of Lodha et al. only preserves stationary points, Theisel et al. [30] introduced an extended definition of topological equivalence that additionally considers separatrices and their corresponding start/end points or inflow/outflow intervals, respectively. They use their definition to implement a grid coarsening algorithm that iteratively removes grid vertices as long as the resulting vector field is topologically equivalent to the original one. However, their method does not allow controlling the approximation error. Later, Dey et al. [3] presented a simplification algorithm based on an error-bounded edge collapse. Although one of their goals is to preserve topological features of the original field, they did not implement an explicit mechanism to preserve them. Instead, they provide a detailed discussion of the influence of the error threshold on the simplified field.

In this work, we combine the advantages of these previous works and present an error-bounded, topology-preserving vector field approximation. In a first step, we compute a segmentation of the vector field into regions that can be linearly approximated. In a second step, we use the characteristics of the linear approximations in these regions to compress the original dataset.

For the first step, we developed a segmentation algorithm that follows the idea of Affine Linear Neighborhoods (ALNs), which has been presented by Koch et al. [13]. For one single point, an ALN comprises all connected points for which the data can be approximated by an affine linear function while staying below a certain error threshold. We extend this approach to compute a complete segmentation that provides a region-wise approximation of the vector field. The implementation of our approach is designed to fulfill four criteria:

  1. The region-wise approximated vector field is topologically equivalent to the original field.

  2. The maximal approximation error is below a user-defined threshold in each segmentation region.

  3. The number of path-connected segmentation regions is as small as possible.

  4. Each point of the segmentation region has the smallest possible approximation error.

The first criterion preserves the global flow behavior, whereas the second criterion arises from the ALN definition and helps to preserve the local flow behavior. The two latter criteria ensure that the vector field is approximated as accurately as possible when using a small and manageable number of regions. Unfortunately, these two criteria also cause a large computational effort due to the high number of possible segmentations that would have to be considered to find an optimal solution. Therefore, we introduce a greedy algorithm that uses sensible heuristics to achieve large segmentation regions while maintaining low approximation errors. For the second step, we adapted the edge collapse algorithm presented by Theisel et al. [30]. Their approach preserves the topology and is used to reduce the memory footprint of our region-wise approximated vector field.

We apply our method to several two-dimensional datasets to show the influence of the error threshold on the resulting approximation and the achieved compression rates. We also compare our work to other compression approaches.

2 Related work

In flow visualizations, a common way to guide a viewer to regions of interest is the use of simplified vector field representations. Since such representations try to reduce the amount of information in vector field visualizations and since many of them are based on segmentation, there exist many publications that are related to our work. An overview of the literature in this area can be found in state-of-the-art reports that discuss feature-based [15, 21, 23] and partition-based [25] techniques.

The focus of this work is mainly on the segmentation that is used to obtain an approximated vector field, whose topology is equivalent to the original field. Therefore, the works [3, 17, 30] we mentioned in Sect. 1 have to be considered as related work.

Aside from this, approximation and partitioning of vector fields according to an error measure are well known from clustering techniques. Telea and Wijk [29] use a geometrically motivated error measure for their hierarchical clustering. Further, they produce simplified vector field representations by depicting a representative arrow for each cluster. McKenzie et al. [20] use error measures based on vorticity, divergence, and the vector gradient for their variational clustering approach. Their method tries to minimize the error according to these measures to obtain the best-fitting clustering regions. Kuhn et al. [14] derive a scalar field that represents the bending energy of a vector field and use this as a basis for their clustering. In contrast, some methods refrain from using an error measure but are based on a multi-scale representation of the field instead. Garcke et al. [4] presented a clustering approach that is based on diffusion. They modify a physical clustering method, the Cahn–Hilliard phase separation model, to present a continuous family of clustered vector data. Griebel et al. [5] defined an anisotropic diffusion tensor and a corresponding differential operator to describe flow structures in vector fields. Based on their flow description, they compute a hierarchical vector field decomposition by using an algebraic multigrid algorithm. An important difference between these works and ours is that they compute clusterings to produce simplified vector field representations, but they do not use their approximations to compress the datasets.

In [20, 29], the granularity of the simplified visualization is controlled by computing a user-defined number of clusters, which can cause an arbitrarily high approximation error. In contrast to this, we allow the user to steer the accuracy of the approximated vector field by defining a maximal error. Furthermore, the complexity of the flow behavior that is represented by each cluster differs in most of these methods. While some approaches use an averaged vector (cf. [29]) to describe the local flow, others use complex physically motivated measures (cf. [4, 5]). As a trade-off, our method uses a region-wise affine linear approximation of the flow field. These approximations provide a detailed and easily understandable description of the local flow, since linear vector fields represent the simplest class of non-trivial vector fields and are well understood [9].

Chen et al. [2] and Li et al. [16] presented similar segmentation algorithms to visualize the regions in a vector field that are influenced by stationary points. Both works adapt an image segmentation approach to determine one segment for each stationary point. Although Chen et al. successfully applied their algorithm to synthetic vector fields that contain a small number of stationary points, the algorithm does not work for more complex fields. Here, the algorithm of Li et al. is still applicable. However, like the previously discussed clusterings, both algorithms compute a fixed number of segments and thus cause an arbitrarily high approximation error. Furthermore, the obtained results are not used to approximate and compress the original dataset.

The linear flow behavior in the vicinity of a stationary point can be described by the Jacobian matrix at this stationary point. Schneider et al. [26] used this to introduce the Linear Neighborhoods, which are a more specialized form of the ALNs. Later, Wiebel et al. [33] used the Linear Neighborhoods to depict the flow behavior around stationary points. These two approximation approaches are restricted to the regions of influence of stationary points and do not provide a segmentation for the remaining parts of the vector field. Instead, our goal is to obtain a complete vector field approximation.

Besides clustering vectors by comparing them directly, error measures along trajectories can be defined and used to cluster either the vector fields or the trajectories themselves. The clustering technique introduced by Heckel et al. [7] uses the distance between streamlines started at the same point in the simplified and the original vector field as its local error measure. Other methods directly use streamline clusters to provide a good vector field overview: Chen et al. [1] presented an algorithm to seed streamlines based on an entropy measure. To avoid visual clutter, streamlines are grouped so that they are at least nearly coherent. Marchesin et al. [19] developed a GPU-based algorithm to dynamically determine a view-dependent streamline selection that avoids self-occlusion in three-dimensional vector fields. Since some streamline properties are indicative of certain flow features, Lu et al. [18] use statistical distributions of measurements along streamlines to define streamline clusters. In contrast to our method, these works focus on creating easily comprehensible depictions of flow fields that do not suffer from self-occlusion. Therefore, they need to select and depict meaningful streamlines. There is no obvious extension of the used streamline-based measures and clustering approaches towards the complete vector field approximation that we want to compute.

3 Region-wise approximation of vector fields

In this section, we discuss the region-wise approximation for continuous vector fields. For this, we first reformulate the definition of ALNs and discuss the definition of topological equivalence. Second, we present a heuristic to choose appropriate seed points for linear neighborhoods. Finally, we describe the general idea of our segmentation algorithm.

3.1 Affine linear neighborhood

3.1.1 Definition

To compute local approximations of a vector field, we use the ALNs that have been introduced by Koch et al. [13] as:

$$L_\mathrm{orig}(\mathbf{x}_s) = \left\{ \mathbf{y} \in \mathbb{R}^3 \;\middle|\; \frac{\Vert \mathbf{v}(\mathbf{y}) - J(\mathbf{x}_s) \cdot (\mathbf{y} - \mathbf{x}_s) - \mathbf{v}(\mathbf{x}_s) \Vert}{\Vert \mathbf{v}(\mathbf{y}) \Vert} < E_\mathrm{max} \right\} .$$
(1)

Here \(\mathbf {x}_s\) denotes the seed point of an ALN and \(J(\mathbf {x}_s)\) the corresponding Jacobian matrix at \(\mathbf {x}_{s}\). Using the linear approximation \(\mathbf {j}(\mathbf {y}) = J(\mathbf {x}_s) \cdot (\mathbf {y}- \mathbf {x}_s) + \mathbf {v}(\mathbf {x}_s)\) of the vector field \(\mathbf {v}(\mathbf {y})\) that is given by the seed point \(\mathbf {x}_s\), the set \(L_\mathrm{orig}(\mathbf {x}_s)\) contains all connected positions \(\mathbf {y}\) for which the approximation error is below a threshold \(E_\mathrm{max}\). At stationary points, this definition is inappropriate because the relative error in Eq. 1 is not well defined due to a division by zero. The consequence is that we would need infinitely many ALNs to approximate a small neighborhood around a stationary point, and no stationary point could ever be part of an ALN. Hence, this definition is not suitable for the computation of an ALN-based vector field segmentation.

To overcome this limitation, we redefine the linear neighborhood by using the absolute error:

$$L(\mathbf{x}_s) = \left\{ \mathbf{y} \in \mathbb{R}^3 \;\middle|\; \Vert \mathbf{v}(\mathbf{y}) - J(\mathbf{x}_s) \cdot (\mathbf{y} - \mathbf{x}_s) - \mathbf{v}(\mathbf{x}_s) \Vert < E_\mathrm{max} \right\} .$$
(2)

Using the absolute error has implications for the computation of the extent of an ALN. Koch et al. [13] used \(L_\mathrm{orig}\) to compute an ALN and therefore needed to find all intersections of the ALN boundary with the cell edges. This is due to the fact that in a piecewise linear vector field \(\mathbf {v}(\mathbf {y})\), the relative error used there varies non-linearly. So the maximal approximation error can be exceeded along a cell edge, although the error values at the corresponding vertices are below \(E_\mathrm{max}\). In the case of the new condition in Eq. 2, \(\mathbf {v}(\mathbf {y}) - J(\mathbf {x}_s) \cdot (\mathbf {y}- \mathbf {x}_s) - \mathbf {v}(\mathbf {x}_s)\) varies linearly if \(\mathbf {v}(\mathbf {y})\) is a linear vector field. Therefore, the maximal approximation errors are now located at the cell vertices, and we only have to test every grid vertex to find the cells where the ALN ends. In addition, the interpretation of the linear approximation changes due to the new error measure.
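The following sketch illustrates this vertex-based test (it is not the authors' implementation). It assumes the discretized setting of Sect. 4: `vertices` and `vectors` are arrays holding the grid vertex positions and vector values, `s` is the index of the seed vertex, `J_s` the Jacobian estimated at the seed, and `E_max` the user-defined threshold. The connectivity requirement of the ALN is not handled here; it would be enforced by a flood fill from the seed over the vertices that pass the test.

```python
import numpy as np

def aln_vertex_mask(vertices, vectors, s, J_s, E_max):
    """Mark all grid vertices whose absolute error with respect to the
    linearization seeded at vertex s stays below E_max (cf. Eq. 2)."""
    x_s, v_s = vertices[s], vectors[s]
    # j(y) = J(x_s) . (y - x_s) + v(x_s), evaluated for all vertices at once
    j = (vertices - x_s) @ J_s.T + v_s
    err = np.linalg.norm(vectors - j, axis=1)   # absolute error per vertex
    return err < E_max
```

Because the error varies linearly within each simplicial cell, checking the cell vertices is sufficient to bound the error inside each cell.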

3.1.2 Interpretation of the error

The maximal relative error \(E_\mathrm{max}\) in the original ALN definition allowed an interpretation of the linear approximation at each position \(\mathbf {y}\) that is independent of the current center vector \(\mathbf {v}(\mathbf {y})\) (cf. [13]). In our case, the range of all possible approximated vectors \(\mathbf {j}(\mathbf {y}) = J(\mathbf {x}_s) \cdot (\mathbf {y}-\mathbf {x}_s) + \mathbf {v}(\mathbf {x}_s)\) depends on \(E_\mathrm{{max}}\) as well as on the actual vector \(\mathbf {v}(\mathbf {y})\). To get this range, we rewrite the condition in Eq. 2 as

$$\begin{aligned} \Vert \mathbf {v}(\mathbf {y})\Vert ^2 - 2 \cdot \Vert \mathbf {v}(\mathbf {y})\Vert \cdot \Vert \mathbf {j}(\mathbf {y})\Vert \cdot \cos (\alpha ) + \Vert \mathbf {j}(\mathbf {y})\Vert ^2 < E_\mathrm{max}^2. \end{aligned}$$

Here, \(0 \le \alpha \le \pi \) denotes the angle between \(\mathbf {j}(\mathbf {y})\) and \(\mathbf {v}(\mathbf {y})\). First, to obtain the maximal variation of the magnitude, \(\alpha \) has to be zero, and we get

$$\begin{aligned} \Vert \mathbf {v}(\mathbf {y})\Vert - E_\mathrm{max} < \Vert \mathbf {j}(\mathbf {y})\Vert < \Vert \mathbf {v}(\mathbf {y})\Vert + E_\mathrm{max}, \end{aligned}$$

which is directly given by the absolute error definition. Second, if we assume the vectors to have the same magnitude, the maximal angle difference is

$$\begin{aligned} \alpha _\mathrm{max} = \arccos \left( 1 - \frac{1}{2} \cdot \frac{E_\mathrm{max}^2}{\Vert \mathbf {v}(\mathbf {y})\Vert ^2} \right) , \end{aligned}$$
(3)

with \(0 < E_\mathrm{max} < 2 \cdot \Vert \mathbf {v}(\mathbf {y})\Vert \). If this condition does not hold, the part of \(E_\mathrm{max}\) that is larger than \(2 \cdot \Vert \mathbf {v}(\mathbf {y})\Vert \) has to cause a magnitude difference. All admissible approximation vectors \(\mathbf {j}(\mathbf {y})\) are constrained by these angle and magnitude criteria. This means that our error measure penalizes the maximal angle between two vectors differently depending on the actual velocity magnitude: it tolerates greater angular deviations at low velocities than at high velocities. On the one hand, methods based on particle integration profit from the magnitude and directional stability in areas of high velocities. On the other hand, this behavior can be a disadvantage in the vicinity of stationary points, because the error bound allows higher angle deviations from the original field in these slower regions, so a significantly changed flow behavior is tolerated by the measure here.
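To make the velocity dependence of the angle bound concrete, the following small helper evaluates Eq. 3; it is an illustrative sketch, not part of the presented method.

```python
import numpy as np

def max_angle_deviation(v_mag, E_max):
    """Largest admissible angle (radians) between v(y) and an approximation
    j(y) of equal magnitude, following Eq. 3."""
    if E_max >= 2.0 * v_mag:
        # beyond 2*||v||, the direction is unconstrained and the surplus of
        # E_max must show up as a magnitude difference
        return np.pi
    return np.arccos(1.0 - 0.5 * (E_max / v_mag) ** 2)

# For E_max = 0.2, slow flow with ||v|| = 0.2 tolerates about 60 degrees,
# while fast flow with ||v|| = 2.0 tolerates only about 5.7 degrees.
```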

3.2 Topological equivalence

Our vector field approximation is designed to preserve the local as well as the global flow behavior. To achieve the latter goal, we want to preserve the topology, i.e., the topological skeleton, of the original vector field. This topological skeleton contains all stationary points, in- and outflow regions, as well as their connectivity, which is represented by the separatrices. If we locally approximate the vector field, this could change its topology. Therefore, we would need to recompute the new topological skeleton and compare it to the original one. To reduce the involved computational costs, Theisel et al. [30] formulated a topology equivalence definition and a corresponding local test that reduces the number of topology recomputations. The topology stays unchanged if

  • All domain boundary switch points remain unchanged,

  • All stationary point positions remain unchanged,

  • The first derivatives at all stationary points remain the same, and

  • All separatrices start and end in the same stationary points or boundary inflow/outflow regions.

For more details, we refer the reader to [30]. In the following, we describe how the local affine linear approximations (ALAs) are computed and how they fulfill these four requirements.

3.3 Selecting seed points

We propose a strategy for selecting appropriate seed points that lead to large ALNs.

In a first step, we use all stationary points and the domain boundary switch points as seed points. This ensures that for every such point, its position as well as the derivative at its position remain unchanged. Thus, ALNs seeded at these positions fulfill the first three requirements of the topology equivalence definition from Sect. 3.2. Furthermore, these ALNs describe the flow exactly in the vicinity of their seed points. Therefore, we do not have a high deviation of the approximated flow in the regions of slow flow around stationary points, although the error bound would allow it (cf. Sect. 3.1).

In a second step, when no stationary or domain boundary switch points are available, we use a heuristic in the vector field that predicts regions with a mainly linear flow behavior in order to extract large ALNs. In the case of analytic vector fields, there always exists a Taylor approximation

$$\begin{aligned} T_{\mathbf{x}_s}(\mathbf{y}) &= \mathbf{v}(\mathbf{x}_s) + \frac{\mathbf{v}'(\mathbf{x}_s)}{1!} \cdot (\mathbf{y} - \mathbf{x}_s) + \frac{\mathbf{v}''(\mathbf{x}_s)}{2!} \cdot (\mathbf{y} - \mathbf{x}_s)^2 + \cdots \\ &= \mathbf{v}(\mathbf{x}_s) + J(\mathbf{x}_s) \cdot (\mathbf{y} - \mathbf{x}_s) + \mathbf{e}_{\mathbf{x}_s}(\mathbf{y}) \end{aligned}$$

that converges to \(\mathbf {v}(\mathbf {y})\). The smaller the sum of all higher-order Taylor terms \(\mathbf {e}_{\mathbf {x}_s}(\mathbf {y})\), the better \(T_{\mathbf {x}_s}(\mathbf {y})\) describes the original vector field in the neighborhood of \(\mathbf {x}_s\) and the more linear the flow is there. Therefore, we consider those \(\mathbf {x}_s\) that provide minimal \(\Vert \mathbf {e}_{\mathbf {x}_s}(\mathbf {y})\Vert \) in their neighborhood as ALN seed points. Since the second derivative of \(\mathbf {v}(\mathbf {y})\) indicates the rate of change of the linear vector field description, we use the minima of the magnitude of the second derivative as a first prediction for the most linear flow behavior within the neighborhood of \(\mathbf {x}_s\). In smooth vector fields, this strategy in general leads to large ALNs.
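A condensed sketch of the resulting seeding order is given below. It assumes that the indices of the stationary points and the domain boundary switch points, as well as a per-vertex estimate of the second-derivative magnitude (`second_deriv_norm`), have already been computed (cf. Sect. 4.2); these names are placeholders, not part of a published API.

```python
import numpy as np

def seeding_order(stationary_pts, switch_pts, second_deriv_norm):
    """Try stationary points first, then boundary switch points, then the
    remaining vertices sorted by ascending second-derivative magnitude
    (i.e., most linear flow first)."""
    fixed = list(stationary_pts) + list(switch_pts)
    fixed_set = set(fixed)
    rest = [int(i) for i in np.argsort(second_deriv_norm) if int(i) not in fixed_set]
    return fixed + rest
```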

3.4 Computing the vector field segmentation

To be able to formulate our algorithm that leads to a complete segmentation of a vector field into connected (linearized) regions, we require the input field to be Lipschitz continuous. This condition limits how fast the field can change and hence, ensures that it is approximable by a finite number of ALNs. Fortunately, common input fields are Lipschitz continuous (cf. Sect. 4).

Our segmentation algorithm uses region growing to compute the ALN around a seed point. This process is repeated iteratively for each seed point, until every position is assigned to a unique segmentation region.

During each ALN-region-growing step, two cases have to be considered. First, two ALNs can overlap, because both can provide a valid linear approximation for the same position. To minimize the approximation error, we assign each position within such an overlap to the segmentation region that approximates the vector field at that position best. Second, while the presented seed point selection ensures that all stationary points as well as the in- and outflow regions along the domain boundary are preserved, we still need to track the changes of the topological skeleton, i.e., the separatrices, to obtain a topologically equivalent approximated field. Therefore, we test whether the topology is changed by the local approximation given by a specific ALN. If the topology would change, the region growing cannot be continued at that position. Thus, the region growing procedure extracts the largest possible region that does not violate the topology preservation criterion.

4 Implementation for discrete vector fields

After stating the foundations of our ALN-based segmentation in the continuous case (cf. Sect. 3.4), we now discuss the implementation of our approach for two-dimensional piecewise linear vector fields. We introduce an efficient algorithm for computing the segmentation and use the obtained region-wise approximations to compress the memory size of the input field.

For our algorithm, we require the input fields to be defined over simplicial grids and the corresponding vector values to be stored at the grid vertices. The limitation to these fields will ensure a straightforward preservation of all stationary and domain boundary switch points. Furthermore, these fields are always Lipschitz continuous [31].

4.1 Brute-force algorithm

A segmentation-based approximation algorithm for discrete vector fields could use the fact that there exists only a finite number of ALNs, one for each vertex. Since every segmentation region consists of an ALN or an ALN subset, there will be at least one set of regions that fulfills our four goals. So a brute-force algorithm could find a solution by searching the set of all segmentations that completely subdivide a given vector field. But even without the guarantee that the approximated vector field has an equivalent topology, this approach is infeasible in terms of memory consumption and running time because of the vast number of possible solutions. Therefore, we introduce a greedy algorithm in the next two sections.

4.2 Seed points in discrete fields

In order to reduce computational costs, one has to avoid the extraction and analysis of too many ALNs. As discussed in Sect. 3.3, we choose all stationary and domain boundary switch points as seed points. Further seed point candidates are determined by successively choosing the vertex with the minimal magnitude of the second derivative. For the implementation, we first compute and store, for each vertex, the angle-weighted average of the derivatives of all neighboring cells. This is identical to computing finite differences on uniform grids, but it also works in the unstructured case.
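The per-vertex derivative estimate could look as follows; the sketch assumes a triangle mesh with one constant Jacobian per cell (`cell_jacobians`), since the field is piecewise linear, and uses the interior angle of each cell at a vertex as its weight. It is an illustration of the idea rather than the authors' code.

```python
import numpy as np

def vertex_jacobians(vertices, triangles, cell_jacobians):
    """Angle-weighted average of the per-cell Jacobians around each vertex."""
    n = len(vertices)
    J = np.zeros((n, 2, 2))
    w = np.zeros(n)
    for tri, J_cell in zip(triangles, cell_jacobians):
        for k in range(3):
            i = tri[k]
            a = vertices[tri[(k + 1) % 3]] - vertices[i]
            b = vertices[tri[(k + 2) % 3]] - vertices[i]
            cos_angle = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
            angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))  # interior angle at vertex i
            J[i] += angle * J_cell
            w[i] += angle
    return J / w[:, None, None]
```

The second-derivative magnitude used for the seeding heuristic can then be estimated from the variation of these per-vertex Jacobians.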

4.3 Computing the discrete segmentation field

We now discuss two important components of our approach—the comparison of different linearizations and the topology equivalence test—and introduce the actual segmentation algorithm subsequently.

4.3.1 Comparing linear approximations

If an ALN is seeded in the proximity of another one, they will share a certain part of their domains and will provide two potentially different approximations for this subdomain. In order to minimize the maximal error, we define a criterion to select the best linear approximation for a single cell.

Assume two ALNs seeded at \(\mathbf {x}_{s1}\) and \(\mathbf {x}_{s2}\), with the corresponding error vector fields that are defined by

$$\begin{aligned} \mathbf{e}_1(\mathbf{y}) &= \mathbf{v}(\mathbf{y}) - J(\mathbf{x}_{s1}) \cdot (\mathbf{y} - \mathbf{x}_{s1}) - \mathbf{v}(\mathbf{x}_{s1}) \quad \text{and}\\ \mathbf{e}_2(\mathbf{y}) &= \mathbf{v}(\mathbf{y}) - J(\mathbf{x}_{s2}) \cdot (\mathbf{y} - \mathbf{x}_{s2}) - \mathbf{v}(\mathbf{x}_{s2}). \end{aligned}$$

Because \(\mathbf {v}(\mathbf {y})\) varies only linearly in simplicial cells, these error vector fields vary linearly too, and the maxima of their magnitudes are located at the cell vertices. To reduce the overall approximation error, we just need to assign every cell vertex to the segmentation region that provides the smaller approximation error.
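As a minimal illustration of this assignment rule (not the authors' code), each seed can be represented by a tuple `(x_s, v_s, J_s)`, and a shared vertex goes to whichever linearization yields the smaller absolute error:

```python
import numpy as np

def better_region(y, v_y, seed1, seed2):
    """Return 0 or 1, depending on which of two overlapping linearizations
    approximates the vector v_y at position y with smaller absolute error."""
    errors = []
    for x_s, v_s, J_s in (seed1, seed2):
        e = v_y - (J_s @ (y - x_s) + v_s)
        errors.append(np.linalg.norm(e))
    return 0 if errors[0] <= errors[1] else 1
```

Since both error fields vary linearly within a simplicial cell, applying this decision at the cell vertices bounds the error everywhere inside the cell.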

4.3.2 Topological equivalence test

By adding a vertex to an ALN, we change the local vector field approximation in the corresponding one-ring neighborhood. To test whether the resulting field is topologically equivalent to the original field, we implemented the local test by Theisel et al. [30]. For this, we first have to check that no new stationary point occurs in the one-ring neighborhood of the vertex we want to add to the current ALN. Then, we have to collect all switch points and intersections with separatrices along the boundary of the neighborhood before and after updating the vector value at the current vertex. If the relative order of these points to each other remains the same, the test ensures that all separatrices still end in the same stationary points or in- and outflow regions as before. In case of a successful test, the current vertex can be added to the ALN and we need to reintegrate all separatrices to update their course. Otherwise, the vertex has to be skipped.
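The order comparison can be sketched as follows. The snippet assumes that the boundary events (switch points and separatrix intersections) have already been collected as `(arc_length, label)` pairs along the closed one-ring boundary, before and after the vertex update; the check for new stationary points is a separate step and not shown. This is a simplified illustration of the test described in [30], not its reference implementation.

```python
def same_relative_order(events_before, events_after):
    """True if the labelled boundary events appear in the same cyclic order
    along the one-ring boundary before and after the vertex update."""
    labels_before = [label for _, label in sorted(events_before)]
    labels_after = [label for _, label in sorted(events_after)]
    if len(labels_before) != len(labels_after):
        return False
    n = len(labels_before)
    if n == 0:
        return True
    # the boundary is a closed curve, so allow a cyclic shift of the sequence
    doubled = labels_before * 2
    return any(doubled[i:i + n] == labels_after for i in range(n))
```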

Because of the high computational costs involved, we decided to exclude the cells that contain parts of any separatrix from the approximation process. Therefore, we do not need a high number of reintegrations of the separatrices and only need to check the local approximation for new stationary and boundary switch points. Although this approach causes many cells of the original field to be preserved in the approximated one, it significantly improves the running time.

4.3.3 Growing affine linear neighborhoods

Figure 1 outlines how a new ALN is added to the intermediate segmentation field. This part of the segmentation algorithm is mainly based on the ALN extraction algorithm presented in [13], but has been modified to take previously computed regions into account while adding the next ALN to the segmentation field. For every vertex, the algorithm stores a unique ALN-ID to distinguish the different regions. The parts of the vector field that are still unsegmented are marked with \(-1\).

Fig. 1

Pseudocode describing how a new ALN is added to the intermediate segmentation field

Fig. 2

Steps of the segmentation of the Center Point field (Sect. 5.1) for \(E_\mathrm{max}=1.0\). The dark blue background depicts the unsegmented area. The segmentation regions are colored from white to blue according to their seeding order. a shows the first segment, which is seeded at the center point in the vector field's origin. This region guarantees that the position and the derivative at the center point are preserved. Beginning with step two in b, the next regions are seeded around the domain boundary switch points of the field in order to preserve the in- and outflow intervals. Finally, from step 6 in d, the rest of the vector field is segmented using our seed point heuristic of Sect. 3.3. h shows the final segmentation field

During the region growing shown in Fig. 1, vertex after vertex is added to the ALN, until \(E_\mathrm{max}\) is exceeded. The following three cases have to be considered for every vertex the algorithm tries to add to the ALN:

  1. If the vertex does not belong to any other ALN, it can be assigned without problems.

  2. If the vertex is already assigned to a previously computed ALN, it has to be checked whether the new ALN provides a smaller error.

     (a) If so, the vertex is assigned to the new ALN.

     (b) Otherwise, the region growing skips this vertex.

The second case causes a problem, because a previously computed segmentation region could be split by the reassignment of a vertex. Therefore, we test whether all vertices of the previous region are still connected after the reassignment. If this is not the case, we search for the vertices that no longer have an edge path to the corresponding seed point and remove them from the segmentation. Thus, the final number of segmentation regions may be increased, because new ALNs can be seeded at those vertices. However, this implementation also ensures that a new segmentation region can grow as long as it minimizes the local approximation error.
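A condensed sketch of this growing step is given below (cf. the pseudocode in Fig. 1). All names are placeholders: `error_of(v)` returns the absolute error of vertex `v` under the current seed's linearization, `owner` and `owner_error` store the ALN-ID and error of the current assignment (\(-1\) for unsegmented vertices), and `topology_ok(v)` stands for the local equivalence test of Sect. 4.3.2. The caller is expected to re-check the connectivity of every region from which vertices were taken over, as described above.

```python
from collections import deque

def grow_aln(seed, region_id, e_max, mesh, error_of, owner, owner_error, topology_ok):
    """Grow the ALN with ID region_id from its seed vertex over the mesh."""
    queue = deque([seed])
    taken_over = []                       # (vertex, previous owner) pairs
    while queue:
        v = queue.popleft()
        if owner[v] == region_id:
            continue                      # already part of this region
        e = error_of(v)
        if e >= e_max or not topology_ok(v):
            continue                      # threshold exceeded or topology would change
        if owner[v] != -1:
            if e >= owner_error[v]:
                continue                  # the previous region approximates v better
            taken_over.append((v, owner[v]))
        owner[v], owner_error[v] = region_id, e
        queue.extend(mesh.vertex_neighbors(v))
    return taken_over
```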

One drawback of this implementation is that it can create segmentation regions that are connected by edge paths only. In the case where this is problematic, a dual implementation based on cells instead of vertices and cell facet neighbors instead of edge neighbors can be used. Such an implementation guarantees each path to be a path of cells, making it at least one cell wide. Since we only observed a very small number of regions that are connected by edge paths, we decided to use the point-based segmentation algorithm in this work.

Figures 2 and 3 show first examples of two simple synthetic vector fields to demonstrate the seeding and segmentation process.

Fig. 3

Segmentation of the saddle point field (Sect. 5.1) for \(E_\mathrm{max}=1.0\). Color coding like in Fig. 2. a–d show the first four segments that are seeded around the cells that contain the saddle point. The saddle point with its outgoing separatrices is omitted. The circular shape indicates the uniform non-linear change of the flow behavior that is caused by the Gaussian kernel. Starting with e, the segmentation is continued, whereby the new regions are allowed to reassign previously segmented vertices if the new region provides a smaller approximation error. h shows the final segmentation, including the four separatrices and the red saddle point

4.4 Vector field compression

After the computation of the segmentation, we continue by creating datasets that have a reduced memory footprint. For this, we adapted the grid coarsening algorithm of Theisel et al. [30]. Their algorithm is a greedy algorithm that iteratively performs a half-edge collapse, removing the shortest edge of the grid in each step, as long as the resulting vector field remains topologically equivalent to the original field. To improve the running time, we want to skip the reintegration of separatrices. Therefore, as for the approximation (Sect. 4.3), we exclude all cells that contain parts of the topological skeleton from the grid coarsening process.

Within each segmentation region, the vector field is linear and the approximation is independent of changes in the grid. Thus, an edge collapse within an ALN can be performed without any further tests. This improves the running time of the grid coarsening. We can further improve the performance by first removing the shortest edge within the one-ring neighborhood of the last removed vertex before searching for the shortest removable edge in the whole grid.
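A compact sketch of the adapted greedy coarsening loop is shown below. All mesh operations (`all_edges`, `edges_around`, `edge_length`, `edge_vertices`, `touches_topological_skeleton`, `half_edge_collapse`) and the band test `band_collapse_is_safe` are placeholders for the respective implementation; the sketch only illustrates the control flow described in this section and in the following paragraphs.

```python
def coarsen(mesh, region_of, e_max, band_collapse_is_safe):
    """Greedy half-edge collapse, preferring the one-ring of the last
    removed vertex before falling back to a global search."""
    last = None
    while True:
        pool = mesh.edges_around(last) if last is not None else mesh.all_edges()
        edge = None
        for e in sorted(pool, key=mesh.edge_length):
            if mesh.touches_topological_skeleton(e):
                continue                              # skeleton cells are kept as-is
            a, b = mesh.edge_vertices(e)
            if region_of[a] == region_of[b]:          # interior of one ALN: always safe
                edge = e
                break
            if band_collapse_is_safe(e, e_max):       # band edge: error and topology test
                edge = e
                break
        if edge is None:
            if last is not None:
                last = None                           # nothing local left, search globally
                continue
            break                                     # nothing left to collapse at all
        last = mesh.half_edge_collapse(edge)          # returns the surviving vertex
```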

The cells that do not completely belong to a unique segmentation region form bands between neighboring regions. Although these bands may contain strong changes in the flow behavior, the vector field is interpolated piecewise linearly there and the maximal error \(E_\mathrm{{max}}\) is exceeded nowhere. This is due to the fact that each vertex provides an error below \(E_\mathrm{{max}}\) and, according to Sect. 4.3, we know that the maximal error in the cell is located at the vertices.

During the grid coarsening, these bands require a special treatment. First, we cannot skip the complete topological equivalence test here, since new stationary points could be created by the coarsening. Second, we have to limit the edge collapse process to those edges whose two vertices are part of the same segmentation region. For this, we could limit the edge contraction so that only collinear edges can be collapsed and the shape of the bands is preserved. However, because the band is triangulated, it can show a characteristic zigzag pattern (cf. Fig. 4) that would prevent further compression. To obtain a higher compression rate, we allow every edge collapse in the band that does not introduce an error above \(E_\mathrm{{max}}\). The corresponding test is illustrated in Fig. 4. Now, step by step, the zigzag pattern can be coarsened and the band is flattened. This allows neighboring segmentation regions to be triangulated with fewer cells, which in turn increases the compression rate.

Fig. 4

A cell complex where vertices belong to two different segmentation regions (colored white or black, respectively). The cells that do not provide a unique linearization are colored in gray. They form a band between the neighboring regions. By collapsing the marked edge, the blue dotted triangulation would arise in the interior of the cell complex. If the vertices of these new triangles do not belong to the same region, this edge collapse is only allowed if the user-defined \(E_\mathrm{max}\) is exceeded nowhere. Since the vector fields of the old as well as the new triangles vary linearly, both fields vary linearly within the sub-cells that are bounded by the old vertices and the intersection points marked in blue. Therefore, their absolute error within these sub-cells also varies linearly, and we only need to check \(E_\mathrm{max}\) at all marked intersection points

In order to apply common post-processing and visualization methods to our approximated and compressed vector fields, we use the region-wise approximations to compute a point-based vector field. For this purpose, we just need to evaluate the linear approximation \(\mathbf {j}(\cdot )\) for each grid vertex in the affine linear vector field that is given by the corresponding segmentation region.
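Decompressing the region-wise representation back into an ordinary point-based field is straightforward; a minimal sketch is given below, assuming each region is stored as a tuple `(x_s, v_s, J_s)` and `owner[i]` holds the region index of vertex `i` (both names are placeholders).

```python
import numpy as np

def decompress_to_vertices(vertices, owner, regions):
    """Evaluate j(y) = J(x_s) . (y - x_s) + v(x_s) of the owning region at
    every grid vertex to recover a point-based vector field."""
    out = np.empty((len(vertices), 2))
    for i, y in enumerate(vertices):
        x_s, v_s, J_s = regions[owner[i]]
        out[i] = J_s @ (y - x_s) + v_s
    return out
```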

5 Results

In this section, we discuss the application of our algorithm on the basis of synthetic and real-world datasets. Since our presented vector field approximation only depends on the parameter \(E_\mathrm{{max}}\), i.e., the maximal approximation error, we will investigate its influence on the granularity of the resulting segmentation, the precision of the derived approximation, and the final compression rate. According to Sect. 4, all two-dimensional input fields are defined on simplicial grids.

In order to judge the chosen error threshold, the distribution of the velocities in the used datasets can be found in Table 1. The statistical results, including the computation times, of our approximation algorithm can be found in Tables 2, 3 and 4. The listed compression rates were computed by comparing the file sizes of the original and the compressed datasets.

Table 1 Distribution of velocity magnitudes (cf. Sect. 5)
Table 2 Statistical results for the approximated Oseen vortices dataset of Sect. 5.1 that show the influence of \(E_\mathrm{{max}}\) on the running time, the number of region-wise approximations and the reachable compression rate
Table 3 Statistical results for the approximated Kármán vortex street dataset of Sect. 5.2 that show the influence of \(E_\mathrm{max}\) on the running time, the number of region-wise approximations and the reachable compression rate
Table 4 Statistical results for the approximated Jet Stream dataset of Sect. 5.2 that show the influence of \(E_\mathrm{max}\) on the running time, the number of region-wise approximations and the reachable compression rate

5.1 Synthetic datasets

To demonstrate how our region-wise vector field approximation and compression works, we present results for three simple datasets in the following.

5.1.1 Center and saddle point

We generated two datasets by discretizing a pure rotational field (center) and a saddle-type vector field on a triangulated grid (\(200 \times 200\)) with a bounding box from \((-5.0, -5.0)\) to (5.0, 5.0). Since our algorithm is designed to determine linearly describable regions, it produces a segmentation field with only one ALN in the center point example. Afterwards, the greedy compression approach removed all triangle cells except for two. That is why, in a second step, we enhance the examples by multiplying them with a Gaussian kernel given by \(g(x,y) = \frac{1}{2\pi \cdot 0.1} e^{-\frac{1}{2} \cdot 0.1 \cdot (x^2+y^2)}\), which causes a non-linear change in the flow behavior.
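For reference, such a modulated center field can be sampled as follows; the rotational part \((-y, x)\) is an assumption for the pure center field, and the kernel matches the formula above.

```python
import numpy as np

def center_field_with_gaussian(n=200, extent=5.0):
    """Sample the rotational (center) field, damped by the Gaussian kernel
    g(x, y) = 1/(2*pi*0.1) * exp(-0.5 * 0.1 * (x^2 + y^2)), on an n x n grid."""
    xs = np.linspace(-extent, extent, n)
    X, Y = np.meshgrid(xs, xs)
    g = 1.0 / (2.0 * np.pi * 0.1) * np.exp(-0.5 * 0.1 * (X**2 + Y**2))
    u, v = -Y * g, X * g          # pure rotation (-y, x), modulated by g
    return X, Y, u, v
```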

Figures 2 and 3 demonstrate how our segmentation works and give a detailed explanation of each step.

Finally, we applied our grid coarsening algorithm of Sect. 4.4 to the region-wise approximation to reduce the memory footprint of the saddle point field. The result is shown in Fig. 5. A large number of original cells within each segmentation region were discarded, while the region-wise ALAs of the original field were preserved. Due to the small number of large segments, the memory footprint could be reduced by \(97.7\,\%\). If we compare the Line Integral Convolution (LIC [27]) of the original vector field in Fig. 5b to the one of the approximated field in Fig. 5c, we can see that they are nearly identical and the simple flow characteristics of this field are preserved very well.

Fig. 5

The left result shows the segmentation from Fig. 3h together with the grid of the compressed vector field. One can see that the borders of the region-wise linear approximation do not completely match the grid. As discussed in Sect. 4.4, this is because our algorithm coarsens the bands between two segments as long as the user-defined \(E_\mathrm{max}\) is not exceeded. In this example, we achieved a compression rate of \(99.977\,\%\). The two latter LIC images compare the original vector field with its approximation. The very high \(E_\mathrm{max}\) of 0.1 causes the first visible artifacts at the region boundaries

5.1.2 Oseen vortices

After presenting segmentation results on simple center and saddle fields, we investigate another easily comprehensible vector field containing Oseen vortices. The Oseen vortex [24] models a free vortex, i.e., the tangential velocity behaves inversely to the distance from the center. It decays due to viscosity. Given two co-rotational Oseen vortices, the flow field shows the interaction of a saddle and two center points. The vector field is given on a domain with 79,202 cells.

In order to study the influence of \(E_\mathrm{{max}}\), we computed multiple approximations of the Oseen vortices dataset and show four of the results in Fig. 6. The corresponding statistics can be found in Table 2. One can clearly see that a larger approximation error \(E_\mathrm{{max}}\) allows a more generous approximation of the local flow behavior, which means the flow can be approximated with fewer and larger regions. In these examples, the segmentation regions around the saddle and center points keep their characteristic shape and only differ in their sizes. Furthermore, we can see from Table 2 that the computation times cannot necessarily be reduced by increasing \(E_\mathrm{{max}}\). This is due to the fact that every newly seeded segment is allowed to take over all vertices of previously computed ones if it provides a smaller approximation error. So, the multiple steps of adding and removing the same vertices can increase the computation times, while reducing the number of regions.

Fig. 6

The upper two rows show the region-wise approximations of the Oseen dataset for an increasing \(E_\mathrm{max}\). The segments are colored from white to blue according to the seeding order. With increasing \(E_\mathrm{max}\), the flow can be approximated more generously, which results in fewer and larger regions. e and f show the grid of the compressed approximations. The band between two neighboring regions can be coarsened, as long as the particular \(E_\mathrm{max}\) is not exceeded

Figure 6e and f show the corresponding coarsened grids of the approximated vector fields. We can see that the larger we choose \(E_\mathrm{{max}}\), the larger the segmentation regions get and the fewer triangles we need in the final grid. Therefore, the achieved memory reduction increases by choosing larger \(E_\mathrm{{max}}\). Besides the coarsened cells, one can see that some original cells are used to preserve the topological skeleton. Furthermore, the small bands between the different affine linear approximations, as described in Sect. 4.4, are coarsened by our algorithm. This reduces the number of triangles that are needed in neighboring segments. For example, in the lower left corner of Fig. 6f, the shape of the coarsened band differs from the underlying segmentation region. This introduces a higher error in the corresponding cells. But since we stay below \(E_\mathrm{{max}}\), we accept the greater error to achieve a higher compression rate.

Fig. 7

The left image shows an LIC and the topological skeleton of the original Oseen vortices dataset. A saddle point is shown in red and center points in yellow. The right image shows a similar depiction of the approximated field of Fig. 6f. Despite the high \(E_\mathrm{max}\) of 0.8, the approximated field is very similar to the original one. This is because the characteristic linear flow behavior around the stationary points, as well as the nearly laminar flow towards the domain boundary, is well approximable by region-wise linearizations

Figure 7 shows the comparison of the LIC images of the original and the approximated vector field. Since the flow behavior is quite simple, even for \(E_\mathrm{{max}} = 0.8\), the main flow characteristics are preserved very well.

5.2 Real-world vector fields

The following two real-world examples show that our segmentation algorithm is applicable to more complex vector fields.

5.2.1 Kármán vortex street

The Kármán vortex street is a well-known flow pattern that is caused by the unsteady flow separation from an obstacle. In this simulation, the obstacle is a thin bar and the flow has an angle of attack of 45 degrees. The dataset is given on a grid with \(156\,842\) cells.

Figure 8 and Table 3 show the approximation results for this dataset. Even though one would expect the flow to be nearly constant when looking at Fig. 8e, we obtain a high number of segmentation regions. This is due to the fact that our error measure is Galilean invariant. Therefore, the approximation algorithm is sensitive to non-linear changes in the flow field, even if they are hidden by a fast constant component of the vector field. By choosing a higher \(E_\mathrm{{max}}\), we can see a large difference in the number of resulting regions. This has a direct implication on the grid coarsening and the compression rate. The higher the number of segments is and the smaller their sizes are, the more triangles we need. In this example, we were able to increase the compression rate by another \(5.7\,\%\) by increasing \(E_\mathrm{{max}}\). The comparison of the LIC images of the original field in Fig. 8e with the approximated one in Fig. 8f shows that the nearly laminar flow, as well as the swirling flow behind the obstacle, is preserved even for higher \(E_\mathrm{{max}}\).

Fig. 8

a and c show the segmentation and the corresponding coarsened grid of the Kármán Vortex Street dataset for a small \(E_\mathrm{max}\), and b and d for a distinctly larger \(E_\mathrm{max}\). Color coding like in Fig. 2. e and f finally show a comparison of the original and the compressed field. Note that there is a high number of stationary points at the rotated bar because it has a no-slip boundary. Besides these, the strong through-flow eliminates all but one stationary point. The flow behavior can be approximated very well, even when using the larger \(E_\mathrm{max}\)

Fig. 9

The upper row shows the approximation results of the Jet Stream dataset for an \(E_\mathrm{max}\) of 0.02 and the lower row for the distinctly higher \(E_\mathrm{max}\) of 0.1. The segments are colored from white to blue according to their seeding order. In the first column, one can clearly see that the number of segments can be drastically reduced with increasing \(E_\mathrm{max}\). But b and e show that the presented approximation algorithm keeps a high number of original grid cells to preserve the original topology. Because of this cell overhead, the compression rate does not scale well with \(E_\mathrm{max}\) for vector fields with complex topologies. Finally, c and f show the LIC images and the topological skeletons of the compressed fields (saddle points are red, center points yellow, and sources green). In this example, the higher approximation error \(E_\mathrm{max}\) causes strong changes in the flow behavior that do not occur in the original flow (cf. Fig. 10a). The distortions are mainly located in the vicinity of the inflow, because the velocity magnitudes there are significantly smaller than the chosen \(E_\mathrm{max}\) of 0.1

Fig. 10

These two LIC images show a comparison of the original Jet Stream dataset and its compressed version that was computed by using the algorithm of Theisel et al. [30]. Both fields have an equivalent topology. Nevertheless, there are large differences in the overall flow behavior. For instance, the strong inflow from the left domain boundary is clearly disturbed. Furthermore, rapid changes of the flow behavior from one cell to the neighboring ones occur

5.2.2 Jet stream

The Jet Stream dataset represents an unsteady flow at Mach 0.1. The flow enters the domain through a narrow opening, expands in the flow direction, and creates vortices that merge over time. Therefore, many vortices and small turbulent structures can be found in every time step. One step consists of \(\sim \)3 million cells.

For this dataset, we computed two approximations, one with a small and one with a high approximation error \(E_\mathrm{{max}}\). The results can be found in Fig. 9 and Table 4. In Fig. 9a and d, one can clearly see the large difference in the number of regions that are needed, and the outline of the excluded topological skeleton is clearly visible. Since we need to keep many original cells to preserve the topological skeleton, we are only able to increase the final compression rate by \(0.5\,\%\) when increasing \(E_\mathrm{{max}}\) from 0.02 to 0.1. Nevertheless, we still obtain a compression rate of around \(95\,\%\) in all cases.

Figure 9c and f show the corresponding LIC images of the approximated fields. Please note that we stopped the separatrix integration in cases where a separatrix enters the same cell a second time to avoid a high occlusion of the underlying LIC images by spiraling separatrices. Both LIC images mainly show the same flow characteristics as the original field in Fig. 10a. On further investigation, one can see that there are small differences in the flow behavior in the areas above and below the strong injection flow, where the magnitudes are very small. In the case of the larger \(E_\mathrm{{max}}\), the flow in these regions mainly acts as constant flow until it rapidly merges into the strong inflow. The approximation of this transition from the calm to the fast injection flow profits from a smaller \(E_\mathrm{{max}}\). More such rapid changes in the flow behavior can be observed in the vicinity of nearly every separatrix. In the areas of higher velocity magnitudes, e.g., the inflow and the two main vortices, the flow is approximated well.

This example shows that we need to choose a sensible error threshold \(E_\mathrm{{max}}\) to preserve the flow characteristics in all areas of slow as well as fast flow. That is why we considered defining different error thresholds for different areas of the vector field. However, both the definition of these thresholds and the interpretability of the resulting approximations would suffer from such an approach. Furthermore, we can see that, depending on the topological structure of the field, we arrive at a fixed number of cells that cannot directly be coarsened, and therefore the compression rate does not scale with increasing \(E_\mathrm{{max}}\). In such cases, one should reduce \(E_\mathrm{{max}}\) to achieve a more precise approximation of the areas of slow flow without worsening the compression rate.

6 Discussion

We now discuss the robustness of our approximation and compression approach in the presence of noise. Then, we investigate an alternative seeding strategy that could be used to improve the results of our algorithm and an alternative type of linear approximation. Finally, we compare our method to previous approaches.

6.1 Robustness of the approximation

Since real-world datasets can be noisy, we study the approximation and compression results for the Oseen vortices with different levels of added noise. For this, we added randomly generated vector fields to the original one. The deviation between the resulting fields and the original one is limited by different relative errors.

Exemplary results of our study can be found in Fig. 11. These results show that our approximation and compression algorithm works well for small disturbances in the input fields. The seeding order of the segments, their size, as well as their shape are very similar to the results for the noise-free field in Fig. 6a. Since the achievable compression rate directly depends on the computed segments, the compression rates of 95.884 and \(92.727\,\%\) for Fig. 11a and b are about the same as the ones listed in Table 2. In contrast to this, Fig. 11c and d show the limitation of our approximation approach. With an increasing noise level, more and more non-linear changes are introduced into the local flow behavior. Thus, more and smaller segments get computed for the region-wise linear approximation of the flow field, which in turn decreases the achievable compression rates notably to \(73.228\,\%\).

In Fig. 11d, the vector field's topology has already changed due to the added noise. One of the two center points has become a spiraling source and the other a spiraling sink. This can be observed by the shape of the unsegmented parts around the separatrices. In our experiments, the results of our algorithm remained similar to those shown in Sect. 5, as long as the flow behavior was not significantly changed by the artificial noise.

Fig. 11

Segmentation results (\(E_\mathrm{max}=0.2\)) for the Oseen vortices at different noise levels (deviation). The color coding shows the seeding order of the segments from white to blue. Compared to Fig. 6a, the results in a and b are similar in the number and shapes of their segmentation regions. With increasing noise, the regions become notably smaller and more numerous (c, d)

6.2 Alternative seeding approaches

The quality of the seeding strategy essentially influences the approximation quality and the achievable compression rates. As shown in Sect. 5, we achieve the highest compression rates if our algorithm ends up with segmentation regions that are as large as possible and, hence, as few as possible. In this sense, the order and position of the initial seed points are important, because our algorithm is designed to extract the largest possible segments first in order to find a complete segmentation as efficiently as possible.

Fig. 12

Results of the alternative seed point selection (cf. Sect. 6.2). We applied this approach to the Oseen dataset and used two different \(E_\mathrm{max}\) and n-ring neighborhood sizes. Color coding like in Fig. 2. Compared to the results in Fig. 6 and Table 2, the number, size, and shape of the approximation regions were only slightly changed. We can see that the seeding order depends on the n-ring neighborhood size, because this parameter is mainly responsible for the error distribution in the precomputed ALAE field

To study the quality of our seeding point heuristic from Sect. 3.3, we now compare it to an alternative approach that explicitly evaluates how linear a vector field behaves in a certain neighborhood of every position. Koch et al. [13] introduced the concepts of ALAs and their corresponding Affine Linear Approximation Error (ALAE) fields. For both concepts, a given vector field is sampled and an affine linear vector field is fit to an n-ring neighborhood (n being user-defined) at each position. The obtained linear approximation of each neighborhood can be understood as the first derivative of a smoothed version of the vector field. To compute the corresponding ALAE field, the maximal error of the best linear fitting of each n-ring neighborhood is determined. Low values in the ALAE field can provide a reliable prediction of well linearly approximable areas of the chosen size in the field [13].

We demonstrate the results of the alternative seeding strategy on the Oseen dataset, since the flow is easily comprehensible and its segmentation results in a manageable number of region-wise approximations. The images in Fig. 12, compared to the ones in Fig. 6, show that the alternative approach does not significantly change the number of regions, their size, or their shape. However, the specification of the neighborhood size n for the ALAE computation introduces an additional parameter that is difficult to choose. One would need to compute ALAE fields for several different scales and compare the results to check whether the important flow features of the particular flow field are better preserved or not. But this suffers from the high computational costs that arise from the affine linear fitting that is needed for the ALAE computation.

6.3 Alternative approximation

Alternatively, we could compute a fitted linear vector field within each segment, instead of using the one given at the seed points. We implemented a brute-force version of such a linear fitting that is applied every time a new vertex is added to a segmentation region. Therefore, the linear fitting, the test for the maximal approximation error, and the test for the topological equivalence have to be repeated for every vertex of the current region. Although this would reduce the overall approximation error, the runtime would be heavily increased. In our view, the disadvantages of this approach outweigh its advantages, so we keep the implementation presented in Sect. 4.
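For completeness, the per-segment fit itself is a standard least-squares problem; a minimal sketch (assuming the segment's vertex positions and vectors are given as arrays) could look as follows:

```python
import numpy as np

def fit_affine_linear(points, vectors):
    """Least-squares fit of an affine linear field v(x) = A x + b to the
    vertices of one segment; returns the 2x2 matrix A and the offset b."""
    X = np.hstack([points, np.ones((len(points), 1))])    # design matrix [x, y, 1]
    coeffs, *_ = np.linalg.lstsq(X, vectors, rcond=None)  # shape (3, 2)
    A = coeffs[:2].T                                       # linear part
    b = coeffs[2]                                          # constant offset
    return A, b
```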

6.4 Comparison with previous approaches

In this section, we compare our approximation algorithm to the previous works that were mentioned in Sect. 1. Technically, the compression algorithms of Lodha et al. [17] and Dey et al. [3] are similar to our approach, since both provide an error measure that limits the local angular and magnitude differences. However, neither algorithm explicitly preserves the topological skeleton of the original vector field. Furthermore, the approach of Dey et al. has some more drawbacks. As discussed in Sect. 3.1, the relative error is not well defined for every part of the vector field. Although the high approximation error in the vicinity of stationary points causes their preservation in the compressed field, the error is not defined at these points. Also, they only use a coarse approximation of the maximal error, since they evaluate it only at the grid vertices that are removed. This could easily be improved according to [13].

Since we explicitly want to preserve the vector fields’ topology, we will compare our results to those of the compression algorithm of Theisel et al. [30]. They use a similar approach that is based on edge collapse. Note that their algorithm does not provide any means to steer the granularity of the results. The vector field is arbitrarily changed with every vertex that is removed and the algorithm proceeds until there is no vertex that can be removed. Therefore, the highest compression rate is achieved. In contrast, as we control the precision of the approximated field, we have to accept that this causes a reduction of the compression rate. Since no error-control corresponds to using an unbounded error, their algorithm can be seen as an extreme case of ours.

Table 5 quantifies the differences between the algorithm of Theisel et al. and ours. We applied their algorithm to all datasets used in Sect. 5. The compression rates they obtain are clearly higher than our results in Tables 2, 3, and 4. But since they do not use any error bound, the average error is distinctly higher than the error thresholds used for our examples.

In order to study the influence of high compression rates, as well as high approximation errors, on the flow pattern, we carry out a representative visual comparison using the Jet Stream dataset. If we compare the approximated fields in Figs. 9c, f and 10b, we can clearly see strong differences in the field compressed by the method of Theisel et al., even though the center points, as well as the main vortices, are preserved in all cases. The inflow from the left domain boundary, for example, is disturbed in Fig. 10b. Although the in- and outflow regions are explicitly preserved by the algorithm of Theisel et al., such disturbances can appear, because there is a nearly stagnating inflow along the left domain boundary and therefore no switch point. Furthermore, one can see strong changes of the linear vector fields from one cell to its neighbors across the cell boundaries. Overall, the comparison shows that the topological equivalence between the compressed and the original field is important, but not sufficient to preserve other important flow characteristics like the Jet Stream's fast inflow through the narrow opening. Our algorithm makes it possible to steer the maximal strength of the local vector field changes by setting an appropriate \(E_\mathrm{{max}}\), so we can achieve a smooth approximated vector field that is very similar to the original field.

Table 5 Compression rates achieved by the algorithm of Theisel et al. [30] for our used datasets

The result of the algorithm of Theisel et al. in Fig. 10 shows the allowed deviation of the course of the separatrices. In contrast, our compressed fields (Fig. 9c, f) have exactly the same topology as the original field. To speed up our algorithm, we preserve all cells that contain parts of the topological skeleton and avoid all changes in the remaining field that would change the topology. Besides the differences in the resulting local and global flow behavior, there are large differences in running time and obtained compression rates. The approach of Theisel et al. achieves a compression rate of \(99.91\,\%\), which is clearly above our results listed in Table 4. But due to the high computational costs of testing and updating the topology of the field, its runtime of roughly 4 h is very high compared to that of our algorithm.

7 Conclusion and future work

We presented a fast ALN-based segmentation algorithm that results in a region-wise and topology-preserving approximation of a vector field. The algorithm only depends on the error threshold \(E_\mathrm{{max}}\), which limits the maximal approximation error of each segment. We use the region-wise approximations to compress the original dataset. To demonstrate our algorithm and the achievable compression rates, we discussed the results for five datasets (Sect. 5). We were able to show that the global as well as the local flow behavior is preserved well.

The presented results prepare the ground for further investigations. Since every region provides an analytic description of the vector field approximation by a Jacobian \(J(\mathbf {x}_s)\) and the corresponding offset vector \(\mathbf {v}(\mathbf {x}_s)\), common visualization methods like the FTLE [6, 11] can be applied to our approximated and compressed fields without decompression. This could be used to produce previews of flow features on very large datasets and is part of our current research. Besides this, our method can be extended to time-dependent vector fields, as well as to scalar and higher-order tensor fields. A direct extension to 3D is realizable, but requires a robust extraction of separatrix surfaces. An extension to surfaces in 3D does not seem to be trivially achievable. However, there are also some short-term objectives, like the improvement of the resulting triangulation to avoid nearly singular cells in the final approximation.