1 Introduction

Studies on perception based on eye tracking most often focus on the allocation and local density of fixations. Attention maps, fixation clustering [20], and aggregated values (e.g., dwell time on a region of interest (ROI)) are frequently used tools for such analysis. Fixation-based analysis is motivated by the way our visual perception works: perception is only possible during fixations and is suppressed during saccades, i.e., high-velocity movements of the eyeball [11]. Therefore, saccadic patterns have mostly been studied indirectly, e.g., as transitions between ROIs [8]. Obviously, a large proportion of saccades will occur as ROI transitions, e.g., between the faces of people in a painting. However, ROIs might be ambiguous in an artwork. For example, in abstract art, gaze is supposed to be guided by artistic composition principles [1], and in medieval art by reflective gold leaf inserted into the painting [16]. Yarbus defined composition as “[...] the means whereby the artist to some extent may compel the viewer to perceive what is portrayed in the picture” [25]. Especially for abstract paintings, the definition of meaningful ROIs is questionable, and an analysis of saccades and gaze transitions would be restricted to these ROIs. Benefits of art viewing analysis include a deeper understanding of pictures and human perception, keypoint extraction from paintings [12], image compression [17], and saliency map creation [6].

This work focuses on the analysis of saccade trajectories. Thus, instead of asking what is looked at, we aim at providing techniques to tackle the questions of how and why our gaze is driven and guided over an artwork in a particular way.

First, we show how saccade trajectories form patterns that are characteristic of the stimulus material and enable the artist to guide the viewer’s gaze over the artwork. Then we introduce two methods to analyze saccade trajectories: (1) a novel visualization method, the saccadic heatmap, and (2) a clustering technique for saccades in eye-tracking data of low temporal resolution. Both methods are compared to ROI transition diagrams and a trajectory clustering approach (attribute-driven edge bundling [18]). The proposed approaches allow saccadic patterns to be studied thoroughly and might contribute to a better understanding of the influence of image composition on visual scanning.

Both methods are implemented in the Eyetrace [15] software. Eyetrace is a visualization and analysis tool for static stimulus experiments, such as the viewing of fine art. It provides a variety of state-of-the-art algorithms for each processing step: Identification of fixations and saccades (e.g., [13, 23]), clustering of fixation locations, automatic ROI annotation, and scanpath comparison (e.g., [14]). Eyetrace is available at http://www.ti.uni-tuebingen.de/Eyetrace.1751.0.html

2 Related Work

In eye-tracking recordings, data samples recorded during fixations outweigh by far the saccade samples. A first version of Eyetrace already tried to implement a feature for sampling saccades [9, 22].

Dong et al. were among the first to work on simple heatmaps of saccades for the evaluation of enhanced imagery in cartography [7]. However, most studies work based on fixation heatmaps; sometimes even heatmaps containing both fixation and saccade data are employed, especially when no event filter is applied [3, 4]. Popelka and Voženílek propose a space-time cube visualization, where saccades make up a large part of the visualization: as all samples are connected by a line, saccades result in the longest line segments [19].

Probably the best metaphor for a saccade heatmap is a grassland, where the grass is trampled down along paths that are frequently walked over. Trails emerge and widen as they are used more frequently. Similarly, the saccade heatmap visualizes frequently traversed gaze trails derived from the saccade points in eye-tracking data. Corresponding to the ROIs emerging from hot spots in the fixation heatmap, we will explore so-called saccade bundles, i.e., clusters of saccades.

Other methods, such as attribute-driven edge bundling techniques to cluster general trails, have successfully been applied to eye-tracking data [18]. A visually appealing and fast implementation can be found in the CUBu software [26], which can be accessed via Eyetrace [15] if an Nvidia GPU is available. In another approach, so-called saccade plots [5], saccades are visualized in a more abstract way. Similar to a ROI transition diagram, saccades are split into x- and y-components and visualized, e.g., by arcs that connect different stimulus regions.

3 Eye Tracking Experiments

The proposed methods were applied to eye-tracking data collected during the viewing of paintings. The two paintings chosen for this experiment have been at the center of a controversial methodological discussion in art history for several decades. In 1961, Kurt Badt argued that in order to interpret a painting one has to describe the path taken by the eye to go through it. His foremost examples are Jan Vermeer’s Art of Painting at the Kunsthistorisches Museum, Vienna and Jacopo Tintoretto’s Last Supper in S. Giorgio Maggiore, Venice [2]. Badt’s argument has often been discussed, but it could not yet be confirmed or falsified with empirical evidence.

Fig. 1. Paintings employed in the eye-tracking experiments. (Color figure online)

Experiment 1: The Art of Painting: In the first experiment, nine subjects viewed Johannes Vermeer’s The Art of Painting (Fig. 1(a)) on a screen for one minute. Eye movements were recorded with an EyeTribe eye tracker at a sampling rate of 30 Hz. Fixations and saccades were determined via a Gaussian mixture model [23, 24].

Experiment 2: The Last Supper (Tintoretto): This data set was recorded at the University of Vienna and contains eye-tracking data of 40 subjects viewing the painting shown in Fig. 1(b) for two minutes each. An IViewX RED 120 tracker was used, and the painting was shown on a 30" display (2560 \(\times \) 1600 pixels) at a distance of 90 cm from the observer. The 20 art historians and 20 novices were instructed to judge whether they liked the picture in order to induce a sense for aesthetics.

4 Saccade Heatmap

Characterizing a saccade requires at least two points, the origin and the target of the saccade. In addition, the representation of a saccade may contain its direction, amplitude, velocity, and a whole trail of samples to show its ballistic nature. In contrast to clustering fixations, clustering saccades is a challenging task. Instead of comparing 2D fixation locations to each other, we need to assess the similarity of whole saccade trajectories. In this context, saccade direction, amplitude, and the position of points measured during the saccade might also be relevant. The visualization of saccades without further post-processing might not be very informative. For example, despite the relatively short viewing time of one minute and the small number of subjects, we extracted a total of 959 saccades from the eye-tracking data collected during Experiment 1 (Fig. 2). Each saccade is visualized by an arrow, resulting in visual clutter and overlapping shapes. Given this visualization, it is hard to derive any pattern; this is probably an additional reason why saccades are excluded from further analysis in many studies.

Fig. 2. All saccades contained in a recording of nine subjects viewing The Art of Painting for one minute. Each saccade corresponds to one arrow in the visualization.

4.1 Construction of a Saccade Heatmap

To process saccadic data, we introduce a novel computational method for saccade heatmaps. The aim is to visualize the density of saccades, where frequently traversed areas gradually become hot while other areas stay cold. To achieve this, we have to (1) define a density function for a saccade, (2) integrate the density functions over all saccades, and (3) apply some post-processing, such as weighting. Each of these processing steps is described in detail in the following paragraphs.

Fig. 3. (a) Raw saccade heatmap with Gaussian density functions stretched to cover the saccade trajectories. (b) Heatmap thresholded at a minimum density to emphasize the most important paths. (c) Raw heatmap with a small standard deviation for the saccade density function; as a result, the heatmap is less smooth and more precise. (d) Heatmap capped at a maximal density; the color resolution available for the remaining areas is therefore enlarged, but the resolution of the most frequently traveled paths is decreased. (e) Saccade heatmap with a low standard deviation, capped at a maximum density and with a minimum density threshold applied.

Density Functions: In the computation of fixation heatmaps, the density around a fixation location is usually modeled by a Gaussian. The mean of the Gaussian distribution is placed at the center of the fixation location and the standard deviation adjusted to represent 2–5\(^\circ \) of the visual angle. Thus, it is supposed to represent the area of the fovea, the accuracy of the eye-tracker, or the area of sharp, high-resolution vision.

To compute saccade heatmaps, such a Gaussian with equal spread in every direction does not represent saccades very well. However, the approach can be adapted by stretching the Gaussian along the saccade to cover its origin and target. More specifically, we apply Eqs. 1 and 2 to calculate the standard deviations of the Gaussian, where dist is the length of the saccade.

In our implementation we used the pixel distance. To guarantee scale invariance, the pixel distance is calculated based on mm distances in the real world.

$$\begin{aligned} std_{dir}(dist) = \sqrt{dist} \cdot (1+\ln (dist)) \end{aligned}$$
(1)
$$\begin{aligned} std_{orto}(dist) = \sqrt{dist} \end{aligned}$$
(2)

Note that in this case we have a covariance matrix that can be split into the contribution in the direction of the saccade \(std_{dir}\) and its orthogonal vector \(std_{orto}\). The orthogonal contribution is chosen much smaller than the contribution along the saccade’s major direction. This way a slim, ellipsoid shape is produced. We used the natural logarithm of the distance as stretching factor with e as base of the Gaussian and the idea that \(e^{\ln {dist}} = dist\). For the Gaussian this is not completely correct (the standard deviation is the denominator), but the effect is as expected. The density function is then rotated and translated to align with the position and direction of the saccade vector.

$$\begin{aligned} g(x,y) = \frac{1}{2 \pi \cdot std_{dir} \cdot std_{orto}} \cdot \mathrm {e}^{-\frac{1}{2} \left( \frac{x^2}{std_{dir}^2} + \frac{y^2}{std_{orto}^2}\right) } \end{aligned}$$
(3)

Eq. 3 shows the complete Gaussian function, where x and y are the positions shifted from the saccade center.
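The stretched density can be sketched in a few lines of Python. This is a minimal illustration, not the Eyetrace implementation: the function name `stretched_gaussian` and the guard for saccades shorter than one pixel are assumptions, and the exponent is taken to be the usual sum of the two squared, normalized coordinates of an axis-aligned bivariate Gaussian.

```python
import math

def stretched_gaussian(px, py, start, end):
    """Density at pixel (px, py) for a saccade from `start` to `end` (hypothetical helper)."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    dist = math.hypot(dx, dy)
    if dist < 1.0:
        # Assumption: below one pixel ln(dist) is negative and the spread degenerates.
        raise ValueError("saccade shorter than one pixel")
    std_dir = math.sqrt(dist) * (1.0 + math.log(dist))  # Eq. 1
    std_orto = math.sqrt(dist)                          # Eq. 2
    # Translate to the saccade center, then rotate into the saccade frame.
    cx, cy = (start[0] + end[0]) / 2.0, (start[1] + end[1]) / 2.0
    ux, uy = dx / dist, dy / dist
    rx, ry = px - cx, py - cy
    x = rx * ux + ry * uy    # coordinate along the saccade
    y = -rx * uy + ry * ux   # coordinate orthogonal to the saccade
    norm = 1.0 / (2.0 * math.pi * std_dir * std_orto)
    return norm * math.exp(-0.5 * (x * x / std_dir ** 2 + y * y / std_orto ** 2))
```

For a horizontal saccade of 100 pixels, the orthogonal standard deviation is 10 pixels, while the standard deviation along the saccade is roughly 56 pixels, which produces the slim elliptical shape described above.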

Fig. 4. Normal distribution density functions for two saccades. The height and color of the surface represent the density assigned to the respective position: (a) shows a short, (b) a long saccade.

Figure 4 visualizes the Gaussian distributions stretched along saccades. We can observe that the peak density is reached in the middle between the origin and target of the saccade and that positions along the saccade are not weighted equally. Furthermore, the start and end point are not contained within the high-density area. Figure 3(a) and (c) show the saccadic heatmap for two Gaussians with different standard deviations. In (a) the lines are smoothed and blurred by the high standard deviation. In (c) individual saccades are still visible and crisp. There is no obvious real-world equivalent to the spread of the Gaussian, like with the fovea for the fixation data. Instead, the parameter depends mainly on the eye-tracker’s accuracy and the homogeneity of eye movements that the stimulus material invokes. If we want to study fine-grained details, a small standard deviation needs to be chosen. When general saccadic patterns are of interest, a larger standard deviation contributes to a faster convergence of the heatmap. Saccade trajectories are more likely to overlap when the spread is larger.

Fig. 5. Modified normal distribution density function that assigns the same gradient to each position along the saccade trajectory. (a) Low standard deviation, leading to a crisp saccade representation. (b) Larger standard deviation, resulting in a smooth but blurry heatmap. The width of the Gaussian is based on the user standard deviation. (c) Contour plot of the three components: two caps for the saccade start and end point, and a length-variable adapter piece between them.

To achieve an equal weight of the whole saccade trajectory, the central cross-section of the 2D Gaussian density function is copied along the trajectory (see Fig. 5). The start and end of the saccade are then modeled as a dissection of the Gaussian with one half applied to the start, the other to the end of the saccade (Fig. 5(c)).
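One way to picture the modified density is as a function of the distance to the saccade segment. The following sketch makes two assumptions that the text does not fix: the end caps are isotropic with the same standard deviation as the cross-section, and the density is left unnormalized with a peak value of 1. The name `capsule_density` is hypothetical.

```python
import math

def capsule_density(px, py, start, end, std):
    """Unnormalized modified density: constant Gaussian cross-section plus end caps."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        t = 0.0
    else:
        # Relative position of the projection of (px, py) onto the saccade line.
        t = ((px - start[0]) * dx + (py - start[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp: beyond the ends, the caps take over
    qx, qy = start[0] + t * dx, start[1] + t * dy
    d_sq = (px - qx) ** 2 + (py - qy) ** 2
    return math.exp(-0.5 * d_sq / std ** 2)
```

Every point on the trajectory itself receives the same value, which is the property the modified density is meant to provide.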

Integrating Density Functions: Integration over all saccades in a recording is simple, as the density functions can be added. Using the modified density function as it is would result in an increased number of saccade overlaps within the smaller ROIs (just as it happens in the example shown in Fig. 3(a) with the face of the woman): transitions to multiple other locations originate here, overlap each other, and cause the saccade heatmap to highlight the overlap region instead of the saccadic trajectory. A simple approach to avoid this effect is to reduce the length of the saccade (e.g., by 10 % at each end). Endpoints then no longer accumulate, as the small shift assigns a lower weight to the periphery of the trajectory. Note that this effect is already built into the non-modified Gaussian density approach, since there exists only one maximum at the center of the saccade, as described above.
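The summation and the shortening of each saccade can be sketched as follows. The names `shorten` and `accumulate`, the `density` callback with signature `density(x, y, start, end)`, and the optional `weights` list are illustrative assumptions; the heatmap is a plain nested list to keep the example dependency-free.

```python
def shorten(start, end, fraction=0.1):
    """Trim `fraction` of the saccade length from each end."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    return ((start[0] + fraction * dx, start[1] + fraction * dy),
            (end[0] - fraction * dx, end[1] - fraction * dy))

def accumulate(saccades, width, height, density, weights=None, fraction=0.1):
    """Sum the (optionally weighted) density of every shortened saccade per pixel."""
    heat = [[0.0] * width for _ in range(height)]
    for i, (start, end) in enumerate(saccades):
        w = 1.0 if weights is None else weights[i]
        s, e = shorten(start, end, fraction)
        for y in range(height):
            for x in range(width):
                heat[y][x] += w * density(x, y, s, e)
    return heat
```

Evaluating every saccade at every pixel is the simplest possible scheme; a practical implementation would likely restrict each saccade to a bounding region around its trajectory.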

Fig. 6. (a,b,c) In the top row, the modified density function is applied. (a) Unweighted heatmap. (b) Heatmap weighted by the length of the saccade, which highlights longer trajectories. (c) Heatmap weighted by the duration of the enclosing fixations. (d,e,f) The bottom row employs the non-modified, stretched Gaussian density.

Figure 6 shows the practical consequences of using either the modified density function (top row) or the non-modified density function (bottom row). When the modified density distribution is applied, frequent traversals between the painter, the woman and the mask are highlighted (first column). But there are also unwanted effects of saccadic overlay within the ROI regions of the faces (those are in fact the overall hottest areas). The stretched normal distribution compensates for this effect: hottest regions in this map are located in-between the face regions. However, the trail of gaze is not clearly visible. Especially the triangle between the two faces and the chandelier is not visible anymore.

A relevant drawback of the current implementation is that saccades sum up with each other independently of their direction. Theoretically, it would be possible to calculate separate heatmaps for saccades in different directions: only saccades with a similar direction would be added up into the same heatmap. Heatmaps for different directions could then be merged non-additively by taking the maximum of the maps. The implementation of these features is part of our future work.

Weighting and Post-processing: Just as the contribution of a fixation to a heatmap can be weighted by the fixation duration, the contribution of a saccade to the saccade heatmap can be scaled. Figure 6(b) and (e) are weighted using the length of the saccade. Longer saccades contribute more to the final heatmap, emphasizing the long-distance gaze transitions. For (c) and (f), saccades were weighted using the duration of both adjacent fixations. In these weighted heatmaps we can observe that the scaled normal distributions highlight the relevant gaze trails with only a minor overlap effect in the ROI regions when compared to the modified distribution.

Heatmaps of both fixations and saccades often suffer from a single location that is looked at (or traversed) so frequently that it dominates all other areas. Most of the color space is then required to represent this one spot, and the remainder of the image has to be visualized with only a limited range of colors. To cope with this effect, a parameter that caps the heatmap at a user-defined maximum density is implemented. At the cost of resolution in the high-density areas, low-density effects can be studied in more detail. Figure 3(e) shows the saccadic heatmap with a capped maximum and a minimum density threshold that cuts off non-relevant areas.
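Capping and thresholding amount to a simple per-pixel operation on the accumulated densities. The sketch below assumes the heatmap is stored as a nested list of floats; the function name `postprocess` and the parameter names `cap` and `floor` are hypothetical.

```python
def postprocess(heat, cap=None, floor=None):
    """Cap densities at `cap` and set densities below `floor` to zero."""
    result = []
    for row in heat:
        new_row = []
        for value in row:
            if cap is not None:
                value = min(value, cap)   # frees color range for less dense areas
            if floor is not None and value < floor:
                value = 0.0               # cut off non-relevant areas
            new_row.append(value)
        result.append(new_row)
    return result
```

For example, `postprocess([[0.5, 3.0, 10.0]], cap=5.0, floor=1.0)` returns `[[0.0, 3.0, 5.0]]`.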

5 Saccade Clustering

This section introduces a new method for hierarchical clustering of saccades. With saccade clustering, we want to summarize the most frequent gaze trails in the recording (similar to the warm regions in the heatmap). In contrast to heatmaps, we work with the actual data rather than a derived and generalized representation such as a probability distribution. This allows for quantification and filtering of the results.

As for fixation clustering, we are then able to combine data from multiple recordings and subjects in order to reach a convergent gaze trail. Thereby, the most important, most repetitive elements are extracted from the recording. This process can be described as a denoising process that deletes individual variation from the data and highlights only the most common sequences.

When compared to visualization methods, clustering has several advantages: Each saccade can be uniquely assigned to one saccade bundle. These bundles can be quantified and compared to each other. Filtering and bundle selection can be applied (for example a visualization can select and display only the three most important gaze trails).

Clustering Algorithm: A hierarchical clustering [10] of a set of objects can be displayed as a binary tree. Each leaf node represents one object of the set, and the distance between two nodes represents the dissimilarity between the corresponding objects. Constructing such a hierarchy tree requires two ingredients: first, a dissimilarity measure between two objects, and second, a linkage method that defines the dissimilarity between groups of objects. Popular linkage criteria are maximum (or complete) linkage, minimum (or single) linkage, and average linkage. Maximum linkage returns the largest distance between any two elements of the two clusters, minimum linkage the smallest distance, and average linkage the mean of all distances between any two elements of the two clusters.
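The three linkage criteria differ only in how the pairwise dissimilarities between two clusters are aggregated. A minimal sketch, with the hypothetical name `linkage_distance` and an arbitrary user-supplied `dissimilarity` function:

```python
def linkage_distance(cluster_a, cluster_b, dissimilarity, method="average"):
    """Dissimilarity between two clusters under a given linkage criterion."""
    pairs = [dissimilarity(a, b) for a in cluster_a for b in cluster_b]
    if method == "maximum":   # complete linkage
        return max(pairs)
    if method == "minimum":   # single linkage
        return min(pairs)
    if method == "average":
        return sum(pairs) / len(pairs)
    raise ValueError("unknown linkage method: " + method)
```

With the clusters `[0, 1]` and `[4, 6]` and the absolute difference as dissimilarity, the pairwise values are 4, 6, 3, and 5, so maximum linkage yields 6, minimum linkage 3, and average linkage 4.5.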

Fig. 7. Saccade clustering workflow. On the right side the raw data and clusters after the angular clustering as well as the subsequent distance clustering step are shown. Saccades of the same cluster are colored the same.

For the definition of a distance metric between saccades, we consider both the orientation and the Euclidean distance between the start and end points of the saccades. Data obtained from many or long recordings can easily contain several thousand saccades. As the dissimilarity calculation needs to be performed pairwise (resulting in a runtime of \(\mathcal {O}(n^2)\)), computational efficiency is an issue.

Saccade clustering is computed in a two-step approach as depicted in Fig. 7: in a first clustering step the orientation between saccades is used as similarity measure, afterwards a second clustering step based on the Euclidean distance between the saccades is performed, but only within the previously found clusters.

Runtime can be reduced by filtering short saccades that are unlikely to contribute much to driving gaze over the picture and by scanpath simplification, i.e., merging of temporally sequential saccades into the same direction.
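Both reductions can be sketched in one pass over a temporally ordered list of saccades. The text does not specify an angular tolerance, a length threshold, or the order of the two steps, so the parameters `max_angle` and `min_length`, the choice to merge before filtering, and the function name `simplify_scanpath` are all assumptions.

```python
import math

def _angle(saccade):
    (x0, y0), (x1, y1) = saccade
    return math.atan2(y1 - y0, x1 - x0)

def _length(saccade):
    (x0, y0), (x1, y1) = saccade
    return math.hypot(x1 - x0, y1 - y0)

def simplify_scanpath(saccades, min_length=0.0, max_angle=math.radians(15)):
    """Merge consecutive saccades of similar direction, then drop short ones."""
    merged = []
    for saccade in saccades:
        if merged:
            diff = abs(_angle(merged[-1]) - _angle(saccade))
            diff = min(diff, 2 * math.pi - diff)  # wrap around at +-pi
            if diff <= max_angle:
                # Extend the previous saccade to the end point of this one.
                merged[-1] = (merged[-1][0], saccade[1])
                continue
        merged.append(saccade)
    return [s for s in merged if _length(s) >= min_length]
```

Merging first keeps chains of small same-direction steps from being discarded before they can be combined into one longer saccade.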

Given a saccade with start point \(A=(x_a, y_a)\) and end point \(B=(x_b, y_b)\), its angle in the interval \((-\pi , \pi ]\) relative to the positive x-axis is calculated as:

$$\begin{aligned} \measuredangle (A, B) = atan2(y_b - y_a, x_b - x_a) \end{aligned}$$
(4)

The angular difference between saccades \(S_1\) and \(S_2\) can then be computed as:

$$\begin{aligned} d(S_1, S_2) = {\left\{ \begin{array}{ll} |{\measuredangle (S_1)-\measuredangle (S_2)}|, &{} \text {if }|{\measuredangle (S_1)-\measuredangle (S_2)}| \le \pi \\ 2\pi -|{\measuredangle (S_1)-\measuredangle (S_2)}|, &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(5)

The above equation can easily be adjusted for direction independence such that the saccades \(S_1=(A,B)\) and \(S_2=(B,A)\) are considered equal.
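Eqs. 4 and 5 translate directly into code. In the sketch below, the function names and the `directed` flag are hypothetical; the direction-independent variant is one possible adjustment, folding the difference so that opposite directions yield zero.

```python
import math

def saccade_angle(start, end):
    """Eq. 4: angle of the saccade relative to the positive x-axis."""
    return math.atan2(end[1] - start[1], end[0] - start[0])

def angular_difference(s1, s2, directed=True):
    """Eq. 5, with an optional direction-independent variant (assumption)."""
    diff = abs(saccade_angle(*s1) - saccade_angle(*s2))
    if diff > math.pi:
        diff = 2 * math.pi - diff
    if not directed:
        # Treat (A, B) and (B, A) as equal: fold the range [0, pi] onto [0, pi/2].
        diff = min(diff, math.pi - diff)
    return diff
```

A saccade and its reverse differ by \(\pi\) in the directed case and by 0 in the direction-independent case.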

For the second clustering step, the spatial distance between two saccades \(S_1=(A,B)\) and \(S_2=(C,D)\) is calculated as the minimal distance of any of the start and end points to the line from start to end of the other saccade:

$$\begin{aligned} d(S_1,S_2) = \min (d(A,\overline{CD}), d(B,\overline{CD}), d(C,\overline{AB}), d(D,\overline{AB})) \end{aligned}$$
(6)

The distance between a point A and a line segment \(\overline{BC}\) is defined as the Euclidean distance between A and the closest point on the line segment \(\overline{BC}\).
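The spatial distance of Eq. 6 can be sketched with a standard point-to-segment projection. The function names are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance between point p and the closest point on segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])  # degenerate segment
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def saccade_distance(s1, s2):
    """Eq. 6: minimal distance of any end point to the other saccade."""
    (a, b), (c, d) = s1, s2
    return min(point_segment_distance(a, c, d),
               point_segment_distance(b, c, d),
               point_segment_distance(c, a, b),
               point_segment_distance(d, a, b))
```

Note that, because only end points are tested, two saccades that cross in their middle still receive a positive distance under this definition.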

The construction of the clustering tree with the currently implemented method requires \(\mathcal {O}(n^3)\) time; cutting the hierarchy tree can be done in \(\mathcal {O}(n)\). The computational bottleneck is therefore the construction of the first clustering tree, which includes many saccades. Because the clustering is hierarchical, the cutoff parameters can be adjusted easily and quickly, allowing a trade-off between average cluster size and within-cluster similarity.

This ability to choose the detail-level after the computationally expensive part of the algorithm makes the method comfortable to use.
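The two-step procedure can be illustrated with a deliberately naive agglomerative routine. This is a sketch under stated assumptions, not the Eyetrace implementation: it stops merging at the cutoff instead of building the full tree and cutting it afterwards, it recomputes all cluster distances in every round, and the names `agglomerate` and `two_step` as well as the `(dissimilarity, cutoff, linkage)` tuples are hypothetical.

```python
def agglomerate(items, dissimilarity, cutoff, linkage=max):
    """Merge the closest clusters until their linkage distance exceeds `cutoff`.

    `linkage` aggregates pairwise dissimilarities: max for maximum (complete)
    linkage, min for minimum (single) linkage.
    """
    clusters = [[item] for item in items]

    def between(ca, cb):
        return linkage(dissimilarity(a, b) for a in ca for b in cb)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = between(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > cutoff:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def two_step(items, first, second):
    """Cluster with `first`, then refine each cluster with `second`.

    `first` and `second` are (dissimilarity, cutoff, linkage) tuples.
    """
    result = []
    for coarse in agglomerate(items, *first):
        result.extend(agglomerate(coarse, *second))
    return result
```

For saccades, `first` would carry an angular dissimilarity and `second` a spatial one, so that the expensive spatial comparison only runs within groups of similarly oriented saccades.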

Fig. 8. Transitions between fixation clusters visualized by orange ellipses. (a) shows direct saccades that transition from one cluster to another. (b) shows the same clusters but also includes indirect transitions. In this visualization, the line width represents the transition frequency. Blue lines represent transitions that go from the left to the right, whereas green lines stand for transitions in the opposite direction. (Color figure online)

ROI Transitions: Figure 8 shows the transitions between ROIs. The depicted ROIs were calculated as cumulative clusters, i.e., clusters with the highest density of fixations shared by all participants. We distinguish between direct transitions, i.e., a single saccade that connects two ROIs, and indirect transitions. Indirect transitions contain at least two saccades. The first saccade starts from a ROI but does not land in another ROI. We then follow the subsequent saccades until a ROI is hit. The indirect transition is counted as a transition from the start ROI of the first saccade to the target ROI of the last saccade in this chain. ROI transition analysis was already implemented in the first version of Eyetrace [9, 22]; we added the capability to analyze indirect transitions.
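Counting both kinds of transitions is straightforward once every fixation has been assigned a ROI label. The sketch below assumes a temporally ordered list of labels with `None` for fixations outside any ROI, and it ignores chains that return to the ROI they started from; both conventions and the function name `roi_transitions` are assumptions.

```python
from collections import Counter

def roi_transitions(roi_sequence):
    """Count direct and indirect transitions between ROI labels."""
    direct, indirect = Counter(), Counter()
    last_roi = None
    gap = 0  # number of non-ROI fixations since the last ROI fixation
    for roi in roi_sequence:
        if roi is None:
            if last_roi is not None:
                gap += 1
            continue
        if last_roi is not None and roi != last_roi:
            target = direct if gap == 0 else indirect
            target[(last_roi, roi)] += 1
        last_roi = roi
        gap = 0
    return direct, indirect
```

For the sequence `["A", "B", None, "C", None, None, "A", "A"]`, this yields one direct transition from A to B and two indirect transitions, from B to C and from C to A.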

6 Application of the Proposed Techniques to Art Viewing

The above approaches were applied to the eye-tracking data of both free-viewing experiments introduced previously. For both eye-tracking datasets, the orientation clustering step was performed with the maximum linkage criterion and direction dependency, whereas the distance-based clustering step used minimum linkage. Cutoff values were determined by successively easing the restrictiveness of the cutoff (i.e., increasing the cutoff threshold) until relatively many saccades were contained in the clusters. This parameter is necessarily subjective, as the homogeneity of saccades depends on the stimulus. As the threshold increases, we move from very detailed saccade bundles towards a more general, coarser summary.

Results on Experiment 1: Figure 9(b) visualizes the clustering result for Experiment 1. Remarkably, saccade clustering reveals that the eye movements of the observers were driven by the social cue in the painting. The painter and the woman in this painting are displayed in a way that their gaze target can be estimated by the viewer. We can observe that the most frequent gaze trails computed by our clustering approach follow these social cues between the painter, the woman, and the plaster mask. We can further observe that the composition line of the painting that connects the mask, woman, and chandelier has a strong effect on gaze behavior.

Fig. 9. Comparison of the different approaches proposed in this paper (a,b,c) and one state-of-the-art visualization technique (d). In (b) the modified Gaussian (Fig. 5) with \(std=5.5\) and an absolute maximum of 25 overlapping saccades was applied.

In addition, Fig. 9 displays the spectrum of visualizations for saccades that is currently available in the Eyetrace software, where (a) visualizes the result of the saccade heatmap computation and (b) the result of the saccade clustering technique. Besides the different look, their main distinction lies in the amount of simplification that is performed. More specifically, the ROI transition graph as visualized in (c) builds upon the identification of ROIs (e.g., via mean-shift clustering of fixations). Its major advantage is that scanpath transitions instead of direct saccades between ROIs can be considered. Scanpath transitions may contain saccades to non-ROI areas in-between two ROIs and do not require a saccade directly from one ROI to another ROI.

The most recent and impressive example of clustering is attribute-driven edge bundling [18, 26]. Edge bundling performs the mean-shift algorithm on the saccadic start and end points as well as on samples distributed equally along the saccadic trajectory. Therefore, clustered trajectories get an organic look, as if the exact ballistic eye movement had been measured with an extremely high sampling rate and accuracy.

The edge bundling approach shown in (d) consists of several steps (clustering of fixations, clustering of the trajectories, relaxation, color choice, ...). Each step is associated with a set of parameters that require adjustment. The parameters were adjusted to emphasize the same effect that was also found by the other methods, and we can clearly observe the primary gaze trajectories along the faces and towards the chandelier. When looking at the results, it is important to keep in mind that the displayed data represents a considerable simplification and that the suggested level of detail is in fact not contained in the data: the samples along the saccade trajectory are interpolated. In contrast to the approach suggested here, edge bundling can cluster the whole saccadic trail, provided that a recording at a sufficiently high frame rate is available. The proposed clustering approach uses only the start and end point of each saccade and can therefore also be run on the CPU, while edge bundling requires the massive parallelization of a GPU.

Fig. 10. (a) The Last Supper by Tintoretto. (b) The saccade heatmap using the modified Gaussian (Fig. 5) with \(std=5.5\) and an absolute maximum of 100 overlapping saccades. (c) Saccade clusters, where the color represents the cluster membership.

Results on Experiment 2: Tintoretto was the first painter who represented the table of the Last Supper from the side, hence foreshortening it in the depth of the space. The main composition lines, as described by Badt, on the left (apostles) and right (cat, servant, sideboard) lead into the depth of the space. Most saccade trajectories measured in Experiment 2 run along those composition lines, with almost no transitions across the table. Gaze also escapes towards the light source in the top left corner, mainly via the woman in-between the central image area and the light. The experiment confirms the assumption of a correlation between composition lines (as generally analyzed by art historians) and eye movements of beholders. However, in this specific example the experiment falsifies Kurt Badt’s analysis in one crucial point: his central assumption is that the viewer, starting in the lower left corner, will be kept from following the apostles along the table with his eyes and will instead follow the highlighted leg of the left apostle, the dog, and the cat up to the servant and the right foreground. In the experiment, this connection was extremely rare (Fig. 10). This had already been shown using ROI transitions [21], but it becomes much more evident with our new visualization techniques. By employing these techniques, eye tracking becomes an easier-to-use and very powerful tool to verify art historical theories about the composition of pictures. These tools are useful for figurative paintings such as those chosen in the present experiments. We expect them to be even more pertinent for abstract art and for representational art without figures or very salient objects (such as landscapes).

Conclusion: We introduced two novel computational techniques to process saccades: (1) the saccade heatmap and (2) a completely data-driven method for saccade clustering. Both methods were applied to two art viewing experiments alongside ROI transition diagrams and edge bundling. As they work without a definition of regions of interest, our methods are relatively easy to apply. In our future work, the method to compute saccade heatmaps will be adjusted for a sense of saccade direction in order to reduce the effect of overlapping saccades towards different directions.