
1 Introduction

The user’s experience with digital products is often multifaceted and complex. To better understand these interactions, researchers tend to advocate including multiple categories of metrics to produce a more complete understanding of the user experience [1]. While self-report (e.g., verbal comments, satisfaction questionnaire ratings) and performance are the two most commonly used types of metrics, physiological metrics (e.g., eye movements, pupillary response, galvanic skin response), which require specialized technology to observe, provide additional insight into the user experience. Eye movements, for example, can inform researchers about how visual attention is allocated across the design elements and language of a digital product. They can therefore offer insight into optimal design and language over and above what self-report and performance metrics provide.

As eye-tracking software and hardware continue to evolve and improve, pupillometry (the measurement of pupil diameter) is becoming more accurate, and pupil data are being captured at higher sampling rates. These improvements now allow analyses that were once impractical for user experience research practitioners. Pupil diameter is a continuous variable that is measured and recorded at every observation captured by the eye tracker. Slight but measurable changes in pupil diameter have been attributed to differing levels of mental workload [2–4], cognitive processing [5], attentional effort [6], perception [7], memory [8, 9], decision making [10], and physiological arousal [11].

By measuring and analyzing pupil diameter during a person’s interaction with a digital product, researchers can identify the parts of the interaction that impose higher levels of mental workload. Researchers can then assess whether these increased levels of mental workload lead to comprehension problems or task failure. Combining pupillometry with traditional eye-movement data may lead to better-informed design decisions and recommendations for improving digital products.

Formative user experience research often consists of usability testing with eight to 10 participants to uncover errors and gather general feedback on design elements and language. Although small, this sample size is typically sufficient to discover a relatively high proportion of possible errors, given the homogeneity of responses. Pupillometry data, by contrast, are more typically collected in cognitive research investigating executive functions. It is not well understood whether pupillometry data aid understanding of the user experience at these small sample sizes and during the commonplace internet interactions typical of this type of research.

Our current exploratory research investigates the relationship between eye-movement fixations and pupillary response. We approached this investigation with two broad research questions: (1) What is the relationship between pupillary response and fixation duration, and (2) What is the relationship between pupillary response and frequency of fixations?

2 Method

2.1 Data Source

Nine people participated in the study at the Fors Marsh Group User Experience Lab in Arlington, VA. One participant was removed because of a low eye-tracking capture rate, leaving eight participants (five female, three male; median age 59, range 45–61) in the final data set. Participants were instructed to interact with an online calculator tool until they considered their experience complete. After the task, participants completed the System Usability Scale [12]. The moderator then conducted a debriefing interview with each participant about their experience using the site.

Data included in the analysis consisted of fixations and pupil diameters recorded while participants used the calculator tool without interruption from the moderator. Data were collected with a Tobii X2-60 (60 Hz) eye tracker connected to a system running Tobii Pro Studio version 3.2.3. The Velocity-Threshold Identification (I-VT) fixation classification algorithm [13] was applied to the raw gaze data to identify fixations before analysis.
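For readers unfamiliar with I-VT, the core idea is to label each gaze sample as part of a fixation when its point-to-point angular velocity falls below a threshold, and then to group consecutive fixation samples into fixations. The sketch below is a simplified illustration, not Tobii Pro Studio's implementation: the 30°/s velocity threshold and 60 ms minimum duration are assumed defaults, gaze is assumed to be already converted to degrees of visual angle, and the function name `ivt_fixations` is hypothetical. Production filters typically add gap filling, noise reduction, and merging of adjacent fixations.

```python
import numpy as np


def ivt_fixations(t, x_deg, y_deg, velocity_threshold=30.0, min_duration=0.06):
    """Simplified I-VT: return (start, end) sample indices (inclusive) of fixations.

    t: sample timestamps in seconds; x_deg, y_deg: gaze position in degrees of
    visual angle. velocity_threshold (deg/s) and min_duration (s) are assumed
    defaults, not the study's settings. Missing samples should be NaN.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x_deg, dtype=float)
    y = np.asarray(y_deg, dtype=float)

    # Angular speed between consecutive samples; sample i gets the speed of the
    # step from i-1 to i, and sample 0 reuses the first step's speed.
    step_speed = np.hypot(np.diff(x), np.diff(y)) / np.diff(t)
    speed = np.concatenate([step_speed[:1], step_speed])

    # Below-threshold samples are fixation samples; NaN speeds compare False,
    # so missing data ends a fixation.
    is_fix = speed < velocity_threshold

    fixations, start = [], None
    for i, fixating in enumerate(is_fix):
        if fixating and start is None:
            start = i
        elif not fixating and start is not None:
            fixations.append((start, i - 1))
            start = None
    if start is not None:
        fixations.append((start, len(is_fix) - 1))

    # Discard runs shorter than the minimum fixation duration.
    return [(s, e) for s, e in fixations if t[e] - t[s] >= min_duration]
```

At 60 Hz, consecutive samples are about 16.7 ms apart, so the minimum-duration rule in this sketch keeps only runs spanning at least five samples.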

2.2 Within-Subjects Analysis

Our research questions focus on the general effects of (1) fixation duration and (2) total number of fixations on pupil diameter. To answer them, we first examined the correlation between participants’ left and right pupil diameters across observations. Most observations (80,627) had valid entries for both the left and right eyes. The correlation was very strong (r = 0.90), so left and right pupil diameters were averaged for ease of analysis. For observations with only a valid left (6,817) or right (6,512) entry, the single valid value was used as that observation’s value (see Footnote 1). The observation values were then aggregated to the fixation level by computing (1) the average and (2) the standard deviation across all observations within each fixation. There were 8,843 fixations in total across all eight participants.
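The eye-combination and fixation-level aggregation steps described above can be sketched as follows. This is a minimal illustration assuming invalid samples are stored as NaN and each sample carries a fixation ID; the function names are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not state which estimator was used.

```python
import numpy as np


def combine_eyes(left, right):
    """Average left and right pupil diameters (mm) per sample.

    Invalid samples are assumed to be NaN. When only one eye is valid, that
    eye's value is used; when neither is valid, the result is NaN.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    both = ~np.isnan(left) & ~np.isnan(right)
    combined = np.where(np.isnan(left), right, left)  # single valid eye (or NaN)
    combined[both] = (left[both] + right[both]) / 2.0
    return combined


def eye_correlation(left, right):
    """Pearson r between eyes over samples where both are valid."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    both = ~np.isnan(left) & ~np.isnan(right)
    return np.corrcoef(left[both], right[both])[0, 1]


def aggregate_by_fixation(fixation_ids, pupil):
    """Map each fixation ID to (mean, sample SD) of its valid pupil samples."""
    fixation_ids = np.asarray(fixation_ids)
    pupil = np.asarray(pupil, dtype=float)
    summary = {}
    for fid in np.unique(fixation_ids):
        vals = pupil[(fixation_ids == fid) & ~np.isnan(pupil)]
        if vals.size == 0:
            summary[fid] = (np.nan, np.nan)  # fixation with no valid pupil data
        elif vals.size == 1:
            summary[fid] = (vals[0], np.nan)  # SD undefined for one sample
        else:
            summary[fid] = (vals.mean(), vals.std(ddof=1))
    return summary
```

Fixations whose summary is entirely NaN correspond to the "no valid values" case reported below and would be dropped before modeling.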

We used linear regression to estimate the effects of fixation duration and number of fixations on average pupil diameter. We chose linear regression because average pupil diameter was approximately normally distributed and was well fit by a linear model.

We also included several control variables in the linear regression. All variables in the regression are discussed below.

  1. Fixation duration:

    • Assesses the effect of fixation length on pupil diameter; should reflect mental workload independent of the controls below.

  2. Number of fixations:

    • Assesses the effect of the total number of fixations on pupil diameter; should reflect mental workload independent of the controls below.

  3. A set of dummy-coded indicator variables for each participant:

    • Removes each participant’s natural level of pupil diameter, since some participants simply have larger pupils than others.

  4. Each fixation’s serial order:

    • Removes longer-term trends in pupil diameter over the course of the task that are common across participants.

  5. Standard deviation of pupil diameter:

    • Concurrent values as well as one- and two-fixation lags, to remove possible effects of pupillary fatigue (i.e., from repeated pupil movement) that could shift diameter toward resting levels.

  6. Lagged pupil diameter:

    • One- and two-fixation lags remove inertia-like carryover in diameter across fixations. To the extent that diameter is similar across consecutive fixations, these lags absorb that similarity.
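A minimal sketch of how the lagged controls and participant dummies might be assembled and fitted with ordinary least squares. It assumes one row per fixation, sorted by participant and then by serial order, so that lags are computed within each participant and never cross participant boundaries. The first two fixations of each participant therefore have missing lags and are dropped. Function and column names are hypothetical. For simplicity, the sketch uses the first participant as the reference category and leaves out the total-fixation count. If that count takes a single value per participant, it is collinear with the intercept plus participant dummies, so one more column would have to be dropped to include it.

```python
import numpy as np


def lag_within(values, groups, k):
    """Shift values back by k rows within each group (NaN for the first k rows).

    Assumes rows are sorted by group and then by serial order.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    lagged = np.full_like(values, np.nan)
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        lagged[idx[k:]] = values[idx[:-k]]
    return lagged


def build_design(pid, order, duration, pupil_mean, pupil_sd):
    """Return (X, y, column_names) for the fixation-level regression sketch."""
    pid = np.asarray(pid)
    predictors = {
        "duration": duration,
        "serial_order": order,
        "sd": pupil_sd,
        "sd_lag1": lag_within(pupil_sd, pid, 1),
        "sd_lag2": lag_within(pupil_sd, pid, 2),
        "mean_lag1": lag_within(pupil_mean, pid, 1),
        "mean_lag2": lag_within(pupil_mean, pid, 2),
    }
    names = ["intercept"] + list(predictors)
    columns = [np.ones(len(pid))] + [np.asarray(v, dtype=float) for v in predictors.values()]
    # Dummy codes for every participant except the first (reference category).
    for p in np.unique(pid)[1:]:
        columns.append((pid == p).astype(float))
        names.append(f"participant_{p}")
    X = np.column_stack(columns)
    y = np.asarray(pupil_mean, dtype=float)
    # Drop rows with any missing value (e.g., the first two fixations per person).
    keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    return X[keep], y[keep], names


def fit_ols(X, y):
    """Ordinary least-squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

Computing lags per participant matters: a naive shift over the stacked data would treat one participant's last fixation as the previous fixation of the next participant.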

Descriptive statistics for several of the within-subjects variables are reported in Table 1. Nine fixations had no valid pupil diameter values, leaving 8,834 usable fixations.

Table 1. Within-subjects descriptive statistics

2.3 Between-Subjects Analysis

Our research questions did not require an exclusive focus on within-subjects data; we were also interested in the relationships between pupil diameter and the focal predictors when aggregated between subjects.

To do so, we computed each person’s average and standard deviation of fixation-level pupil diameter, as well as each person’s average fixation duration. Using partial correlations that controlled for each of the other variables, we then assessed how average diameter and the standard deviation of diameter related to total number of fixations and to average fixation duration. The between-subjects descriptive statistics for all eight participants are reported in Table 2.
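Partial correlations that control for all remaining variables can be read off the inverse of the correlation matrix. The sketch below assumes a person-by-variable array (eight rows here) and shows one standard way to compute them, not necessarily the software routine used in the study; the function names are hypothetical. The companion function gives the usual t statistic for a partial correlation, with n − 2 − k degrees of freedom for k control variables.

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation of every pair of columns, controlling for all others.

    data: (n_subjects, n_variables) array. Uses the identity
    r_ij|rest = -P_ij / sqrt(P_ii * P_jj), where P is the inverse correlation matrix.
    """
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    precision = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


def partial_corr_t(r, n, k):
    """t statistic for a partial correlation r with k control variables (df = n - 2 - k)."""
    df = n - 2 - k
    return r * np.sqrt(df / (1.0 - r ** 2))
```

For example, with four person-level variables each pair is adjusted for the other two (k = 2), so with eight participants each test has only 4 degrees of freedom.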

Table 2. Between-subjects descriptive statistics

3 Results

3.1 Within-Subjects Results

The results of the within-subjects regression are reported in Table 3. Note that participant 8’s dummy code was omitted because of perfect collinearity with the other predictors.

Table 3. Within-subjects regression estimating average pupil diameter

The coefficient for fixation duration is negative: controlling for within-subjects effects and the previous two fixation periods, longer fixations are associated with smaller pupil diameters. On average, a one-second increase in fixation duration is associated with a 0.0395 mm decrease in pupil diameter for that fixation. We expected this direction of effect because longer fixations might reflect reduced mental and physiological workload.

The coefficient for total number of fixations is positive, controlling for within-subjects effects and the previous two fixation periods. Substantively, an increase of 100 fixations over the entire task is associated with a 0.00514 mm increase in average pupil diameter; more fixations, then, are associated with larger pupil diameters. As expected, the total number of fixations appears to be associated with increased pupil diameter and may be a useful indicator of workload in the context of a usability task.
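Because the model is linear, predicted changes scale proportionally with the change in each predictor. The worked example below uses only the two estimates reported above, with d̂ denoting predicted average pupil diameter; the 0.25 s change in fixation duration is an illustrative value, not a result from the study.

```latex
\begin{aligned}
\Delta\hat{d} &= \beta\,\Delta x \\
\beta_{\text{dur}} = -0.0395\ \text{mm/s},\quad \Delta x = 0.25\ \text{s}
  \;&\Rightarrow\; \Delta\hat{d} = -0.0395 \times 0.25 \approx -0.0099\ \text{mm} \\
\beta_{\text{count}} = \frac{0.00514\ \text{mm}}{100\ \text{fixations}}
  \;&=\; 5.14 \times 10^{-5}\ \text{mm per fixation}
\end{aligned}
```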

Table 3 also shows that average pupil diameter from up to two previous fixations (i.e., the lag 1 and lag 2 average pupil diameter variables) is serially correlated with the current fixation, suggesting that pupil diameter tends not to change greatly from one fixation to the next. This serial correlation is also visible in Fig. 1, which depicts the local polynomial-smoothed average diameter over the course of the entire task, from fixation to fixation. Pupil diameters remain similar over time and move within a fairly narrow range. This stability accounts for the effects obtained for the lagged pupil diameter variables.

Fig. 1. Local polynomial-smoothed average pupil diameter over the course of the entire task, from fixation to fixation.
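The text does not report which local polynomial smoother or bandwidth produced Fig. 1. As an illustration only, the sketch below implements a generic local linear (degree-1) smoother with tricube weights, similar in spirit to LOWESS but without its robustness iterations. The `frac` parameter (the share of points in each local window) is an assumed tuning choice, and the function name is hypothetical.

```python
import numpy as np


def local_linear_smooth(x, y, frac=0.3):
    """Local linear smoother with tricube weights (LOWESS-like, no robustness steps).

    For each x[i], fits a weighted straight line to the nearest ~frac*n points
    and returns the line's value at x[i].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(2, int(np.ceil(frac * n))))  # neighbours per local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[k - 1]  # bandwidth: distance to the k-th nearest point
        if h == 0:
            h = 1.0  # guard for heavily repeated x values
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3  # tricube weights
        sw = np.sqrt(w)
        A = np.column_stack([np.ones(n), x - x[i]])  # line centred at x[i]
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0]  # intercept = fitted value at x[i]
    return fitted
```

In a figure like Fig. 1, one would presumably apply the smoother separately to each participant, with fixation serial order as `x` and fixation-level average diameter as `y`.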

Variance in pupil diameter during the previous fixation (i.e., the lag 1 standard deviation) predicts a smaller pupil diameter in the current fixation, by approximately 0.191 mm for every one standard deviation increase. In contrast to the lagged average finding, this lagged standard deviation effect suggests that substantial pupil movement in the previous fixation might fatigue the iris sphincter muscle and shift the pupil toward a smaller resting diameter. Repeated contraction and expansion are likely to tire the iris sphincter muscle and to move the diameter toward the person’s average value (Fig. 1 shows, in the separation between the lines, that each person has a different average diameter). Because this average is usually smaller than the current pupil size, a higher standard deviation likely reflects the pupil expanding during a fixation.

Finally, we were concerned that the average and standard deviation of pupil diameter would be strongly correlated: because pupils are finite in their dilation capability, we expected large diameters to be associated with small standard deviations. The correlation between average pupil diameter and its standard deviation, however, was only −0.043. Thus, although the direction of the relationship was as expected, its strength was much weaker than anticipated. Additionally, including the standard deviation controls does not introduce problematic multicollinearity into the model. In other words, the standard deviation and average pupil diameter provide largely non-overlapping information and are effective control variables for estimating the effect of fixation duration.
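The text does not report a formal multicollinearity diagnostic. One common check, shown here as an assumed supplement rather than the authors' procedure, is the variance inflation factor (VIF), 1 / (1 − R²), where R² comes from regressing each predictor on all the others; values near 1 indicate little shared information. The function name is hypothetical.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of X (X should not include an intercept column).

    Each column is regressed on all other columns plus an intercept;
    VIF_j = 1 / (1 - R_j^2).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs
```

Applied to the regression's predictor columns (excluding the intercept), this would show directly whether the standard deviation controls overlap with the lagged averages or other predictors.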

3.2 Between-Subjects Partial Correlations

Results for the between-subjects partial correlations are reported in Table 4. Column 1 gives each variable’s partial correlation with average pupil diameter, and Column 2 gives each variable’s partial correlation with the standard deviation of pupil diameter. Because the correlations are partial, each controls for the other variables. The only significant partial correlation was between the standard deviation of pupil diameter and average fixation duration: participants with longer average fixation durations tended to show greater variability in pupil diameter (see Column 2).

Table 4. Partial correlation coefficients

4 Discussion

We explored the relationship between pupillary response and eye-movement fixations in data from formative usability testing. Three interrelated findings emerged:

  1. Within subjects, longer fixation durations were associated with a decrease in pupil size.

  2. Within subjects, an increase in total fixations was associated with an increase in average pupil size.

  3. Between subjects, participants who fixated for longer durations had less consistency in their pupil size than those who fixated for shorter durations.

The first finding suggests that as a fixation lengthens, the pupil becomes smaller, possibly because fewer resources are being expended: looking at the same location may require fewer mental and physiological resources over time. This decrement in pupil size would become more evident the longer the fixation lasts. The second finding, a positive relationship between number of fixations and pupil size, follows from the first. More frequent fixations during a task mean less time with the eye at rest and more time spent processing new information, potentially increasing average pupil size by requiring more resources.

Finally, the third finding, that those who fixate longer have less consistent pupil diameters than those who fixate for shorter durations, is consistent with the first two. If pupil size decreases over the course of a fixation as a result of reduced mental and physiological workload, then those with longer average fixations should show greater variance in pupil diameter.

This work, while exploratory, provides insight into the relationships between pupillary response and eye-movement fixations, and into the potential utility of pupillary response in usability testing. Pupillary response has been found to be a useful measure of mental workload when evaluating interfaces with high cognitive demands [14]; however, less is known about its usefulness for less complex usability tasks. These findings suggest that even in a usability task where participants are evaluating a website rather than engaging in complex tasks, there is still significant variation in pupil diameter (and, by implication, workload), and that this variation shows consistent and interpretable relationships with fixation count and length.

Further research is needed to understand more fully how to interpret and apply pupillometry data in usability contexts and whether the present results replicate in other commonplace online tasks. Future research is also needed to better understand the relationship between pupil diameter and different types of mental workload. Finally, it would be beneficial to examine the relationship between pupil diameter and eye fixations not only in the context of online task completion but also in relation to test outcome measures such as task success and satisfaction questionnaire ratings. Pupil diameter may provide additional insight into participants’ subjective perceptions of task completion or satisfaction, not just task difficulty.