1 Introduction

1.1 Pupil Data as a Cognitive Measure

Pupil diameter has become a widely accepted physiological indicator of mental workload. As the amount of mental effort required for a task changes, pupil size also changes. Hess and Polt [1] were among the first researchers to demonstrate that pupil size increases as task difficulty increases. They performed an experiment using mental arithmetic problems of increasing difficulty. Pupil diameter was recorded using a camera and measured manually with a ruler for each individual frame. Participants typically showed an increase in pupil size that reached a maximum immediately before they answered a question and then returned to a baseline size. The results of this study revealed a strong correlation between pupil diameter and the difficulty of a problem. With this work, Hess and Polt helped to pioneer the use of eye data as a cognitive measure in 1960. Their results demonstrate that pupillary response is a valuable measure of problem-solving, as well as of other mental processes [1]. Soon after, many researchers in the field began investigating other cognitive factors influencing pupil diameter.

Working memory is one cognitive factor that has been shown to influence pupil diameter. Kahneman and Beatty [2] expanded on the findings of Hess and Polt [1]. They investigated the discovery that the pupil dilates while a person is listening to information and then contracts as they report it. The task in this study involved listening to strings of digits and reporting them back immediately. Task difficulty varied across trials, with longer digit strings considered more difficult. The researchers determined that there was a “loading phase” in which the pupil dilated with each digit presented, and an “unloading phase” in which the pupil size decreased with each digit reported. They also found that the maximum pupil size obtained was correlated with the number of digits presented [2]. This study has influenced the use of digit-span tasks to observe working memory fluctuations by measuring pupil diameter.

Early research on pupil diameter also identified differences between individuals of different cognitive ability levels. Ahern and Beatty [3] performed a study using mental multiplication problems that differed in difficulty. They found that people with higher cognitive ability, as evidenced by scores on the Scholastic Aptitude Test (SAT), performed better on the multiplication problems at every level of difficulty than those with lower cognitive ability. What is unique about this study is that the subjects with higher cognitive ability showed smaller task-evoked pupillary dilations than their lower-scoring peers [3]. These results point to a possible physiological measure of intelligence.

More recent research built on the findings above [3] to examine the relationship between baseline pupil diameter and intelligence [4]. The study classified individuals as having a lower working memory capacity if they scored in the lower quartile on an OSPAN task. Pupil diameter was measured during a baseline task in which the participants looked at a dark screen. The individuals classified as having lower working memory capacity had significantly smaller pupil diameters than individuals in the upper quartile of working memory capacity [4]. The average pupil diameter of individuals with higher working memory capacity was 1 mm larger than that of individuals with lower working memory capacity. This baseline difference is considerable since pupillary increases during cognitively challenging tasks are typically less than 0.5 mm [4]. These results help to solidify the use of pupil diameter as a physiological measure of intelligence.

Originally, research using pupil diameter to measure cognitive effort treated it as a reporter variable, that is, a variable that fluctuates with cognitive processes despite having no obvious relationship to those processes [5]. Several studies in recent years have uncovered a link between pupil dilation and activation in the locus coeruleus (LC), the region of the brain stem responsible for the production of norepinephrine [6]. The association is strong enough that many researchers now consider pupil diameter to be a means of assessing LC activation.

1.2 Other Physiological Measures

In spite of the growing research into understanding pupillary responses, there has been minimal work exploring the degree to which an individual’s baseline pupil diameter and pupillary response vary across days. Pupil diameter has become a widely accepted measure of cognitive ability, yet we are still unsure how reliable these measurements are across days. Much like pupil size, other physiological responses can change with cognitive tasks, as well as show a variation from day to day [7]. For example, heart rate variability changes depending on the type of cognitive task being performed [7].

Similarly, another study sought to predict changes in performance on a cognitive task using current heart rate variability [8]. The researchers used an Advanced Trail Making Test (ATMT) as their task to measure cognitive performance. To assess heart rate variability, they used an electrocardiogram, which is less invasive than other methods of assessing cognitive performance, such as an electroencephalogram (EEG). The researchers determined that, for all of the participants, a decrease in heart rate variability (as well as an increase in sympathetic and parasympathetic nerve activity) was a strong contributor to a decrease in cognitive performance. They were able to predict performance with 84.4% accuracy [8]. This study demonstrates a way in which heart rate variability may be similar to pupillary response, as both are methods for determining cognitive performance or workload. Because many physiological responses are similar to pupillary responses, we hypothesize that one ought to see within-subject differences in pupil diameter from day to day.

1.3 Prior Dark-Adapted Research

Brown et al. [9] are among the few prior researchers to investigate day-to-day variations in pupil size. In this fairly recent study, they examined whether pupil diameter differed between sessions when testing dark-adapted pupils. The participants were subject to different dark-adaptation protocols and had their pupil size measured twice in one week, between one and seven days apart. The authors did not find any significant difference in pupil diameter. However, their participants were only measured twice, which may not be sufficient to observe a significant difference. Also, the purpose of their study was to evaluate different dark-adaptation protocols for the preoperative assessment of refractive surgery [9]. The present study aims to build on these findings, with a different intended use for the conclusions.

1.4 Goal

As far as we know, there have yet to be any longitudinal studies investigating the overall variation of pupil diameter across more than two days. Pupil size changes as mental workload changes, but does it also change across days? There has been extensive research on other physiological measures that may be indicators of mental workload. However, many of these methods are invasive and awkward. For example, an electrocardiogram or an electroencephalogram requires a participant to be hooked up to electrodes. Methods like these restrict movement and can make the participant feel uncomfortable. In addition, these tools can be very expensive to purchase [8]. Therefore, it is important to have a comfortable, low-cost, and reliable method for measuring cognitive processes. Low-cost eye trackers seem to fit these criteria, but it has yet to be determined whether pupil diameter is a reliable measure from day to day for individuals.

There are many factors that could potentially influence a person’s physiological responses each day, such as amount of caffeine intake, fatigue, alertness, or sleepiness. We are interested in modeling whether there is variation of pupil diameter within subjects by day and time of day.

2 Methods

2.1 Participants

Eye tracking data were collected from 7 volunteers (4 male, 3 female) working at the Naval Research Lab. Their ages ranged from 21 to 38 (M = 30.43, SD = 6.65).

2.2 Materials

This experiment used a Gazepoint GP3 HD Desktop eye tracking system, and pupil data were collected at 150 Hz. Each participant calibrated the system prior to the start of each experiment using the built-in calibration system from the Gazepoint control software. Data were collected on a 24 in. monitor with a 3840 × 2160 resolution. An Essilor digital corneal reflection pupillometer (CRP) was used to measure the interpupillary distance of each individual. The interpupillary distance was recorded for 100, 65, and 50 cm focal points.

2.3 Procedure

At the beginning of the experiment, each participant sat in the experimentation room for two minutes with the door shut, and the monitor and lights turned off. This waiting period was imposed to allow each individual’s eyes to acclimate to the darkness. Immediately following, participants turned on the monitor and opened the Gazepoint software. Participants were seated approximately 60 cm from the display based on Gazepoint’s guidelines. They began the calibration process for the eye tracker. The calibration was performed using the default procedure included with the Gazepoint software. This involves following a circle around the screen and pausing at 9 specific locations. Once calibration was satisfactory, the participants conducted a color change task, digit span task, and psychomotor vigilance task. The order was constant throughout the entire data collection period. However, only information from the color change task is presented below as the other tasks were not the focus of this study. Each participant performed the experiment twice per day; one session was before lunch, and the second session was performed ~3–5 h after the first. Participants did this for a total of 10 days.

2.4 Color Change Task

The color change task was a resting luminance change task. Pupil size was recorded as each individual responded to changes in screen luminance. Each participant focused on a crosshair in the center of the screen. The screen started as black and remained black for 30 s. It then changed immediately to gray for 30 s, and then to white for another 30 s. There was no transition period between the different colors. The entire process took 90 s.
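The timing described above can be expressed as a small lookup that labels a sample by the background color shown when it was recorded. This is only an illustrative Python sketch (the study's own analysis was conducted in R); the names PHASES and phase_at are hypothetical, and it assumes sample times are measured in seconds from task onset.

```python
# Hypothetical schedule for the color change task: (color, start_s, end_s).
# The 30 s durations come from the task description; the names are illustrative.
PHASES = [("black", 0.0, 30.0), ("gray", 30.0, 60.0), ("white", 60.0, 90.0)]


def phase_at(t_seconds):
    """Return the background color shown t_seconds after task onset.

    Returns None for times outside the 90 s task window.
    """
    for name, start, end in PHASES:
        # Each phase includes its start time and excludes its end time,
        # which matches an immediate switch with no transition period.
        if start <= t_seconds < end:
            return name
    return None
```

Labeling samples this way would allow medians to be computed separately for each background color as well as for the whole session.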

Following data collection, the participants had the distance between their pupils measured. This ground truth data made it possible to convert pixels to millimeters using the pixel data generated by the Gazepoint GP3. Pupil diameter was manually converted from pixels to millimeters as the millimeter data from the Gazepoint system was found to be inaccurate. Having ground truth measurements made it possible to compare each participant’s data.

2.5 Eye Tracking Data

The Gazepoint GP3 system reports left and right pupil diameter in both pixels and millimeters, along with the x,y position of each pupil within the system’s camera image. The pupil position data allowed for the computation of a third pupil diameter measurement, which used the individual’s interpupillary distance (IPD) recorded from the Essilor CRP at 65 cm. The distance between the eyes in pixels was divided by the individual’s IPD in mm to provide a pixel-to-mm conversion factor. This factor was calculated for each sample and used to compute new left and right mm values for each sample from the pixel data. As noted above, this conversion was necessary because the millimeter data from the Gazepoint system was found to be inaccurate. Using an on-screen live measurement of pupil diameter in pixels and mm, we observed that the mm data was extremely sensitive to slight shifts in distance to the eye tracker. For example, a head shift of a few centimeters closer to the screen could increase the mm data by more than a few mm without having any impact on the pixel data.
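The per-sample conversion described above might be sketched as follows. This is a minimal Python illustration rather than the authors' R code; the function name pupil_mm and its parameters are hypothetical, and it assumes the pupil centers and the pupil diameter are expressed in the same camera pixel units.

```python
import math


def pupil_mm(pupil_px, left_center, right_center, ipd_mm):
    """Convert one sample's pupil diameter from camera pixels to millimeters.

    pupil_px     : pupil diameter in pixels for this sample
    left_center  : (x, y) of the left pupil center in pixels (assumed units)
    right_center : (x, y) of the right pupil center in pixels (assumed units)
    ipd_mm       : the participant's interpupillary distance in mm
    """
    # Distance between the two pupil centers in the camera image.
    eye_dist_px = math.dist(left_center, right_center)
    if eye_dist_px <= 0 or ipd_mm <= 0:
        raise ValueError("eye distance and IPD must be positive")
    # Conversion factor for this sample: pixels per millimeter.
    px_per_mm = eye_dist_px / ipd_mm
    # Dividing by pixels-per-mm yields the diameter in millimeters.
    return pupil_px / px_per_mm
```

For example, with pupil centers 300 pixels apart and an IPD of 60 mm, the factor is 5 pixels per mm, so a 30-pixel pupil corresponds to 6 mm. Because the factor is recomputed for every sample, a change in viewing distance scales the eye separation and the pupil diameter together.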

The Gazepoint system records a binary quality measure with each data point to signify whether the system considers the quality of the data point to be good or bad. We used this as a general filtering method and removed any data the system considered to be bad.
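A minimal sketch of this filtering step, followed by the per-session median described in Sect. 2.6, is shown below. It is written in Python for illustration (the study's analysis used R); session_median is a hypothetical name, and representing each sample as a (diameter, valid) pair is an assumption about the data layout, not the Gazepoint export format.

```python
from statistics import median


def session_median(samples):
    """Median pupil diameter over the samples the tracker flagged as good.

    samples: iterable of (diameter_mm, valid) pairs, where valid is truthy
             when the tracker's binary quality flag marks the sample as good.
    Returns None when no good samples remain.
    """
    # Drop every sample whose quality flag marks it as bad.
    good = [diameter for diameter, valid in samples if valid]
    if not good:
        return None
    return median(good)
```

In this sketch, a session with no valid samples yields None, which corresponds to a missing session that would need to be handled separately, for instance by imputation.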

2.6 Analysis

All of the analysis was conducted in R [10]. In past studies [11], left and right pupil size have been highly correlated (r = .90). Therefore, we only used data from the left pupil for this analysis.

The median pupil diameter of each person at each session was calculated. Medians were taken for the entire session, as well as for each individual background color. Medians were used to help reduce the effects of outliers in the data. There were a few missing time points; therefore, the mice package [12] was used to impute data. The predictive mean matching method was used, and only the first imputed data set was used in the subsequent analysis since only 14 of the 140 sessions were missing. Coefficient alpha (Cronbach) was then calculated to estimate the reliability of the eye data. A visual display was also created using ggplot2 [13] to show the median values of each participant.
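For reference, coefficient alpha for k measurements is commonly written as alpha = (k / (k - 1)) * (1 - sum of the individual measurement variances / variance of the summed scores). The Python sketch below applies that formula directly. It is not the authors' R code; cronbach_alpha is a hypothetical name, and treating participants as rows and sessions as columns is an assumption about how the session medians were arranged.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a participants-by-sessions table of medians.

    scores: list of rows, one per participant; each row holds one value per
            session (assumed layout). Rows must be complete and equal length.
    """
    n_rows = len(scores)
    k = len(scores[0])
    if n_rows < 2 or k < 2:
        raise ValueError("need at least two participants and two sessions")
    # Sample variance of each session (column) across participants.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each participant's total across all sessions.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

With this layout, alpha approaches 1 when participants keep the same relative ordering and spacing across sessions, which is the sense in which a high value indicates day-to-day stability.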

3 Results

Coefficient alpha was calculated for the data described above. The analysis revealed that the data were highly reliable. With the seven participants across 20 measurement sessions (10 days, twice per day), the overall reliability across the color change task was α = 0.98. The results were also highly reliable for each individual background color: black had an α of 0.98, gray an α of 0.99, and white an α of 0.98 (Figs. 1, 2, 3 and 4).

Fig. 1. Median pupil diameter (mm) for each session each day, averaged across all three background colors of the Color Change Task. (Color figure online)

Fig. 2. Median pupil diameter (mm) for each session each day while looking at a black screen, by participant. (Color figure online)

Fig. 3. Median pupil diameter (mm) for each session each day while looking at a gray screen, by participant. (Color figure online)

Fig. 4. Median pupil diameter (mm) for each session each day while looking at a white screen, by participant.

A two-sample t-test was also performed to compare pupil diameter between AM sessions and PM sessions. There was no evidence of a significant difference in pupil diameter across time of day (t(121.12) = −0.687, p = .4933).
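The fractional degrees of freedom reported above are consistent with Welch's unequal-variance form of the two-sample t-test, which is the default behavior of R's t.test, although the text does not state which variant was used. Under that assumption, the statistic and its degrees of freedom could be computed as in the Python sketch below; welch_t is a hypothetical name, and the p-value, which requires the t distribution, is omitted.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: sequences of observations (e.g., AM and PM session medians),
          each with at least two values and nonzero variance overall.
    """
    # Squared standard error contributed by each group.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; usually not an integer.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df
```

A negative t here simply means the first group's mean is lower than the second's; the sign depends on the order in which the groups are passed.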

The reliability estimates were high, but a slight abnormality was observed in the data: the median pupil diameter appears to rise across days, which can be seen in the figures above. We are unsure of the exact reason for this trend, and it may require further investigation to determine the underlying causes.

4 Discussion

The goal of this study was to determine whether pupil diameter is a stable measure across days for individuals. Using coefficient alpha, this study revealed that pupil size is a highly reliable measure that does not significantly change across days. In addition, a two-sample t-test showed that pupil diameter does not significantly differ across time of day.

Future research should attempt to replicate this procedure, possibly using a longer longitudinal design. It may be beneficial to see whether other studies find the same rise in median pupil size across days, and to investigate this abnormality further. Additionally, it would be useful to explore whether there are factors that contribute to pupil size variation within subjects. For example, a person who consumes more caffeine in the morning may show larger variation by time of day than someone who does not consume caffeine. This could be explored using a self-report questionnaire before each session to determine a person’s fatigue, sleepiness, amount of stress, etc. It could also be examined through a design that controls for variables such as drugs or amount of sleep. Altering a person’s state from one day to the next may produce different results from those revealed in this study.

Future research should also investigate day-to-day variability across a variety of cognitive tasks, such as memory, attention, arithmetic, and vigilance tasks. Since research has shown that cognitive demands influence pupil size, it would be useful to see how reliably pupil data can be used to measure tasks that require mental effort across multiple days. We collected data on two cognitive tasks (digit span and psychomotor vigilance) but did not analyze them for this study.

Overall, the stability of pupil diameter, as assessed via a low-cost eye tracker, suggests that the equipment can be sensitive not only to changes within an individual but also to differences across individuals. However, it is important to note that these results were obtained using pixel data that was converted to mm based upon each individual’s interpupillary distance, not the system’s provided mm data. Although devices to measure interpupillary distance are also inexpensive, measuring it does add an extra step to data collection and processing. Our observations also suggest that the mm data obtained directly from the Gazepoint software may not be reliable. Therefore, future studies might consider using the same process of converting pixel data into mm data manually, rather than performing analyses using the given mm data.

The findings of this study ought to be encouraging. The ability to reliably capture pupil diameter using low-cost eye trackers suggests that these new low-cost systems may be incorporated into a broader range of cognitive research. Pupil diameter has become a widely used physiological measure of cognitive information, such as intelligence [3] and working memory capacity fluctuations [2]. The ability to reliably capture these data using low-cost systems, rather than something more expensive and uncomfortable, such as an electroencephalogram or electrocardiogram, should persuade more researchers to opt for this low-cost alternative.