Evaluating gaze control on a multi-target sequencing task: The distribution of fixations is evidence of exploration optimisation

https://doi.org/10.1016/j.compbiomed.2011.11.013

Abstract

Many high-level cognitive applications, such as vision processing and the representation and understanding of images, often need a detailed analysis of how an ongoing visual search was performed over a representative subset of the image, which may be arranged into sequences of loci called regions of interest (ROIs). We used the Trail Making Test (TMT), in which subjects are asked to fixate a sequence of letters and numbers in a logical alphanumeric order. The main characteristic of the TMT is that it forces the subject to follow a default, well-known path. Comparing the expected scan-path with the observed scan-path provides a valuable method to investigate how a task forces the subject to maintain a top-down internal representation of execution and how bottom-up factors influence performance. We developed a mechanism that analyses the scan-path using different algorithms, and we compared it with other methods: we found that fixations outside the ROIs are a direct consequence of the exploration strategy. The paper discusses the method in healthy subjects.

Introduction

Recent studies have focused on methods and models that evaluate how humans explore real scenes in a naturalistic setting and how machine vision should emulate human searching. In the real world, visual search is a common task that enables humans to explore the environment and direct attention towards regions of interest. In experimental settings in which neuro-physiological and cognitive functions are investigated, quantification methods for visual search are used to evaluate the allocation of attention during scene viewing [1, for a short review].

In order to understand which mechanisms drive attention during visual exploration, many studies in which image characteristics are manipulated have been conducted. The main concept is that particular regions of interest in the scene are selected on the basis of their cognitive relevance or local image saliency. When image salience is thought to guide visual search, the mechanism is called bottom-up. Conversely, when the mechanisms driving visual search depend more on the observer's intention, they are called top-down. These two exemplifications describe the main theories of visual search, although endogenous and exogenous components presumably work together in normal circumstances. Building on this dichotomy, a variety of formal models have been proposed in the last decade to describe the attentional selection mechanism: Feature Integration Theory [2, FIT], Guided Search [3], the Theory of Visual Attention [4], [5] and the purely bottom-up Winner-Take-All model [6, WTA]. It has been suggested that early selection stages are purely driven by image saliency factors, with a bottom-up prevalence, and that later selection is due to the combination of top-down and bottom-up factors. A key debate in this literature is whether bottom-up can override top-down and vice versa. For instance, Theeuwes [7] and Schreij et al. [8] found that the appearance of distractors reduced search efficiency, presumably due to the involuntary capture of attention. Conversely, Chen [9] found that visual search in the real world is dominated by the top-down mechanism.

The focus of recent research, however, has shifted to how these processes are combined, their relative contributions to search guidance, how top-down and bottom-up mechanisms work together to perform an efficient visual exploration, and how they interfere with each other (see [10], for a review); therefore, research is directed towards unified methods of analysis that may better reflect real conditions.

In our research, we aimed to investigate how the bottom-up and top-down mechanisms work together in a neuropsychological context during ongoing visual search. We aimed to use a task which encourages free exploration, avoiding excessively salient features such as those proposed by Theeuwes. We did not want to use a real-world image because we needed a simple method to evaluate "bottom-up"-"top-down" competition or collaboration; the key idea was to adapt a version of the Trail Making Test part B (TMT) [11] and some variants: the test is a neuropsychological instrument in which numbers and letters have to be connected in numerical and alphabetical order (1A2B3C4D5E). Recent studies [12] proposed the TMT as a powerful test to evaluate sequencing, symbol classification, memory and searching. Wolwer and Gaebel [13] demonstrated the validity of its application in patients, adapting the "paper-pencil version" of the test to a "computer version" which uses the cursor and a tone as feedback for reaching the target. In the "eye-tracking version" proposed in our study, the subject is required to connect letters and numbers by moving the gaze (see Method section) over each symbol of the sequence without any feedback such as a tone or guided search.

The main characteristic of the TMT is that it forces the subject to perform a default, "a priori known" path; the key idea was to compare the expected scan-path (1A2B3C4D5E) with the observed scan-path.

Scan-path analysis was one of the first methods [14] used to identify patterns of eye movements: Noton and Stark [15] defined a number of spatial Regions of Interest (ROIs) in the scene being scanned and recorded the fixation sequence as a series of letters representing the fixated locations. Brandt and Stark [16] used string-edit analysis to compare the viewing patterns of scan-paths. Recently, Cristino et al. [17] developed an interesting method (ScanMatch) which is based on a transition matrix among ROIs and the use of the Levenshtein distance to compare scan-paths. The Levenshtein distance measures the editing cost of transforming one string into another using, in its basic form, a set of three operations (insertion, deletion and substitution) with a cost of one for each operation. In related works [18], [19], [20], in which the cognitive processes underlying visual search performance are investigated, measures such as fixation duration, fixations per trial, saccadic latency and saccadic distribution have been developed. In this context, Regions of Interest (ROIs) are predefined over cognitively relevant parts of the image, and features such as time spent in a ROI and saccade trajectories [21, for a survey] are calculated. Other authors extract the regions of interest automatically from the fixation distribution during visual exploration [22], [23]. In a more complex manner, the extraction of saliency maps has been developed to predict fixation locations [6].
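
To make the string-edit comparison concrete, the following sketch (in Python, for illustration; the study itself used Matlab) implements the basic Levenshtein distance described above, with unit cost for insertion, deletion and substitution. The scan-path strings in the example are hypothetical, not data from the study.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of unit-cost insertions, deletions and
    substitutions needed to transform string a into string b."""
    # prev[j] holds the edit distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Hypothetical example: expected TMT scan-path vs. an observed one
expected = "1A2B3C4D5E"
observed = "1A2B2B3C4D5E"   # the subject re-fixated "2B" on the way
print(levenshtein(expected, observed))  # -> 2 (two extra fixations)
```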

In our method, owing to the distinctive characteristic of the TMT, in which the subject must follow a predefined sequence of symbols (the expected scan-path), we propose to evaluate the scan-path made by the subject with respect to the expected exploration, in order to extract features representing the differences between expected and observed exploration. The method was tested in a group of normal volunteer subjects: we developed a set of computational indicators based on scan-path and fixation analysis to evaluate visual search performance. We compared the results with the ScanMatch method proposed by Cristino et al. [17] in order to provide a valuable reference.

Section snippets

Materials and methods

We enrolled 30 volunteers (17 female and 13 male) aged 25–45 years (CTRL), with normal vision, not taking medicines and without refractive defects. All participants in the study were trained on the TMT by a psychologist. Subjects were seated at a viewing distance of 78 cm from a 24 in. colour monitor (51 cm×33 cm). Eye position was recorded using the ASL 6000 system, which consists of a remote-mounted camera sampling pupil location at 240 Hz. Nine-point calibration and 3-point
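
Stimulus and fixation extents in such setups are conventionally expressed in degrees of visual angle. A minimal sketch of the standard conversion, assuming the geometry stated above; the screen resolution is a hypothetical value, since the snippet does not report it:

```python
import math

# Geometry from the Methods: 78 cm viewing distance, 51 cm-wide screen.
VIEW_DIST_CM = 78.0
SCREEN_W_CM = 51.0
SCREEN_W_PX = 1920  # hypothetical horizontal resolution (not reported)

def px_to_deg(extent_px: float) -> float:
    """Convert a horizontal on-screen extent in pixels to degrees of
    visual angle, using the exact (arctangent) formula."""
    extent_cm = extent_px * SCREEN_W_CM / SCREEN_W_PX
    return math.degrees(2.0 * math.atan2(extent_cm / 2.0, VIEW_DIST_CM))

print(f"100 px subtend {px_to_deg(100):.2f} deg at 78 cm")  # ~1.95 deg
```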

Calculations

Data were stored in a comma-separated file and imported into Matlab. A simple blink-removal filter was applied; the filter replaced blink values (pupil diameter 0) or missing data (horizontal or vertical coordinates out of range) by linear interpolation. Large segments of missing data (duration >40 ms) were marked in order to exclude them from analysis.
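
A minimal sketch of such a blink-removal filter, assuming gaze coordinates and pupil diameter sampled at 240 Hz as NumPy arrays (Python here for illustration, whereas the study used Matlab); the coordinate limits used to detect out-of-range samples are hypothetical placeholders:

```python
import numpy as np

FS = 240.0         # ASL 6000 sampling rate (Hz)
MAX_GAP_MS = 40.0  # gaps longer than this are marked for exclusion

def remove_blinks(x, y, pupil, x_range=(0.0, 1920.0), y_range=(0.0, 1200.0)):
    """Replace blink samples (pupil diameter 0) and out-of-range
    coordinates by linear interpolation; also return a mask of gaps
    longer than MAX_GAP_MS to exclude from analysis."""
    x = np.asarray(x, dtype=float).copy()
    y = np.asarray(y, dtype=float).copy()
    pupil = np.asarray(pupil, dtype=float)
    bad = ((pupil == 0)
           | (x < x_range[0]) | (x > x_range[1])
           | (y < y_range[0]) | (y > y_range[1]))

    t = np.arange(x.size) / FS
    for s in (x, y):
        s[bad] = np.interp(t[bad], t[~bad], s[~bad])

    # mark runs of consecutive bad samples longer than MAX_GAP_MS
    long_gap = np.zeros(x.size, dtype=bool)
    start = None
    for i, b in enumerate(np.append(bad, False)):  # sentinel closes a final run
        if b and start is None:
            start = i
        elif not b and start is not None:
            if (i - start) / FS * 1000.0 > MAX_GAP_MS:
                long_gap[start:i] = True
            start = None
    return x, y, long_gap
```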

Numbers and letters were represented as pre-defined rectangular ROIs centred on the letters and numbers, with widths and heights ranging from
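
Although the snippet is truncated, the ROI logic it describes can be sketched as follows: each fixation is tested against the pre-defined rectangles and mapped to the corresponding symbol, producing the fixation string on which the scan-path comparisons operate. ROI positions and sizes below are hypothetical placeholders, not values from the study.

```python
from dataclasses import dataclass

@dataclass
class ROI:
    label: str  # e.g. "1", "A", "2", ...
    cx: float   # centre x (px)
    cy: float   # centre y (px)
    w: float    # width (px)
    h: float    # height (px)

    def contains(self, x: float, y: float) -> bool:
        return (abs(x - self.cx) <= self.w / 2 and
                abs(y - self.cy) <= self.h / 2)

def fixations_to_string(fixations, rois):
    """Map each fixation (x, y) to the label of the ROI it falls in;
    fixations outside every ROI are encoded as '#'."""
    return "".join(
        next((r.label for r in rois if r.contains(x, y)), "#")
        for x, y in fixations)

# Hypothetical layout: the resulting string can then be compared with
# the expected sequence "1A2B3C4D5E" (e.g. via the Levenshtein distance).
rois = [ROI("1", 100, 100, 60, 60), ROI("A", 400, 250, 60, 60)]
print(fixations_to_string([(105, 95), (500, 500), (398, 260)], rois))  # "1#A"
```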

Subjects vision assessment

It has been demonstrated that information acquired from the visual periphery in one fixation (peripheral preview) can influence the subsequent pattern of eye movements [30], [34], [35]. In order to verify that correct execution of the task was not affected by a defect in peripheral vision or scene understanding, we estimated the probability of reaching the target as a function of FDN. Fig. 4 shows the probability density function of normal subjects depending on distance only; the result showed
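
The snippet does not define FDN or detail the estimate, but an empirical probability-of-reaching-the-target curve as a function of distance could be obtained by simple binning; the sketch below is an assumption-laden illustration with placeholder inputs, not the authors' procedure.

```python
import numpy as np

def reach_probability(distance_deg, reached, n_bins=10):
    """Empirical probability of reaching the target as a function of
    target distance (deg), estimated by binning. Inputs are hypothetical:
    per-trial distances and a boolean outcome for each trial."""
    distance_deg = np.asarray(distance_deg, dtype=float)
    reached = np.asarray(reached, dtype=bool)
    edges = np.linspace(distance_deg.min(), distance_deg.max(), n_bins + 1)
    centres, probs = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (distance_deg >= lo) & (distance_deg < hi)
        if in_bin.any():
            centres.append((lo + hi) / 2)            # bin centre (deg)
            probs.append(reached[in_bin].mean())     # fraction reached
    return np.array(centres), np.array(probs)
```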

Discussion

The FDS indicator confirmed all hypotheses: FDS increased when the subject was forced by the TMTD saliency to make sparser fixations, but we also found a strongly significant difference between ET and TMT. Considering the characteristics of peripheral vision and, more generally, of the attentive focus [36] explained in Section 4.1, fixations outside the ROIs on the TMT/TMTD tasks were due to mechanisms of efficiency rather than to image salience during exploration. We concluded that FDS is an indicator of

Conclusion

According to Yarbus [41], Itti and Koch [6], Noton and Stark [15], Privitera and Stark [22] and Awh et al. [42], the construction of an accurate visual representation depends crucially on the optimal and precise selection of subsequent fixation points. Thus, studying how fixations are made with respect to pre-defined ROIs is of primary importance for understanding visual search behaviour.

We developed three methods. The sequencing algorithm was specific for the TMT and related tasks where the subject is

Conflict of interest statement

None declared.


References (48)

  • J. Braun, Natural scenes upset the visual applecart, Trends Cogn. Sci. (2003)
  • M. Pomplun, Saccadic selectivity in complex visual search displays, Vision Res. (2006)
  • R.J. van Beers, Motor learning is optimally tuned to the properties of motor noise, Neuron (2009)
  • J. Henderson, Human gaze control during real-world scene perception, Trends Cogn. Sci. (2003)
  • J. Wolfe, Guided search 2.0—a revised model of visual search, Psychon. Bull. Rev. (1994)
  • C. Bundesen, A theory of visual attention, Psychol. Rev. (1990)
  • C. Bundesen et al., A neural theory of visual attention: bridging cognition and neurophysiology, Psychol. Rev. (2005)
  • J. Theeuwes, Top-down search strategies cannot override attentional capture, Psychon. Bull. Rev. (2004)
  • D. Schreij et al., Abrupt onsets capture attention independent of top-down control settings, Attention Percept. Psychophys. (2008)
  • X. Chen et al., Real-world visual search is dominated by top-down guidance, Vision Res. (2006)
  • S. Van der Stigchel et al., The limits of top-down control of visual attention, Acta Psychol. (Amst) (2009)
  • R. Reitan, The validity of the trail making test as an indicator of organic brain damage, Percept. Motor Skills (1958)
  • C.R. Bowie et al., Administration and interpretation of the trail making test, Nat. Protoc. (2006)
  • W. Wolwer et al., Impaired trail-making test-B performance in patients with acute schizophrenia is related to inefficient sequencing of planning and acting, Neuropsychobiology (2003)

Giacomo Veneri graduated from the University of Siena in Computer Science. He has experience in computer-assisted diagnosis and signal processing in the neuroscience context, and is head of the technology office of Etruria Innovazione.

Francesca Rosini is a neurologist with clinical activity at the Unit of Neurology and Neurometabolic Diseases of the University of Siena.

Pamela Federighi graduated from the University of Pisa in Computer Science. She has experience in computer-assisted diagnosis and signal processing in the neurological context.

Antonio Federico is a Full Professor of Neurology at the University of Siena, Director of the Neurometabolic Disease Unit and of the Research Center for diagnosis, therapy and prevention of Neurohandicap and Rare Neurological Diseases.

Alessandra Rufa is a neurologist and ophthalmologist with a research position and clinical activity at the Unit of Neurology and Neurometabolic Diseases of the University of Siena. She is head of the Eye Tracking & Visual Applications lab.
