A crowdsourced system for robust eye tracking

https://doi.org/10.1016/j.jvcir.2019.01.007

Abstract

Eye tracking is widely used in modern intelligent applications such as human-computer interaction (HCI), somatosensory games, and driver fatigue detection. Traditional eye tracking systems are based on Haar-like features or external hardware, so they are either inaccurate or complicated. The human gaze point is clearly related to head pose; however, the head pose labels in most datasets are ambiguous. In this paper, we propose a crowdsourced system that can collect large-scale datasets for eye tracking. For better performance, we use a head guidance point and a random dot, instead of a fixed dot, as the point of attention when capturing frames from the camera. Different illumination conditions, poses, and persons are also considered for robustness. We further propose a two-phase CNN training strategy for combining head pose and eye angles. The proposed CNN architecture reduces the overfitting that occurs when eye tracking models are trained with head pose directly. The experimental results show that the proposed method performs well in eye tracking.

Introduction

With the development of computer vision, eye tracking has become an indispensable technique [3], [33]. It can be used in HCI, somatosensory games, and driver fatigue detection. HCI provides an interface for communication between humans and computers, so eye tracking can readily be used to improve the user experience [31], [32], [33]. Eye tracking estimates the direction of gaze, so it is widely used in driver fatigue detection, where a change in gaze direction can trigger an alert when the driver is fatigued. In addition, experiments in biology and psychology show that humans focus on only a few objects when they observe a scene or an image [5], [7], [9]. Gaze shifting paths (GSPs) reflect the sequence of salient regions within an image [2]. As shown in Fig. 1, humans tend to focus on the most salient regions within an image. Eyetracker II is a hardware device that captures human gaze shifting paths while a person looks at a computer screen. It is highly accurate, but the cumbersome hardware limits its use.

Traditional eye tracking methods are usually based on Haar-like features, which are widely used in face recognition. A Haar-like feature measures the difference between image regions and uses that difference as the feature. Because it is a low-level feature, its accuracy is limited; for example, it is easily affected by illumination changes such as dim light. A better approach is to use external hardware such as infrared equipment, which can leverage fused eye features for better performance. In this way, eye tracking can run in real time, which makes applications built on it possible. However, relying on hardware devices is complicated and expensive.
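To make the illumination sensitivity of Haar-like features concrete, the following is a minimal sketch of a two-rectangle Haar-like feature computed with an integral image. It is a generic illustration, not the detector used in any of the cited systems; the function names and the toy 4x4 patch are assumptions introduced here.

```python
def integral_image(img):
    """Return a summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def rect_sum(sat, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]


def two_rect_haar(sat, top, left, height, width):
    """Left-half sum minus right-half sum over a window of even width."""
    half = width // 2
    left_sum = rect_sum(sat, top, left, height, half)
    right_sum = rect_sum(sat, top, left + half, height, half)
    return left_sum - right_sum


# Hypothetical toy patch: dark left half, bright right half.
patch = [[10, 10, 200, 200] for _ in range(4)]
sat = integral_image(patch)
feature = two_rect_haar(sat, 0, 0, 4, 4)

# The same patch under half the illumination.
dim_patch = [[v // 2 for v in row] for row in patch]
dim_feature = two_rect_haar(integral_image(dim_patch), 0, 0, 4, 4)

print(feature)      # -1520
print(dim_feature)  # -760
```

The feature value halves when the brightness halves, although the image structure is unchanged. A fixed threshold on such a feature therefore behaves differently under dim light, which is one way to read the accuracy loss described above.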

To the best of our knowledge, there is no related dataset in this domain. In this paper, we therefore propose a dataset collection framework based on a crowdsourced system, together with a two-phase training strategy for better performance. In the first phase, we train the head pose model and the gaze angle model separately, which does not require very precise labels. This reduces the difficulty of collecting data and increases robustness. In the second phase, we combine the models from the first phase and continue to fine-tune them. In this phase, the label is the gaze point obtained by our data collection system. Compared with training on head pose directly, our two-phase strategy reduces overfitting.
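The mechanics of the two phases can be sketched on a toy problem. The sketch below replaces the CNNs with one-parameter linear models and uses synthetic one-dimensional data, so it only illustrates the order of training and the kind of label used in each phase; it is not the architecture proposed in this paper. All names, the coefficients 2.0 and 3.0, and the label quantization are assumptions made for illustration.

```python
import random


def fit_linear(xs, ys, lr=0.1, steps=300):
    """Phase 1 helper: fit y = w * x by gradient descent on squared error."""
    w, n = 0.0, len(xs)
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w


def gaze_mse(fs, gs, gaze, w_head, w_eye):
    """Mean squared error of the combined model on gaze-point labels."""
    errs = [(w_head * f + w_eye * g - t) ** 2 for f, g, t in zip(fs, gs, gaze)]
    return sum(errs) / len(errs)


def finetune(fs, gs, gaze, w_head, w_eye, lr=0.1, steps=300):
    """Phase 2: update both weights jointly on gaze-point labels."""
    n = len(gaze)
    for _ in range(steps):
        res = [w_head * f + w_eye * g - t for f, g, t in zip(fs, gs, gaze)]
        grad_head = sum(2 * r * f for r, f in zip(res, fs)) / n
        grad_eye = sum(2 * r * g for r, g in zip(res, gs)) / n
        w_head -= lr * grad_head
        w_eye -= lr * grad_eye
    return w_head, w_eye


def coarse(value):
    """Quantize a label to steps of 0.5 to imitate imprecise annotation."""
    return round(value * 2) / 2


rng = random.Random(0)
head_feats = [rng.uniform(-1, 1) for _ in range(200)]  # stand-in for face input
eye_feats = [rng.uniform(-1, 1) for _ in range(200)]   # stand-in for eye input
TRUE_HEAD, TRUE_EYE = 2.0, 3.0                         # hypothetical ground truth

head_labels = [coarse(TRUE_HEAD * f) for f in head_feats]  # coarse pose labels
eye_labels = [coarse(TRUE_EYE * g) for g in eye_feats]     # coarse angle labels
gaze_points = [TRUE_HEAD * f + TRUE_EYE * g
               for f, g in zip(head_feats, eye_feats)]     # precise gaze labels

# Phase 1: train the two models separately on coarse labels.
w_head = fit_linear(head_feats, head_labels)
w_eye = fit_linear(eye_feats, eye_labels)
mse_phase1 = gaze_mse(head_feats, eye_feats, gaze_points, w_head, w_eye)

# Phase 2: combine both models and fine-tune on the gaze point.
w_head2, w_eye2 = finetune(head_feats, eye_feats, gaze_points, w_head, w_eye)
mse_phase2 = gaze_mse(head_feats, eye_feats, gaze_points, w_head2, w_eye2)
```

In this toy setting, phase 1 gives a usable initialization from coarse labels, and phase 2 corrects the residual error with the gaze-point labels. The sketch does not measure overfitting; it only shows how the two kinds of label enter at different stages.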

The main contributions of our work can be summarized as follows:

  • Most related works demonstrate that training an eye tracking model with head pose directly leads to overfitting. Head pose labels are not easy to obtain, so researchers usually need precise equipment to obtain accurate results, which is complicated and expensive. We propose a two-phase training strategy that does not require very precise labels during the first phase of training.

  • We propose a succinct crowdsourced system for collecting eye tracking datasets. In our implementation, we cover most environmental conditions, so our dataset is robust.

  • As far as we know, there are few related datasets for eye tracking. We believe that our work can promote the development of eye tracking.

Section snippets

Related work

Eye tracking can be used in many domains of intelligent systems. It can replace the mouse for operating a computer [11], [13], [22], [23], [24], [25]. In many computer games, eye tracking provides a better game experience. Eye gaze systems can also be used in advertising analysis and driver fatigue detection. For example, when users browse a webpage, we can record the user's gaze shifting path and analyze the user's attention and time of

The data set

In this section, we present our framework for eye dataset collection. Fig. 3 shows our dataset collection platform. In our implementation, a randomly generated dot is used instead of a fixed dot. In this way, a large-scale and varied eye dataset can be obtained, so our data are diverse. First, a randomly generated dot is shown on the screen, and volunteers are required to point their head at the position where it appears. Then, another point is randomly generated around this point.
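The two-dot procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform shown in Fig. 3: the screen resolution, the radius of the neighbourhood, the function names, and the reading of the second point as the gaze target are all hypothetical.

```python
import random


def random_dot(width, height, rng):
    """Sample the head guidance dot uniformly over the screen (pixel units)."""
    return rng.randint(0, width - 1), rng.randint(0, height - 1)


def nearby_dot(center, radius, width, height, rng):
    """Sample a second dot within `radius` pixels of `center`.

    Offsets are rejection-sampled inside a circle, then the result is
    clamped to the screen. Clamping can only move the dot closer to the
    center, because the center itself lies on the screen.
    """
    cx, cy = center
    while True:
        dx = rng.randint(-radius, radius)
        dy = rng.randint(-radius, radius)
        if dx * dx + dy * dy <= radius * radius:
            break
    x = min(max(cx + dx, 0), width - 1)
    y = min(max(cy + dy, 0), height - 1)
    return x, y


# Hypothetical settings: a 1920x1080 screen and a 150-pixel neighbourhood.
WIDTH, HEIGHT, RADIUS = 1920, 1080, 150
rng = random.Random(42)

head_dot = random_dot(WIDTH, HEIGHT, rng)                     # head target
gaze_dot = nearby_dot(head_dot, RADIUS, WIDTH, HEIGHT, rng)   # assumed gaze target
```

Sampling a fresh pair of dots for every captured frame spreads the labels over the whole screen, which is the diversity argument made above for preferring random dots to a fixed dot.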

Dataset collection

In our implementation, we invite volunteers to collect the eye dataset. To obtain robust performance, we vary the illumination, the persons, and whether they wear glasses. Dataset collection is conducted on a PC with a camera.

First, volunteers are required to fix their head in front of the PC, and the distance Lec between the volunteer's eyes and the camera is recorded. The angle between the eyes and the generated dot is important information. As shown in Fig. 4, α denotes the angle between the generated dot and

Conclusion

Eye tracking is an indispensable technique in intelligent systems [17], [18], [19], [27], [28], [29], [30]. Traditional eye tracking is based on low-level features, so its accuracy is limited. In this paper, we propose a dataset collection strategy. We use a randomly generated dot instead of a fixed dot to obtain more robust results. Based on the dataset, we propose a two-phase training strategy for eye tracking. We argue that the two-phase training strategy performs better

Conflict of interest

There is no conflict of interest.

References (34)

  • Tadas Baltrusaitis, Peter Robinson, Louis-Philippe Morency, OpenFace: an open source facial behavior analysis toolkit,...
  • Luming Zhang et al., Spatial-aware object-level saliency prediction by learning graphlet hierarchies, IEEE Trans. Ind. Electron. (2015)
  • L. Zhang et al., Actively learning human gaze shifting paths for semantics-aware photo cropping, IEEE Trans. Image Process. (2014)
  • L. Sun et al., Real-time gaze estimation with online calibration
  • L. Zhang et al., Weakly supervised photo cropping, IEEE Trans. Multimedia (2014)
  • Mingliang Xu et al., An efficient method of crowd aggregation computation in public areas, IEEE Trans. Circ. Syst. Video Technol. (2017)
  • L. Zhang et al., Probabilistic graphlet transfer for photo cropping, IEEE Trans. Image Process. (2013)
  • G. Cheng et al., Duplex metric learning for image set classification, IEEE Trans. Image Process. (2018)
  • L. Zhang et al., Fusion of multichannel local and global structural cues for photo aesthetics evaluation, IEEE Trans. Image Process. (2014)
  • E. Wood, T. Baltrusaitis, X. Zhang, Y. Sugano, P. Robinson, A. Bulling, Rendering of eyes for eye-shape registration...
  • L. Zhang et al., An effective video summarization framework toward handheld devices, IEEE Trans. Ind. Electron. (2015)
  • K. Krafka et al., Eye tracking for everyone
  • L. Zhang et al., Discovering discriminative graphlets for aerial image categories recognition, IEEE Trans. Image Process. (2013)
  • X. Yao et al., Revisiting co-saliency detection: a novel approach based on two-stage multi-view spectral rotation co-clustering, IEEE Trans. Image Process. (2017)
  • Luming Zhang et al., Weakly supervised human fixations prediction, IEEE Trans. Cybernet. (2016)
  • J. Han et al., Robust object co-segmentation using background prior, IEEE Trans. Image Process. (2018)
  • Luming Zhang et al., A fine-grained image categorization system by cellet-encoded spatial pyramid modeling, IEEE Trans. Ind. Electron. (2015)

This article is part of the Special Issue on TIUSM.