
1 Introduction

Patients with amyotrophic lateral sclerosis (ALS), multiple system atrophy (MSA), or muscular dystrophy find it difficult to communicate their intentions because of severe speech and motor dysfunction. Since eye movements often remain functional until the end-stage, our research project focuses on developing an eye movement-based communication support system. In this paper, we develop a system, built around a wearable camera, that can be used especially at night.

A (nurse) calling system is a tool for calling a nurse or a caregiver at a hospital or nursing care facility, or a family member at home. It is indispensable for patients when a physical abnormality occurs or when they have questions about their daily life. The patient pushes a call sensor or button to call a remote nurse or family member. However, for ALS patients who have difficulty moving their muscles, an input device must be prepared according to their residual function. The eyeSwitch [1] is an operation support switch that can be turned ON and OFF by eye movements. Through call devices, environmental control devices, and communication devices, the user can make calls, operate home appliances, and communicate. The eyeSwitch can be used at night, but it requires large eye movements and cannot detect slight ones. It also needs to be fixed near the bed with an arm, and its position must be corrected each time the patient moves.

In this paper, we introduce an image-based method that detects the pupil center with high accuracy at night using a wearable camera, for use in a calling system. We also develop a prototype calling system that can be used at night and evaluate its performance through subject experiments.

2 Related Research

This section briefly introduces nurse call systems and eye movement analysis research related to this paper.

Ongenae et al. developed an ontology-based Nurse Call System [11], which assesses the priority of a call based on the current context and assigns the most appropriate caregiver to it. Traditional push-button/flashing-lamp call systems are not integrated with other hospital automation systems. Unluturk et al. developed a system integrating Nurse Call System Software, Wireless Phone System Software, Location System Software, and a communication protocol [13]. With this system, both the nurse and the patient know that the next available nurse will be assigned if the primary nurse is not available. Klemets and Toussaint proposed a nurse call system [8] that lets nurses discern the reason behind a call, which allows them to make more accurate decisions and relieves stress. Most studies on nurse call systems thus focus on improving the whole system rather than individual devices such as switches.

Images and electromyography are available as means of analyzing eye movements. Since the latter uses contact sensors, this paper targets the former, which is non-contact. As for image-based eye movement analysis, some products that estimate the gaze point rather than the movement of the eyes have already been released. The devices used can be roughly classified into two types: non-wearable and wearable devices. The former are screen-based eye trackers that attach to a display, for example, Tobii Pro Nano [5] and Tobii Pro Fusion [3]. The latter mount a small camera on an eyeglass frame, for example, Tobii Pro Glass 3 [4] and Gazo GPE3 [2]. Research on eye movement analysis is likewise divided into two types: studies that use an existing eye tracker to analyze the gaze [10, 14] and studies that propose methods for detecting the eye or the pupil center point [6, 7, 15].

Fig. 1. Overview of the proposed system.

3 Night Time Calling System

3.1 Overview

The proposed system consists of a wearable camera, computer, relay controller, and nurse call, as shown in Fig. 1.

Our system is intended for use at night, so a standard color camera is not suitable. Visible light illumination cannot be used either, because it would disturb the user's sleep. Thus, a near-infrared LED (IR-LED) and a near-infrared camera (IR camera) are used. A wearable camera is not affected by the movement of the user's head and can always capture stable eye images. Although wearing a camera during sleep places a burden on the user, we decided to adopt it after discussing the matter with a physical therapist. The wearable camera attached to the mannequin on the right of Fig. 1 is the device used in our system.

The computer processes all eye images taken by the wearable camera. A large-scale, high-performance computer would be desirable; however, our system is assumed to be installed near the user's bed. In addition, at the facility's request, the system avoids both wired and wireless networks. For these reasons, we adopted a small computer with a GPU.

If our system sent a continuous signal directly from the computer to the nurse call, the nurse call would ring every time it received the signal. Our system therefore uses a relay controller to prevent nurse calls caused by malfunction.

3.2 Pupil Center Detection

Our system uses the CNN-based pupil center detection method proposed by Chinsatit and Saitoh [6]. The method uses two CNN models, as shown in Fig. 2. The first CNN model is used to classify the eye state, and the second is used to estimate the pupil center position.

Fig. 2. Two-part CNN model.

The architecture of the classification model is based on AlexNet [9]. The model has two output classes: closed eye and non-closed eye. Since the pupil center cannot be detected in a closed eye, unnecessary processing is skipped in that case.

The second CNN model is based on the pose regression ConvNet [12]. The output of this model is the pupil center position \((P_x, P_y)\).
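
As a concrete illustration, the following is a minimal PyTorch sketch of this two-stage inference flow; the layer configurations and the names EyeStateNet, PupilRegressor, and detect_pupil are illustrative placeholders, not the exact architectures of [9] and [12].

```python
import torch
import torch.nn as nn

class EyeStateNet(nn.Module):
    """Illustrative classifier: closed eye vs. non-closed eye (AlexNet-like role)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(64 * 4 * 4, 2)  # two classes: closed / non-closed

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

class PupilRegressor(nn.Module):
    """Illustrative regressor: outputs the pupil center (Px, Py)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.regressor = nn.Linear(64 * 4 * 4, 2)  # (Px, Py)

    def forward(self, x):
        return self.regressor(self.features(x).flatten(1))

def detect_pupil(eye_img, state_net, pupil_net):
    """Two-stage pipeline on a 1x1xHxW grayscale eye-image tensor."""
    with torch.no_grad():
        if state_net(eye_img).argmax(dim=1).item() == 0:  # assume class 0 = closed eye
            return None                                   # skip regression for closed eyes
        return pupil_net(eye_img).squeeze(0).tolist()     # [Px, Py]
```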

3.3 ROI Extraction

The eye image is taken with a wearable camera. However, the background sometimes appears in the eye image. In this case, since the pupil detection accuracy may decrease, the region of interest (ROI) is first extracted instead of directly inputting the captured image to the CNN.

An ROI is extracted based on the intensity difference between two consecutive frames. Pixels whose difference value is equal to or larger than a threshold are accumulated over a fixed time, and the maximum region in the accumulated image is extracted. Next, two types of ROI are extracted from this region: one without considering the aspect ratio (named ROI1), and the other a rectangle with a fixed aspect ratio of 3:2 (named ROI2).

Our system uses a wearable camera. Therefore, pixels with no motion, such as the background, have low difference values, whereas the difference values of the eye and the surrounding skin become large due to blinking and eye movements. This makes it possible to crop the ROI around the eyes.
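
The following is a minimal OpenCV/NumPy sketch of this motion-based ROI extraction, assuming grayscale frames; the threshold values and the omission of clipping the box to the image bounds are simplifications, not the exact settings of our system.

```python
import cv2
import numpy as np

def accumulate_motion(frames, diff_thresh=15):
    """Count, per pixel, how often the inter-frame intensity difference is large."""
    acc = np.zeros(frames[0].shape, dtype=np.uint16)
    for prev, cur in zip(frames[:-1], frames[1:]):
        diff = cv2.absdiff(cur, prev)
        acc += (diff >= diff_thresh).astype(np.uint16)
    return acc

def extract_roi(acc, min_count=3, aspect=None):
    """Bounding box of the largest accumulated-motion region.
    aspect=None gives ROI1; aspect=(3, 2) gives the fixed 3:2 box (ROI2)."""
    mask = (acc >= min_count).astype(np.uint8)
    n, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    if n < 2:
        return None                     # no moving region found
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    x, y, w, h = stats[largest, :4]
    if aspect is not None:              # ROI2: expand to the fixed aspect ratio
        aw, ah = aspect
        cx, cy = x + w / 2, y + h / 2
        if w / h < aw / ah:
            w = int(h * aw / ah)
        else:
            h = int(w * ah / aw)
        x, y = int(cx - w / 2), int(cy - h / 2)
    return x, y, w, h
```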

3.4 Calling Mechanism

In our system, the user's intention is read from the detected pupil center, and a signal is output from the computer to the relay controller for calling. The target users in this study are patients with intractable neurological diseases, but the progression of symptoms varies between individuals; for example, the direction and amount of eye movement differ. Therefore, it is desirable that the parameters can be adjusted for each user, and our system adopts a policy of manual adjustment.

In our system, when the user wants to call a person, he or she moves his/her eyes by a certain amount in the up, down, left, or right directions. In other words, four thresholds (upper, lower, left, and right) for the eye position are set in advance, and when the eye position exceeds any of the thresholds, it is determined that the user intends to call. Upon detecting this movement, the system sends a signal to the relay controller.
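
A minimal sketch of this threshold check is shown below; the function name and the numeric thresholds are hypothetical, and the per-user values are set manually as described above.

```python
def check_call(px, py, thresholds):
    """Return the crossed direction ('up', 'down', 'left', 'right') or None.
    Image coordinates are assumed, so y increases downward."""
    if py < thresholds["upper"]:
        return "up"
    if py > thresholds["lower"]:
        return "down"
    if px < thresholds["left"]:
        return "left"
    if px > thresholds["right"]:
        return "right"
    return None

# Hypothetical per-user setting for a 120x80 eye image.
thresholds = {"upper": 25, "lower": 55, "left": 35, "right": 85}
if check_call(px=90, py=40, thresholds=thresholds):
    print("call signal -> relay controller")
```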

Figure 3 shows four eye images on which the detected pupil center and the four thresholds are drawn. The green circle is the detected pupil center point, and the rectangle around the pupil represents the four thresholds. In the leftmost image, the eye faces the front and the pupil center is inside the rectangle. The second and third images from the left are examples of exceeding the right and lower thresholds, respectively; red bands on the right and lower sides of the image visually indicate the direction in which the threshold is exceeded. The rightmost image is an example with the eye closed.

Fig. 3. Eye images with thresholds.

3.5 Implementation

As described in Sect. 3.1, our system needs to operate standalone without using a network. Therefore, we constructed a CNN server with Flask, a web application framework, on the computer; the client software sends the eye image acquired from the wearable camera to the server and receives the pupil center position output by the CNN. Figure 4 shows the process flow of our system.
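
A minimal sketch of this client-server arrangement with Flask is shown below; the endpoint name, the payload format, and the run_cnn stub are illustrative assumptions rather than the actual implementation.

```python
# Server side: receives a JPEG-encoded eye image, returns the CNN output as JSON.
import numpy as np
import cv2
from flask import Flask, request, jsonify

app = Flask(__name__)

def run_cnn(img):
    """Placeholder for the two-stage CNN of Sect. 3.2 (stubbed for illustration)."""
    return "non-closed", (60.0, 40.0)

@app.route("/pupil", methods=["POST"])
def pupil():
    img = cv2.imdecode(np.frombuffer(request.data, np.uint8), cv2.IMREAD_GRAYSCALE)
    state, center = run_cnn(img)
    if state == "closed":
        return jsonify({"closed": True})
    return jsonify({"closed": False, "px": center[0], "py": center[1]})

# Client side (illustrative):
#   ok, buf = cv2.imencode(".jpg", eye_frame)
#   r = requests.post("http://127.0.0.1:5000/pupil", data=buf.tobytes())

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)   # standalone: bound to localhost, no external network
```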

Fig. 4. Process flow of our system.

Some users can move their eyes quickly, while others can only move them slowly. In the former case, the computer transmits a signal to the relay controller immediately after the eye position exceeds the threshold; even if a signal is not transmitted because a pupil center detection error keeps the position below the threshold, the user can simply move the eye again, which serves as a countermeasure against misdetection. In this case, the single-pulse transmission mode is used, in which one signal is transmitted each time the threshold is exceeded. In the latter case, the continuous-pulse transmission mode is adopted, in which signals are transmitted continuously while the threshold remains exceeded. The user can switch between these two modes in the settings.
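
The two transmission modes can be summarized by the following sketch, where send_pulse, the frame interval, and the boolean input stream are illustrative stand-ins for the actual relay-controller interface.

```python
import time

def send_pulse():
    """Placeholder for driving the relay controller output."""
    print("pulse -> relay controller")

def transmit(exceeded_stream, mode="single", interval=0.1):
    """mode='single': one pulse per threshold crossing (for users who move their eyes quickly).
    mode='continuous': keep pulsing while the threshold stays exceeded (for slow eye movers)."""
    was_exceeded = False
    for exceeded in exceeded_stream:            # one boolean per processed frame
        if mode == "single" and exceeded and not was_exceeded:
            send_pulse()                        # rising edge only
        elif mode == "continuous" and exceeded:
            send_pulse()
        was_exceeded = exceeded
        time.sleep(interval)                    # illustrative frame interval
```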

Our system requires manual settings such as thresholds and transmission modes, which allows it to accommodate users with various symptoms.

Figure 5 is an image captured on the computer monitor during the experiment. The upper left shows an eye image overlaid with information. The upper right shows the temporal transition of the vertical position of the pupil center, and the lower left shows that of the horizontal position; both are visualized in real time.

4 Evaluation Experiments

In this research, two experiments were conducted to evaluate the proposed system: verification of the accuracy of pupil center detection and quantitative evaluation of call success.

Fig. 5. Main windows of our system.

4.1 Pupil Center Detection

Dataset. We collected eye images from seven people, four healthy staff members and three patients, using the proposed system. Table 1 shows the subject information and the number of collected images. During collection, the subjects moved their eyes in five directions: front, up, down, left, and right. Since the number of collected images differs between subjects, as shown in Table 1, we used 400 eye images from each subject, for a total of 2,800 eye images in this experiment.

Table 1. Subject information and the number of collected images.

The size of the eye image taken by the wearable camera is \(1280 \times 720\) [pixels]. However, it was resized to \(120 \times 80\) [pixels] to reduce the processing time. To train and evaluate the CNN models, ground truth for the eye state and the pupil center must be given for each eye image; this labeling was done manually by visual inspection. Regarding the eye state, the label "Non-closed eye" was given when 50% or more of the pupil was visible, and the label "Closed eye" was given otherwise.

Experimental Conditions. This paper proposes two ROI extraction methods, ROI1 and ROI2, and both were compared in this experiment.

It is desirable to prepare a large amount of data for training the CNN models. However, we could not collect enough data in this experiment, so we introduce two approaches: data augmentation (DA) and fine-tuning (FT). For DA, we generated four images for each eye image by combining scaling, translation, rotation, and brightness correction. For FT, 1,980 images were collected from each of nine healthy persons (six males and three females) using a different wearable camera, for a total of 17,820 images; the weights of the CNN models trained on this dataset were used as the initial values for FT.
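
The DA step can be sketched as follows; the parameter ranges are illustrative assumptions, and when augmenting data for the regression model the ground-truth pupil coordinates must be transformed with the same affine matrix.

```python
import cv2
import numpy as np

def augment(img, n_variants=4, seed=0):
    """Generate augmented eye images by random scaling, translation,
    rotation, and brightness correction (illustrative parameter ranges)."""
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]
    variants = []
    for _ in range(n_variants):
        scale = rng.uniform(0.9, 1.1)
        angle = rng.uniform(-10.0, 10.0)                    # degrees
        tx, ty = rng.uniform(-5.0, 5.0, size=2)             # pixels
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
        m[:, 2] += (tx, ty)                                 # add the translation
        warped = cv2.warpAffine(img, m, (w, h), borderMode=cv2.BORDER_REPLICATE)
        shifted = warped.astype(np.int16) + int(rng.integers(-20, 21))
        variants.append(np.clip(shifted, 0, 255).astype(np.uint8))
    return variants
```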

Eye-state recognition and pupil center detection were performed under eight conditions combining the two ROI types with whether DA and FT were applied.

A person-independent task was conducted: the test data came from one patient, and the training data came from the remaining six subjects (the other two patients and the four healthy staff members). The experiment was conducted with the leave-one-patient-out method.
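
This protocol corresponds to the following loop (the subject identifiers are illustrative):

```python
subjects = {"P1": "patient", "P2": "patient", "P3": "patient",
            "H1": "healthy", "H2": "healthy", "H3": "healthy", "H4": "healthy"}

for test_id, role in subjects.items():
    if role != "patient":
        continue                                    # only patients are used as test data
    train_ids = [s for s in subjects if s != test_id]
    print(f"test: {test_id}, train: {train_ids}")   # train the CNNs on train_ids, evaluate on test_id
```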

Result and Discussion. Experimental results are shown in Table 2. In the table, \(E_p\) means the error between the ground truth and the detection result of the pupil center.

Regarding eye-state recognition, the recognition accuracy for non-closed eyes is higher than that for closed eyes. This is presumed to be due to the small number of closed-eye training samples. The highest recognition accuracy of 82.1% was obtained when ROI1 was used without applying DA or FT.

Regarding the pupil center detection task, the average error was at most 2.3 pixels, although it varied depending on the conditions. The minimum average error of 1.17 pixels was obtained when DA and FT were applied with ROI1. Figure 6 shows eye images in which the ground truth (green point) and the detection result (red point) are plotted. The errors of Figs. 6(a), (b), (c), and (d) were 2.63, 2.90, 8.73, and 8.82 pixels, respectively. From the plots, we judge that detection succeeded in Figs. 6(a) and (b) and failed in Figs. 6(c) and (d).

Table 2. Eye state recognition and pupil center detection results.
Fig. 6. Pupil center detection results.

4.2 Calling Experiment

Experimental Protocols. A calling experiment was conducted with five healthy subjects. The experiment took about five minutes per subject. Each subject performed the experiment while lying on a bed, as shown on the left side of Fig. 7. To avoid eye movements other than calling, the subject gazed at an image on a wall-mounted monitor so as to look at the front, as shown on the right side of Fig. 7.

To simulate a call, we prepared voice stimuli indicating one of the four directions: up, down, left, or right. During the experiment, the subject was instructed to perform the corresponding eye movement after each voice stimulus. The timing and direction of the voice stimuli were random. Twelve stimuli were given in one experiment; that is, the subject made 12 calls by eye movement.

Fig. 7. Experimental scenes of the calling experiment.

Result and Discussion. The experiment was conducted at night with the lights turned off. The brightness during the experiment, measured with an illuminometer, had a minimum of 3.92 lx, a maximum of 16.7 lx, and an average of 9.9 lx.

A correct call in response to a voice stimulus was regarded as successful. The numbers of true positives (TP), false negatives (FN), and false positives (FP) were counted over all experiments. We also calculated the precision P, recall R, and F-measure F by the following equations: \(P = TP/(TP + FP)\), \(R = TP/(TP + FN)\), \(F = 2PR/(P + R)\).
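
These measures can be computed directly from the counts; the counts in the example below are hypothetical.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F-measure from the counted call outcomes."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

print(prf(tp=60, fp=12, fn=0))   # hypothetical counts -> (0.833..., 1.0, 0.909...)
```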

Fig. 8. Transition of pupil center (Subject S4).

Table 3 shows the results. The precision, recall, and F-measure over all subjects were 0.833, 1.000, and 0.909, respectively. The recall of 1.000 means that no call was missed. On the other hand, the precision was 0.833; this is because incorrect pupil positions were detected when subjects S2 and S5 closed their eyes while blinking.

Table 3. Experimental result of calling experiment.

Figure 8 is a graph showing the temporal transition of the pupil center coordinates in the experiment of subject S4. The horizontal axis is the frame number, which corresponds to time, and the vertical axis is the x or y coordinate. The red curves are the coordinates of the detected pupil center. The green and blue horizontal lines indicate the left/right or upper/lower thresholds. The pink vertical strips indicate frames in which the eyes are closed. From these graphs, it can be confirmed that the pupil position exceeded the upper or lower threshold for all 12 calls.

5 Conclusion

In this research, we developed a system that allows patients to make calls without stress at night using eye movements. Two experiments, pupil center detection and calling, were conducted to evaluate the effectiveness of the developed system. As a result, a high detection accuracy with an average error of 1.17 pixels was obtained for the pupil center. The calling experiment was conducted with healthy subjects rather than patients, and a high call success rate was obtained.

The intended users of our system are patients, but we have not yet been able to conduct a calling experiment with patient cooperation; we will work on this in the future. In pupil center detection, failures due to blinking remain, and we will also address this problem.