
1 Introduction

Visual Working Memory (VWM) is a limited-capacity system for storing, manipulating, and utilizing visual information, and it is fundamental for many cognitive tasks and for further processing [12]. It supports reasoning and guides decision making and behavior, and it helps users hold information long enough to put it to use. Working memory capacity is a measure of individual differences in the efficiency of complex cognitive functions [18]. Weak working memory may impair learning and overall performance, especially in the workplace, and may lead to events resulting in injury or even death.

The human brain processes information by attending and responding to all types of sensory input, including visual data [8]. Attention is a brain mechanism that ensures the most significant information is selected for further processing, and this selective attention is a key process in the visual system [6]. Without attention, the limited information-processing capacity of the human brain prevents information from being taken in for further processing.

Visual working memory is part of the cognitive system in which memory and attention interact to solve complex cognitive problems [18]. This challenge can be posed by visual tasks that require remembering the order of items; such tasks therefore provide an ideal context for studying how working memory capacity controls the contents of attention.

The human brain constantly produces electrical signals during activities such as thinking, learning, and sleeping [20]. These signals can be detected from outside the brain through sensors [20]. Over the last few years, affordable electroencephalographic (EEG) recording devices have become available, allowing scientists to access the raw data for research purposes [23]. MUSE is one such device and can detect a range of brainwaves [23]. Brainwaves are usually divided into five bands: Delta, Theta, Alpha, Beta, and Gamma [21]. According to Adjouadi et al., Beta waves occur when a person is alert, actively thinking, or problem solving [1]. These waves can therefore be used to estimate the level of attention while a specific task is performed. In this study, the headband's sensors complement the assessment carried out with the humanoid robot.

The availability of low-cost humanoid robots allows researchers in the robotics community to explore further research in human-robot interaction (HRI). In addition, robots can simulate and imitate many human functions continuously without getting tired [17]: they can observe, communicate, move, sense their environment, and respond to changes. In this research, we use a humanoid robot to instruct, observe, and evaluate the user on specific tasks. The reasons for using a humanoid robot include, but are not limited to, the following. First, robots do not carry prior knowledge of the users' backgrounds that could bias the assessment, and they can provide feedback without limit. Second, robots can perform tiring and dangerous tasks. Third, robots in industry can work at a constant speed without breaks or delays.

The main contribution of the present research is to verify the effects of cognitive training and of providing feedback during a working memory assessment at work stations by using humanoid robots. We collected brain-signal data from several sensors, as well as several parameters during the assessment, including the number of errors, delay, speed, and attention level. In addition, each subject who participated in the assessment was given a survey about their experience, and the outcomes are reported in the results and discussion sections.

The present section has introduced visual working memory and attention monitoring, as well as human brain signals and the NAO robot used in this study. The remaining sections provide a thorough analysis and description of the proposed application. In Sect. 2, we review related work with similar assessments. Section 3 introduces the hardware and sensors that were used. Section 4 gives an overview of the game, including the visual working memory and attention-monitoring assessments. Section 5 describes the methods used to conduct this study, including the computer vision methods and the task-switching paradigm. Section 6 presents the experimental results. Section 7 discusses the study. Finally, in Sect. 8, we summarize the work that was conducted and mention the challenges and plans for future work.

2 Related Work

Techniques developed in recent years offer a reasonable degree of choice in the nature and level of detail of working memory assessments, which are now also open to a wider range of users. One new development is that working memory problems can now be assessed indirectly, using knowledge of workers' behavior while they perform tasks.

The NAO humanoid robot has been used as an intelligent assistant in several studies and is capable of color recognition, voice recognition, and face recognition [10, 14]. NAO robots have been used extensively in recent research in the fields of rehabilitation and training.

Simonov and Delconte proposed an approach in which the NAO robot was used for rehabilitation training and assessment with minimal human supervision. The work aims to provide a humanoid-robot-based assessment for rehabilitation, such as pulmonary rehabilitation in Chronic Obstructive Pulmonary Disease (COPD) patients. The NAO robot implements an automated judgment function to assess how well the rehabilitation exercise matches the pre-programmed sequence. The authors were able to achieve complete monitoring of the patients with minimal human supervision [19].

Shamsuddin et al. propose robot-based assessment and training for autistic children with impaired intelligence. The authors showed that interaction with a humanoid robot augments the children's communication skills. After training with the NAO robot, assessments showed that 4 of 5 children exhibited a decrease in autistic behavior, which the authors credit to the human-like appearance of the robot [16].

Over the past decades, related research has sought to improve and assess short-term working memory. With the improvement of technology, researchers aim to use it to increase the speed and efficiency of these assessments.

A well-known study by Lorenzo et al. proposed an approach for working memory assessment and response inhibition through a "first-person shooter" (FPS) game. The game requires players to react rapidly to fast-moving objects and auditory stimuli. Working memory was assessed with two groups of people, one with experience of FPS games and one without. The results showed that impulsivity and the updating of task-relevant information were much higher among experienced players [4].

Similarly, Daneman and Carpenter took a different approach, assessing working memory capacity through reading and comprehension. With the developed system, the authors showed that poor readers are less efficient and can maintain less information in their working memory. The assessment used tests involving fact retrieval, word recall, and pronominal reference [5].

Westerberg et al. examined the effects of working memory training in adults affected by stroke. The authors report statistically significant training effects on non-trained tests of working memory and attention. They also claim that, more than one year after stroke, systematic working memory training can significantly improve attention [22].

Belpaeme et al. proposed a multi-modal approach to child-robot interaction using the NAO robot to build social bonds. The authors claim that a robot built to communicate with humans has a greater impact in creating bonds with them, and they describe techniques to increase the effectiveness of the interaction [2].

Our proposed framework combines working memory assessment approaches with Socially Assistive Robotics (SAR) to evaluate cognitive function and visual working memory among users (workers) in industry, with the goal of reducing or preventing injuries and deaths caused by weaknesses in working memory or cognitive function.

3 Hardware and Sensors

The following sections describe the hardware and sensors used in this study: the humanoid robot, the vision system, and the brain-sensing headband.

3.1 Humanoid Robot

Nao is a humanoid robot developed by Aldebaran and used for research and education purposes [9]. NAO is equipped with several sensors that it uses to sense its environment. For example, it has sonar sensors to measure the distance to objects in its vicinity, and tactile sensors on its head and body that are triggered when touched by a user. These sensors enable communication between the user and the robot [9]. The Nao humanoid robot is shown in Fig. 1.

Fig. 1. Aldebaran's Nao robot is an autonomous, programmable humanoid robot.
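As a hedged illustration of how such a tactile sensor can be read with the NAOqi Python SDK, the sketch below polls the front head sensor and has the robot speak; the robot address and spoken phrase are placeholders, and polling is only one possible interaction pattern.

```python
# Hedged sketch: reacting to the front head tactile sensor (Python 2 NAOqi SDK).
from naoqi import ALProxy

ROBOT_IP, ROBOT_PORT = "192.168.1.10", 9559        # illustrative robot address

memory = ALProxy("ALMemory", ROBOT_IP, ROBOT_PORT)
tts = ALProxy("ALTextToSpeech", ROBOT_IP, ROBOT_PORT)

# "FrontTactilTouched" is 1.0 while the front head sensor is pressed.
if memory.getData("FrontTactilTouched") == 1.0:
    tts.say("Ready to start the assessment.")      # placeholder prompt
```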

3.2 NAO’s Vision System

The vision system uses the robot's built-in cameras. The Nao robot is equipped with two cameras: one in the forehead (top) and one at mouth level (bottom). The top camera scans the horizontal direction, while the bottom camera focuses on the floor. Both cameras provide resolutions from 640 × 480 up to 1280 × 960 at 30 frames per second [9]. For this research, the bottom camera is used to perform the computer vision processing, including object and color detection.
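As a minimal sketch of how a frame might be grabbed from the bottom camera through the NAOqi ALVideoDevice proxy (Python 2 NAOqi SDK), assuming a reachable robot; the robot address, subscriber name, and resolution choice are placeholders.

```python
# Hedged sketch: grab one VGA frame from the bottom camera (Python 2 NAOqi SDK).
from naoqi import ALProxy
import numpy as np

ROBOT_IP, ROBOT_PORT = "192.168.1.10", 9559        # illustrative robot address

video = ALProxy("ALVideoDevice", ROBOT_IP, ROBOT_PORT)

# Camera index 1 = bottom camera; resolution 2 = VGA (640x480);
# color space 13 = BGR; 30 frames per second.
client = video.subscribeCamera("vwm_assessment", 1, 2, 13, 30)
try:
    frame = video.getImageRemote(client)
    width, height, raw = frame[0], frame[1], frame[6]
    image = np.frombuffer(raw, dtype=np.uint8).reshape((height, width, 3))
    # `image` can now be passed to the computer vision pipeline of Sect. 5.3.
finally:
    video.unsubscribe(client)
```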

3.3 Brain Sensing Headband

Electroencephalography (EEG) is a rich source of information for accessing the brain's electrical activity [11]. An extensive amount of research has been done using professional EEG hardware in a wide variety of contexts and applications, and with the development of cheap, easy-to-use EEG hardware, using EEG as a human-computer interface has become plausible for many types of applications. The headband carries several sensors for collecting EEG signals, including reference and forehead sensors and Smart-Sense conductive rubber ear sensors. In this research, we use the MUSE headband to measure the attention level based on brain signals. The signals indicate the degree of attention and concentration of users while they perform the assessment. The Nao robot receives these values, and feedback and training are given to the user based on the values received from the MUSE.
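Exactly how the MUSE signal reaches the robot is implementation-specific; one common route, sketched below under that assumption, is to stream the raw EEG over Lab Streaming Layer (for instance with the muse-lsl tool) and read it with pylsl, then use the ratio of beta- to alpha-band power as a rough attention score. The stream type, the 256 Hz sampling rate, and the band boundaries are assumptions, and MUSE's own attention metric may be computed differently.

```python
import numpy as np
from pylsl import StreamInlet, resolve_byprop
from scipy.signal import welch

def band_power(signal, fs, lo, hi):
    """Average spectral power between lo and hi Hz, estimated with Welch's PSD."""
    freqs, psd = welch(signal, fs=fs, nperseg=min(len(signal), int(fs * 2)))
    mask = (freqs >= lo) & (freqs <= hi)
    return np.trapz(psd[mask], freqs[mask])

# Resolve the EEG stream (assumes muse-lsl or a similar streamer is running).
streams = resolve_byprop("type", "EEG", timeout=10)
inlet = StreamInlet(streams[0])

fs = 256.0                           # nominal MUSE sampling rate (assumed)
window = []
while len(window) < int(fs * 2):     # collect a 2-second window
    sample, _ = inlet.pull_sample()
    window.append(sample[0])         # first channel only, for illustration

window = np.asarray(window)
beta = band_power(window, fs, 13.0, 30.0)    # beta band: alert, focused
alpha = band_power(window, fs, 8.0, 13.0)    # alpha band: relaxed
attention = beta / (alpha + 1e-12)           # crude attention proxy sent to the robot
```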

4 Game Overview

Rehabilitation training requires the ability to adapt to a changed living and working environment. To train and evaluate workers' cognitive function in industry, a robot-based assessment was designed. This assessment evaluates the visual working memory and cognitive function of users. The game starts with an easy task, and the level of difficulty increases gradually according to the user's performance. The humanoid robot observes and collects data in order to provide feedback, which can be visual, verbal, or immediate. The details of the assessment are provided in the following section.

4.1 Visual Working Memory Assessment

A humanoid robot is used as the instructor for a vocational assessment task. The main reason for using a humanoid robot is to provide visual and verbal feedback while the user performs the assessment. The assessment starts by showing a random sequence of different colors for three seconds (the initial value). With each level, the difficulty increases. In addition, the assessment contains task switching, in which one color is replaced by another and the user must remember the substitution. The assessment has three levels of difficulty: easy, medium, and hard. The levels and the number of blocks associated with each level are given in Table 1.

Table 1. Levels of difficulty.

The data collected during the assessment are used to give feedback and to decrease the difficulty of the game when a user needs more attention and concentration. As can be seen in Fig. 2, the user performs the task while the Nao robot observes in order to give constructive feedback during the assessment.

Fig. 2. Visual working memory assessment. In each level, the robot displays a sequence of colors on a monitor for 3 s, after which it disappears. The user then reproduces the sequence in front of the humanoid robot, which observes and gives feedback based on the user's performance. (Color figure online)

The difficulty of the assessment increases gradually based on the user's performance. If the user performs well in a level, the difficulty is increased by adding more blocks to the assessment. If the user does not perform well, the robot encourages the user by showing the blocks again and reading out the order of the colors so that the user can complete the task. The robot records the number of errors, the delay, the EEG data, and the task completion time for further analysis. The following section describes the methods used to implement the assessment.
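Before turning to the methods, the adaptation rule above can be summarized in a minimal sketch; the block counts and the error threshold are placeholders rather than the exact values of Table 1 and the study's configuration.

```python
ORDER = ["easy", "medium", "hard"]
BLOCKS = {"easy": 4, "medium": 6, "hard": 8}       # placeholder counts (see Table 1)

def next_level(current, errors, max_errors=1):
    """Advance to the next level after a clean round; otherwise repeat the level
    and signal that the robot should re-show and read out the sequence."""
    if errors <= max_errors and current != ORDER[-1]:
        return ORDER[ORDER.index(current) + 1], False
    if errors <= max_errors:
        return current, False                      # already at the hardest level
    return current, True                           # repeat and encourage the user

level, reshow = next_level("easy", errors=0)       # -> ("medium", False)
```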

5 Methods

5.1 Proposed Architecture

The proposed architecture consists of several parts, including computer vision, robot feedback, and analysis. Figure 3 illustrates the proposed system architecture.

Fig. 3. Proposed architecture. The robot generates a sequence of tasks and the user performs them; the robot instructs and observes the task. (Color figure online)

5.2 Task Switching

Task switching refers to regular shifts between cognitive tasks [13]. Several models have been used to examine the brain mechanisms underlying individual differences in selecting the representations that exert cognitive control during task switching. In this research, we use several task-switching techniques, including switching colors during the assessment and switching a color with plain text. For instance, the user may be asked to use red instead of blue in the given model; in this case, the user must memorize and apply the color switch while doing the exercise.
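As an illustration, the announced substitution can be applied to the target sequence before it is compared with the user's blocks; the sketch below is illustrative rather than the exact implementation used in the study.

```python
def apply_switch(sequence, substitutions):
    """Rewrite the target sequence according to the announced rule, e.g.
    substitutions={"blue": "red"} means 'use red instead of blue'."""
    return [substitutions.get(color, color) for color in sequence]

target = apply_switch(["blue", "green", "blue", "yellow"], {"blue": "red"})
# target == ["red", "green", "red", "yellow"]
```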

5.3 Computer Vision Methods

This section describes the computer vision methods used in the assessment: edge detection, image segmentation, object detection, and color detection. The procedure starts with edge detection, one of the important components of many vision systems, such as object recognition and image segmentation algorithms [7]. The Canny edge detector is used to locate the edges of the board and of the objects on it. The second step is image segmentation, whose aim is to divide a digital image into meaningful regions [15]; here, the segmentation groups the pixels of a region that are similar with respect to the given colors. The third step detects the objects on the board; the goal is scalable object detection, achieved by predicting a set of bounding boxes that represent potential objects. In the final step, a color detection method is applied. The proposed framework uses the Nao's camera to capture the user model and extracts its features from an area of pixels in order to detect and respond to user actions. A coordinate transformation may then be applied to obtain the corresponding color values from the camera data, which are analyzed to distinguish the colors of the detected objects. For this, the HSV color space is used together with histogram thresholding, since HSV is well suited to image segmentation [3]. The image captured by the robot's camera is in RGB color space, so it is first transformed into HSV and then split into its three components (hue, saturation, and value) to obtain a histogram for each. A threshold is applied to each component individually, and finally morphological operations are performed to extract the desired region. Figure 4 shows an intermediate image in which all pixels classified as object (using the previously established range in the H channel) were set to 255 and non-object pixels were set to 0.

Fig. 4. User input model. In each level, the robot captures the board and analyzes the number and colors of the blocks. (Color figure online)

As can be seen in Fig. 4, there are several objects on the board; with the aforementioned methods, their types, colors, and number are obtained.
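A condensed sketch of the segmentation and detection steps with OpenCV follows (OpenCV 4 assumed); the Canny edge step described above is omitted for brevity, and the HSV range shown is an illustrative red band rather than the thresholds tuned in the study.

```python
import cv2
import numpy as np

def detect_blocks(bgr_image, hsv_lower, hsv_upper, min_area=200):
    """Segment blocks of one color and return their bounding boxes.

    bgr_image           : frame from the Nao camera (BGR)
    hsv_lower/hsv_upper : HSV threshold range for the target color (tuned per color)
    """
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    # Binary mask: object pixels become 255, background pixels 0 (as in Fig. 4).
    mask = cv2.inRange(hsv, np.array(hsv_lower), np.array(hsv_upper))

    # Morphological opening and closing remove noise and fill small holes.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    # Each remaining connected region is treated as a candidate block.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > min_area]

frame = cv2.imread("board_capture.png")                 # e.g. a frame from the robot
red_boxes = detect_blocks(frame, (0, 120, 70), (10, 255, 255))  # illustrative red range
```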

Figure 5 illustrates how the user model is obtained and how the robot model is generated. Once both are available, the robot compares the two models in real time in order to give feedback to the user. The last level includes the color-switching paradigm, in which the instructions differ from the normal condition: the robot asks the user to replace a certain color with another color.

Fig. 5. Robot and user models. The two models are compared in real time based on the number of blocks as well as their colors. (Color figure online)
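A minimal sketch of the real-time comparison, assuming both models are represented as ordered lists of color labels:

```python
def compare_models(robot_model, user_model):
    """Compare the robot's target model with the user's detected model.

    Both are ordered lists of color labels, e.g. ["red", "green", "blue"].
    Returns the number of errors and the mismatched positions; a missing
    block (shorter user model) counts as an error.
    """
    mismatches = [i for i, expected in enumerate(robot_model)
                  if i >= len(user_model) or user_model[i] != expected]
    return len(mismatches), mismatches

n_errors, positions = compare_models(["red", "green", "blue"],
                                     ["red", "blue", "blue"])
# n_errors == 1, positions == [1]
```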

5.4 Feedback Mechanism

In many cases, it is better to simplify a process than to train people to cope with its intricacy, and feedback should be seen as a way of simplifying the interaction between the user and the robot. In this assessment, the robot provides three types of feedback based on the user's performance: visual, verbal, and instant, derived from the design and user models. The design model is the model created by the robot, showing the user how the blocks should be ordered, while the user model is the model of the system as built up by the user during interaction with it. The visual feedback shows the task to the user visually, the verbal feedback reads the task aloud after each level is completed, and the instant feedback alerts users immediately if they make an error or need to pay more attention. The instant feedback relies mainly on the brain-wave data, alerting the user in order to avoid mistakes caused by a lack of concentration.
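A hedged sketch of how the three feedback channels could be arbitrated; the attention threshold and the messages are illustrative rather than the values and phrases used in the study.

```python
ATTENTION_THRESHOLD = 0.8   # illustrative cut-off for triggering the instant alert

def choose_feedback(errors, attention):
    """Return the feedback actions for the current level (illustrative policy)."""
    feedback = {
        "visual": "display the target block sequence on the monitor",
        "verbal": "read the color order aloud after the level is completed",
    }
    # Instant feedback fires when errors occur or the EEG-derived attention drops.
    if errors > 0 or attention < ATTENTION_THRESHOLD:
        feedback["instant"] = "alert the user to refocus before continuing"
    return feedback

print(choose_feedback(errors=1, attention=0.6))
```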

5.5 Procedures

The procedure for evaluating and collecting data from subjects was approved by The University of Texas at Arlington under reference No. IRB 2017-0563. To test and evaluate the accuracy of our proposed framework, we recruited 20 participants (13 males and 7 females) to perform the task in front of a humanoid robot under two different conditions, for training and assessment purposes. In the first condition, error detection was disabled: users performed the task without receiving any feedback, and the number of errors and their speed were recorded for training purposes. In the second condition, error detection was enabled: the user had to imitate the model generated by the robot at each step, and the robot automatically detected the completion status and the number of errors, notifying the user in real time. All twenty participants performed the same task under the same conditions. For each condition, the participants were asked to play the visual working memory assessment in front of a NAO robot and to place the blocks on the board according to the robot's models. The pseudocode below shows how the assessment proceeds and how the robot instructs, observes, and provides feedback.

Pseudocode. Robot-led visual working memory assessment: the robot instructs, observes, and provides feedback at each level.
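As a hedged, simplified rendering of that pseudocode in Python, the sketch below ties together the pieces from the preceding subsections; the callables passed in (display, observe, read_attention, give_feedback) are hypothetical wrappers around the robot and sensor behavior, and the colors, block counts, and presentation time are illustrative.

```python
import random
import time

COLORS = ["red", "green", "blue", "yellow"]
LEVEL_BLOCKS = {"easy": 4, "medium": 6, "hard": 8}   # illustrative block counts

def count_errors(target, user_model):
    """Condensed version of the model comparison sketched in Sect. 5.3."""
    return sum(1 for i, color in enumerate(target)
               if i >= len(user_model) or user_model[i] != color)

def run_assessment(display, observe, read_attention, give_feedback,
                   levels=("easy", "medium", "hard"), show_time=3.0):
    """Run one pass of the assessment; each argument is a callable that wraps
    the robot or sensor behavior described in the preceding subsections."""
    results = []
    for level in levels:
        target = [random.choice(COLORS) for _ in range(LEVEL_BLOCKS[level])]
        display(target, show_time)            # show the sequence, then hide it
        start = time.time()
        user_model = observe()                # camera + computer vision pipeline
        errors = count_errors(target, user_model)
        attention = read_attention()          # MUSE-derived attention score
        give_feedback(errors, attention)      # visual / verbal / instant feedback
        results.append({"level": level, "errors": errors,
                        "time": time.time() - start, "attention": attention})
    return results
```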

6 Experimental Results

We used a 2 × 2 factorial within-subjects experimental design. The independent variables were the user's gender, the environment, and the user's attention level. The dependent variables were the time taken to complete the task, the number of errors, ease of use, ease of understanding, ease of instruction, satisfaction, and usefulness. The task completion time and the number of errors made during the assessment were recorded for each level. After completing each assessment, each participant was asked to fill out a questionnaire about their experience with each condition, using the questions shown in Table 2. The questionnaire used a 4-point Likert-type response scale, with 1 indicating the most positive response and 4 the most negative.

Table 2. Survey results.

Several metrics were measured, including the number of errors, the attention level, and the task completion time.

As can be inferred from Table 3, the mean attention level is higher among females performing the cognitive task, while the error rate among females is slightly higher than among males in the same assessment. However, the difference in attention level, which is less than 0.03, is not significant.

Table 3. Error rate and attention level based on gender.

Figures 6, 7, and 8 show the performance of three selected users. The first two figures show that if a user has a lower attention level, or it starts to drop, the user's performance declines; this results in a higher probability of an error occurring, especially in tasks of higher difficulty. The third figure shows that when a user maintains a relatively high level of attention, errors are unlikely to occur.

Fig. 6. The correlation between attention level and error.

Fig. 7. The correlation between attention level and error.

Fig. 8. The correlation between attention level and error.

Figure 9 illustrates the average attention level of each user compared with the total number of errors committed. This again shows that users, regardless of gender, make more errors when their attention level starts to decrease.

Fig. 9. The correlation between gender, attention level, and errors for all participants.

7 Discussion

The primary objective of this research was to assess visual working memory based on several parameters, including task completion time, attention level, and cognitive function, while testing visuospatial memory among industrial workers, using a humanoid robot as instructor and EEG sensors to collect the attention level. Accordingly, the following hypotheses were made:

  • MH1: Participants evaluate the robot's visual feedback as more suitable than its verbal feedback.

  • MH2: Participants evaluate the high attention level as the most suitable parameter for visual working memory assessment.

  • MH3: Participants report better preference when they get encouraging and positive feedback from a robot.

  • MH4: The task-switching paradigm is challenging for participants.

  • MH5: Participants report that they perform better when they get more than three seconds to memorize the order of 4 to 8 blocks.

In our assessment, the survey results suggest that participants were more encouraged by visual feedback than by verbal feedback, supporting our first hypothesis (MH1). Participants also performed better when they paid more attention to the vocational assessment task, supporting our second hypothesis (MH2). The survey results further support our third hypothesis (MH3), indicating that positive feedback encourages users during the assessment. We can also infer that color switching adds challenge to the assessment, which supports the fourth hypothesis (MH4). Finally, the survey and results indicate that users perform better on the visual working memory tasks when they are given more time (MH5).

8 Conclusion and Future Work

Robot-assisted tasks are increasingly being used to train and improve social skills and cognitive functions among workers in industry. In this research, we have proposed an interactive robot-based vocational assessment method to assess and train the visual working memory of industrial workers by increasing the difficulty of the tasks at each level and switching between them. We used computer vision methods for object and color detection, and a humanoid robot together with sensors for collecting EEG data, to create an interactive connection between the user and the robot. Future work can add more challenges to the tasks that users need to perform. The following directions will be considered by the authors for future extensions.

  • Applying machine learning, particularly reinforcement learning, to make the process more intelligent and interactive based on the data previously collected from subjects.

  • Applying emotion detection to capture emotions (sadness, happiness, anger, and the neutral state) from the user's face during the assessment in order to provide better feedback.