1 Introduction

Typical brain-computer interface (BCI) systems allow people to communicate without performing movements or speaking. This category of systems is mainly designed for severely disabled persons, to provide them with an alternative communication and control channel. However, the research community has recently started developing BCIs also for persons with less severe disabilities or even for clinically healthy users. Nowadays there is a high demand for natural user interfaces, and even if BCIs may seem difficult to use in the beginning, with long-term use the performance and acceptance rates keep increasing for the majority of users.

Most BCIs are based on signals recorded with the electroencephalogram (EEG). The most significant results have been obtained by using one of the following signals: the P300 event-related potential (ERP), the steady-state visual evoked potential (SSVEP) and event-related desynchronization/synchronization (ERD/ERS) [1,2,3,4]. ERD-based BCIs generally require subjects to imagine natural movements of the hands and/or feet, translating them into commands to control devices such as mobile robots, wheelchairs and orthoses, to control cursors on a computer monitor or to navigate through virtual environments and games [5,6,7,8]. Their main advantages are the short amount of time required to learn to control such a BCI and the high classification rates obtained [5]. In general, this type of BCI is limited to very few commands, while the information transfer rate (ITR) falls dramatically if more than two commands are used [9].

The P300 potential is a positive deflection in the EEG signal that appears around 300 ms after the user perceives a target visual, auditory or somatosensory stimulus. P300-based BCI systems require the users to silently count the target event among a large set of such stimuli [10,11,12,13]. Initially, these BCIs were mainly developed for communication applications called spellers [12]. They are based on the presentation of visual flashing items (letters, digits, punctuation marks and various typical keyboard commands) on a computer monitor, thus allowing the users to generate a command from a large set of items. Multiple presentation paradigms have been proposed so far by researchers, such as row-column, single-character, checkerboard, lateral single-character or half checkerboard [12,13,14,15,16,17]. This category of BCIs offers both high ITRs and high classification accuracy rates, and only a couple of minutes are required for users to learn how to use such a system. Furthermore, the mentioned presentation paradigms have been successfully used for various types of applications such as smart home control, Internet browsing, environment control and even drawing [18,19,20,21,22,23].

SSVEP-based BCIs also allow a large number of choices, although this number is limited by the number of stimulation frequencies that can be analyzed in the recorded EEG signals. SSVEP has been used in studies involving low-level cognitive processes in the brain because it is an intrinsic neuronal response which is mostly independent of higher-level cognitive processes [23]. The number of SSVEP-based applications is increasing every year due to the signal's robustness, high ITR, relatively simple system configuration and very short training time [3, 24,25,26,27]. Also, the SSVEP response of the brain is straightforward to model and interpret. In order to have a usable BCI application, correct decisions should be made using short signal segments, with intervals ranging from 0.5 s to 4 s; an interval of 2 s has been shown to achieve very high classification accuracy rates [27].

The latest research activities have started focusing on various combinations of EEG-based signals, and also on combinations of eye tracking technologies, electrooculography (EOG) and electromyography (EMG) with typical BCIs [2, 4, 16]. This new category of BCIs, namely hybrid BCIs (hBCIs), has already been shown to provide increased ITRs and higher classification rates [16]. Such hybridization solutions can offer the users multiple input methods working in sequential or simultaneous paradigms, or the various signals can provide redundant checks that maximize classification rates and ITRs.

The present study aims to develop a hybrid multimodal interface based on SSVEP, eye tracking technologies and the user's hand gestures identified by the Leap Motion controller [28], used to remotely command a Jaco robotic arm [29] to manipulate objects in a specific workspace. The proposed system is mainly developed for clinically healthy users who are expected to use it on a daily basis in the future. For the initial validation phase presented in this paper, the users were placed in the same room as the robotic arm in order to facilitate learning how to use the hybrid multimodal interface.

2 Materials and Methods

2.1 Interaction Paradigm

The increase and diversification of the information related to the commands that can be sent to robotic systems introduces limitations that translate into considerable cognitive effort for the operator. Most often these limitations appear at the training stage, when the operator has to reason about the commands available for a particular task and to identify and select the right ones. Obviously, as a certain routine is established the cognitive effort decreases, but organizing and improving the interaction modalities remains important at this stage as well.

A common approach to reducing cognitive effort is the use of multimodal interfaces. These allow task information to be distributed across several communication channels when the cognitive load increases, therefore allowing the user to distribute and manage the cognitive effort. Instead of using various joystick commands and spending extra cognitive effort, a multimodal interface allows the perceptual and mental systems to operate at a normal level, so the user can focus on the task that needs to be solved.

The use of biopotentials in combination with other human-machine communication channels for interaction with a robotic system offers a possible solution to the problems of current interfaces. This approach allows different ways to accomplish a robotic task when required. The use of biopotentials in robot interaction paradigms is gaining more and more interest in the research community. Dedicated interfaces improve user performance through specific interaction methods. Developing new interaction methods, then acquiring and automating them, can reduce the number of processes the user's brain has to carry out, thus reducing the cognitive effort.

The proposed hybrid multimodal interface is capable of adapting to the user's intentions and behavior in dynamically changing environments. It combines the latest BCI technologies and concepts to provide a natural interaction between humans and robotic systems which is unconstrained and robust. Thus, the interface provides the users with the means to command a robotic arm for manipulation tasks. The users are placed in front of a computer monitor where they can see the workspace in which the robotic arm is placed and select the objects in the workspace (see Fig. 1). The workspace scene is displayed on the computer's monitor by acquiring the video stream from a Logitech C920 webcam. The system automatically identifies the moments when the user is gazing at the monitor by means of an eye tracking system, a Tobii X120 [30]. The system then enables the selection of an object by superimposing a set of blocks flashing at two different frequencies (7.5 Hz and 10 Hz) over the objects found in the scene (see Fig. 1). The flashing blocks elicit SSVEP responses in the recorded EEG signals, and the classification of the EEG data determines which object was selected by the user (see Fig. 2). An object selection triggers audio feedback to the user and automatically stops the flashing blocks on the computer's monitor.
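
The selection logic described above can be summarized as a gaze-gated loop. The following Python sketch is illustrative only: `eye_tracker`, `eeg`, `stimulator`, `audio` and `classify_ssvep` are hypothetical wrappers around the Tobii SDK, the EEG acquisition stream, the on-screen flashing stimuli, the audio output and the SSVEP classifier described in Sect. 2.4; none of these names come from the actual implementation.

```python
import time

# Mapping of stimulation frequency to the object it is superimposed on
FREQS = {7.5: "left object", 10.0: "right object"}

def selection_loop(eye_tracker, eeg, stimulator, audio, classify_ssvep):
    """Gaze-gated SSVEP object selection (illustrative sketch)."""
    while True:
        looking = eye_tracker.gaze_on_monitor()    # Tobii X120 gaze detection (assumed wrapper)
        stimulator.set_flashing(looking)           # flash the blocks only while the user looks
        if not looking:
            time.sleep(0.1)
            continue
        freq = classify_ssvep(eeg.latest_epoch())  # returns 7.5, 10.0 or None (see Sect. 2.4)
        if freq is not None:
            stimulator.set_flashing(False)         # stop flashing once a selection is made
            audio.play(f"Selected {FREQS[freq]}")  # audio feedback to the user
            return freq
```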

Fig. 1.
figure 1

System architecture of hybrid multimodal interface

Fig. 2.
figure 2

Flow diagram of the hybrid multimodal interface

The webcam (1920 × 1080 resolution, captured at 30 fps) is used to record the workspace with the robotic arm and the objects to be selected. An edge-detection algorithm is used to identify the two objects placed in front of the robotic arm. The Emgu CV library was chosen for image processing because it exposes the OpenCV functions to .NET languages [31]. As such, an OpenCV implementation of the Canny edge detection algorithm was used to detect the edges of the target objects [32]. The Canny algorithm consists of five basic steps (a minimal code sketch follows the list below):

1. apply a Gaussian filter to eliminate the noise in the image;
2. search for image intensity gradients;
3. apply non-maximum suppression to remove spurious responses to edge detection;
4. apply a double threshold to determine the potential edges;
5. track edges by hysteresis: weak edges that are not connected to the most pronounced ones are suppressed, yielding the final edge map.
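
The actual implementation relies on Emgu CV in .NET; the minimal Python/OpenCV sketch below illustrates the same pipeline. The Gaussian kernel size and Canny thresholds are illustrative assumptions, and steps 2-5 are performed internally by cv2.Canny.

```python
import cv2

def detect_object_edges(frame):
    """Canny-based edge and contour detection on a webcam frame (illustrative sketch)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)   # step 1: Gaussian noise removal
    # Steps 2-5 (gradients, non-maximum suppression, double threshold, hysteresis)
    # are handled inside cv2.Canny; thresholds are illustrative values
    edges = cv2.Canny(blurred, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return edges, contours
```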

For this initial study, the objects are placed in predefined positions in the workspace. After the user selects a specific object, the robotic arm is automatically positioned near the object, waiting for manipulation commands. The Leap Motion controller was used to command the Jaco robotic system because, according to [33], it allows accurate tracking of hand movements. The Leap Motion controller can detect the position, orientation and velocity of the hands along with the position, orientation and velocity of each finger [34]. The user can send translation (X axis) and rotation (roll, pitch and yaw angles) commands to the Jaco robotic arm by moving the right hand. Grasping and releasing an object is controlled by closing and opening the right hand. The implementation of the robotic arm commands using natural user interaction is based on the functions available in the Leap Motion SDK. The current position and orientation of the right hand are acquired continuously from the Leap Motion controller and compared with the previous ones. If the difference is higher than a predefined threshold, the movement command is sent to the Jaco robotic arm.
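
The threshold-based command logic can be sketched as follows. The pose fields, the threshold value and the `arm.move` / `arm.set_gripper` wrappers are illustrative assumptions, not the actual Leap Motion SDK or Jaco API calls.

```python
THRESHOLD = 5.0   # minimum pose change before a command is sent (illustrative value)

def hand_pose_to_command(prev_pose, curr_pose, arm):
    """Send a motion command to the robotic arm only when the right hand moved enough.

    Poses are dictionaries with 'x' (translation along the X axis) and 'roll',
    'pitch', 'yaw' (hand orientation), assumed to be read from the Leap Motion
    controller every frame; `arm` is a hypothetical Jaco wrapper object.
    """
    delta = {k: curr_pose[k] - prev_pose[k] for k in ("x", "roll", "pitch", "yaw")}
    if max(abs(v) for v in delta.values()) > THRESHOLD:
        arm.move(dx=delta["x"], droll=delta["roll"],
                 dpitch=delta["pitch"], dyaw=delta["yaw"])
    # Grasping follows the hand: closing the right hand closes the gripper
    if curr_pose["hand_closed"] != prev_pose["hand_closed"]:
        arm.set_gripper(closed=curr_pose["hand_closed"])
```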

2.2 Experimental Design and Procedure

The purpose of this initial study is to familiarize the users with the proposed hybrid interface, without aiming at a qualitative or quantitative evaluation. Most BCI-related studies require users to test a specific system after a very short training period. In contrast, the complete study we aim to conduct assumes that users already know exactly how to interact with the developed system. Thus, we asked three users to participate in a series of practice sessions over a period of three weeks (one session per week).

The experimental design comprises three phases: a familiarization procedure and a calibration procedure, followed by a practice session during which the users are required to perform a set of selection and manipulation tasks.

The familiarization procedure allows the participants to use the Leap Motion controller to command the Jaco robotic arm to grasp and manipulate the two objects placed in the workspace until they feel confident in using the interface.

The calibration procedure covers the eye tracking system calibration and the acquisition of the SSVEP classifier training dataset. The eye tracking calibration is performed by successively gazing at a series of 5 points displayed on the computer's monitor. The Tobii SDK functionalities were used both for the calibration procedure and for the continuous tracking of the users' gaze. The calibration procedure can be repeated if the calibration results are not satisfactory. The SSVEP calibration is described in Sect. 2.4.

During the practice session the subjects are requested to perform a set of 10 manipulation tasks, 5 for each object placed in the workspace. The subjects are requested to alternately select (see Fig. 3), grasp and move (see Fig. 4) the yellow and blue objects from their initial positions to two other predefined positions marked in the workspace. After the subject has successfully moved each object in the workspace, an experiment observer repositions the two objects to their initial locations. Between two consecutive trials there is a pause of 12 s, during which the robotic arm automatically moves to its default position (see Fig. 3).

Fig. 3.
figure 3

SSVEP-based object selection

Fig. 4.
figure 4

Object manipulation based on Leap Motion controller and Jaco robotic arm

2.3 Participants

Data was collected from three healthy subjects (mean age = 22.33 years, SD = 2.05, range = 20–25 years), students of the Transilvania University of Brasov, with no prior experience with BCIs. The subjects were informed about the purpose of the study, were given a written consent form to sign and did not receive a financial reward for their participation.

2.4 EEG Data Processing

Subjects were seated in front of a 24-inch monitor (resolution: 1920 × 1080 pixels, vertical refresh rate: 60 Hz) at a distance of around 60 cm. The EEG signals were recorded with a 16-channel g.Nautilus amplifier at a 250 Hz sampling rate, band-pass filtered between 0.5 and 30 Hz, and a notch filter at 50 Hz was applied. Standard g.SAHARA dry active electrodes were used to acquire the data from the surface of the scalp. Data were recorded using eight electrodes over the occipital region of the brain, with the ground electrode placed at Fpz and the reference on the right earlobe. The electrodes were attached at locations O1, Oz, O2, PO7, PO3, POz, PO4 and PO8.
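
As a minimal sketch of this preprocessing stage, the band-pass and notch filters can be applied as below; the filter order and notch quality factor are assumptions, since the paper only specifies the pass band, the notch frequency and the sampling rate.

```python
from scipy.signal import butter, filtfilt, iirnotch

FS = 250  # g.Nautilus sampling rate (Hz)

def preprocess(eeg):
    """Band-pass 0.5-30 Hz plus 50 Hz notch, applied to every channel.

    eeg: array of shape (n_samples, n_channels). The Butterworth order (4) and
    notch quality factor (30) are illustrative choices, not taken from the paper.
    """
    b_bp, a_bp = butter(4, [0.5, 30.0], btype="bandpass", fs=FS)
    b_notch, a_notch = iirnotch(50.0, Q=30.0, fs=FS)
    filtered = filtfilt(b_bp, a_bp, eeg, axis=0)         # zero-phase band-pass
    return filtfilt(b_notch, a_notch, filtered, axis=0)  # remove power-line interference
```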

For the calibration phase the subjects were asked to perform a set of trials by gazing at flashing boxes on the computer's monitor. For each trial they were requested to focus on a specific flashing box for 10 s, after which the trial ended and a pause of 5 s was allowed before the next trial. The stimulation boxes flashed at two frequencies: 7.5 Hz (left stimulus) and 10 Hz (right stimulus). The users were requested to alternately gaze at the two flashing stimuli, resulting in a total of 8 trials. The resulting data were used to train the classifier, which was then used to classify the EEG data during the online tests.
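
For illustration, the calibration recordings can be cut into labelled epochs matching the 3 s windows used by the classifier (see below); the 50% overlap between epochs is an assumption, as the paper does not state how the 10 s trials were segmented.

```python
import numpy as np

FS = 250                 # sampling rate (Hz)
EPOCH_SAMPLES = 3 * FS   # 3 s epochs (750 samples), as used by the classifier
FREQS = [7.5, 10.0]      # left / right stimulation frequencies

def build_training_set(trials):
    """Cut each 10 s calibration trial into labelled 3 s epochs.

    trials: list of (eeg, freq) pairs, eeg shaped (n_samples, n_channels),
    freq in FREQS. Returns (epochs, labels) for classifier training.
    """
    step = EPOCH_SAMPLES // 2            # 50% overlap (assumption)
    epochs, labels = [], []
    for eeg, freq in trials:
        for start in range(0, eeg.shape[0] - EPOCH_SAMPLES + 1, step):
            epochs.append(eeg[start:start + EPOCH_SAMPLES])
            labels.append(FREQS.index(freq))
    return np.stack(epochs), np.array(labels)
```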

During the online practice session the subjects can select one of the two objects in the workspace based on the SSVEP. The classification of the recorded EEG signals is based on the minimum energy combination (MEC), a method that overcomes the limitations of the bipolar and Laplacian approaches and can use an arbitrary number of electrodes [35]. MEC finds the combinations of electrode signals that cancel as much of the noise and interference as possible, thus optimizing the signal-to-noise ratio. The algorithm used the Levinson AR model with epochs of 3 s consisting of 750 samples. The obtained features were fed into a linear discriminant analysis classifier for pattern classification.
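
A simplified sketch of the MEC channel combination is given below: reference sinusoids at the stimulation frequency and its harmonics are projected out of the signal, the spatial weights minimizing the remaining (noise) energy are obtained by eigendecomposition, and the SSVEP power of the combined channels is estimated. The number of harmonics and the 10% noise-energy criterion follow the original MEC formulation [35] rather than this paper, and the final argmax decision is a simplification of the AR-based features and LDA classifier actually used.

```python
import numpy as np

FS = 250                # sampling rate (Hz)
FREQS = [7.5, 10.0]     # stimulation frequencies
N_HARM = 2              # number of harmonics in the reference model (assumption)

def harmonic_matrix(freq, n_samples):
    """Sine/cosine reference signals at the stimulus frequency and its harmonics."""
    t = np.arange(n_samples) / FS
    cols = []
    for h in range(1, N_HARM + 1):
        cols += [np.sin(2 * np.pi * h * freq * t), np.cos(2 * np.pi * h * freq * t)]
    return np.column_stack(cols)                        # (n_samples, 2 * N_HARM)

def mec_power(epoch, freq):
    """SSVEP power at `freq` after minimum energy combination of the channels."""
    X = harmonic_matrix(freq, epoch.shape[0])
    # Remove the potential SSVEP component to obtain a noise-only estimate
    noise = epoch - X @ np.linalg.lstsq(X, epoch, rcond=None)[0]
    # Spatial combinations carrying the least noise energy (smallest eigenvalues)
    eigval, eigvec = np.linalg.eigh(noise.T @ noise)
    n_keep = max(1, int(np.searchsorted(np.cumsum(eigval) / eigval.sum(), 0.1)))
    W = eigvec[:, :n_keep] / np.sqrt(eigval[:n_keep])   # weighted low-noise combinations
    S = epoch @ W                                       # "virtual" channels
    # Power of the combined channels along each reference sinusoid
    return sum(np.sum((X[:, k] @ S) ** 2) / np.sum(X[:, k] ** 2) for k in range(X.shape[1]))

def classify_epoch(epoch):
    """Simplified decision: pick the frequency with the highest MEC power."""
    return FREQS[int(np.argmax([mec_power(epoch, f) for f in FREQS]))]
```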

3 Discussion

The present study aims to develop a hybrid multimodal interface to command a Jaco robotic arm by using biopotentials and hand gestures. The initial phase of the research is presented in this paper. Three users were asked to participate, over three weeks, in a series of evaluation sessions during which we aimed to determine whether the proposed hybrid interaction paradigm is appropriate for tasks involving object selection based on SSVEP and manipulation with a robotic arm by means of hand gestures.

One of our findings during this validation phase is related to the calibration of the eye tracking system: the distance from the computer's monitor influenced the calibration results, and sometimes this step had to be repeated 2 or 3 times.

Regarding the SSVEP-based selection, all users were able to correctly select the target object, thus achieving 100% accuracy.

During the tests performed in the first week, two of the users had some issues in correctly positioning the object in the requested area: four times they released the object before reaching the target area and three times they moved the robotic arm too far from it. In the following sessions, after becoming more familiar with the Leap Motion controller, they managed to correctly position the objects in the requested area every time.

4 Conclusions

Overall, the proposed hybrid multimodal BCI system for robotic arm manipulation tasks shows promising initial testing results. The users were able to correctly select the target object every time after the successful calibration of the eye tracking system and of the SSVEP-based classifier. Also, the interaction based on the Leap Motion controller was found to be reliable and easy to master, even though in a few early trials the users did not correctly position the object in the target area (83 successful trials out of 90).

Future work will focus on integrating into the interface the possibility to select more than two objects and to manipulate objects in 3D space. We also aim to perform a comprehensive evaluation of the proposed interface in terms of performance parameters, i.e. ITR, time to complete a specific task, SSVEP classification accuracy and an ergonomics assessment.