1 Introduction

The current technological context in HCI regards computers as solid-state machines relying on explicit interaction through mouse, keyboard and monitor. Although users have become familiar with the devices enabling explicit HCI, these devices undoubtedly limit the speed and naturalness of interaction [1]. A specific challenge in improving existing Human Computer Interaction (HCI) is therefore to bring it closer to the communication patterns of human beings, and hence to create more “natural” interaction. Schmidt [2] defined implicit HCI as “An action performed by the user that is not primarily aimed to interact with a computerized system but which such a system understands as an input”. This definition was preceded by the observation that most of the interaction between people, and the situation in which they interact, is exploited implicitly in communication [2]. This clearly outlines that an important part of natural interaction depends on implicit interaction. In that direction, the development of small, reliable and affordable mobile sensors opens a whole set of opportunities for natural interaction with computing entities through sensitive environments.

Multimodal HCI (MMHCI) is a multidisciplinary research area including, but not limited to, psychology and gesture recognition [3]. Traditionally, MMHCI has been used with the main aim of bringing computer technologies closer to users [3]. However, MMHCI research has mainly been concerned with explicit, rather than implicit, interaction. In order to fill this gap, the current study investigates the possibility of employing implicit MMHCI, particularly in industrial environments.

Industry workers, especially those in assembly positions that require performing monotonous repetitive tasks, are susceptible to mental fatigue and loss of concentration as time progresses. Their activities often require execution of tasks dependent on the use of tools and/or operating a machine. In such a context, explicit interaction with a computer becomes increasingly impractical. A new approach to communication is needed, through an interaction model that is more natural. A stable foundation for building such an interaction model in a production workplace should rest on different communication modalities that can ensure implicit interaction between worker and workplace, such as movement, voice, psychophysiological signals, etc.

2 Problem Statement

Throughout industrial history, accidents were mainly attributed to equipment failure and system malfunctioning [4]. However, as technology became sufficiently reliable, the remaining accidents are mainly attributed to human error [5]. As has been pointed out, the human is often characterized as the most fallible element in the production line, and the main causes of these failures are limited mental and physical endurance, which sometimes cause behavior and reactions to be unpredictable [6]. In other words, the human has an inborn tendency to fail, but at the same time can realize his or her faults and their reasons [6].

Although reliable automated systems can somewhat suppress workers’ operating errors and their consequences, they are still unable to assure completely “error free” industrial processes. One of the main reasons for the chronic occurrence of human errors is that these systems demand a shift in the role of workers, from active operators to system control operators [7]. This shift lowered the mental workload of operators, which further leads to a hypovigilance state, so human error is still likely to occur. Another important point is that, despite all the technological advancements resulting in process automation, there are still many workplaces requiring operators’ manual repetitive and monotonous actions [6]. Repetitive and monotonous actions are also known to induce hypovigilance, leading to a lowered attention state of the operator, which could further lead to work-related injuries and industrial accidents [8].

Current practice in human factors and ergonomics (HFE) for studying operators’ cognitive abilities mostly relies on subjective questionnaires and measurement of operators’ overall performance [9]. However, these methods are unreliable, since they mostly rely on subjective assessment and depend on the expertise of the interpreter of the collected data [9]. Another drawback is that data acquisition and analysis are carried out offline. Physiological measurements, on the other hand, can provide real-time data acquisition and processing, as well as objective results on one’s mental states. In this context, researchers have proposed computer-assisted methods that attempt to directly acquire information on the worker’s cognitive state and behavior using different types of sensing equipment [10].

Researchers almost exclusively agree that information about vigilance, attention and mental fatigue can be obtained from electroencephalography (EEG) and other physiological signals, such as Galvanic Skin Response (GSR) and Heart Rate Variability (HRV) [11, 12]. Until very recently, the major drawback of all established technologies for the non-invasive study of human physiological signals and brain function was that they were confined to highly controlled laboratory environments and conditions [10, 13]. However, recent technological advances have enabled miniaturization of recording amplifiers and integration of wireless transmission technologies into physiological sensors, opening a whole set of opportunities for estimating various physiological states of the human in applied environments.
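As an illustration of how such signals translate into fatigue correlates, two standard time-domain HRV features can be computed directly from the inter-beat (RR) intervals delivered by a wearable heart rate sensor. The sketch below is illustrative only; the function name and interface are ours, not part of any sensor SDK:

```python
import math

def hrv_features(rr_intervals_ms):
    """Compute two standard time-domain HRV features from RR intervals (ms).

    SDNN:  standard deviation of all RR intervals.
    RMSSD: root mean square of successive RR differences, a common
           short-term HRV index reported in sleepiness/fatigue studies.
    """
    n = len(rr_intervals_ms)
    mean_rr = sum(rr_intervals_ms) / n
    sdnn = math.sqrt(sum((rr - mean_rr) ** 2 for rr in rr_intervals_ms) / n)
    diffs = [rr_intervals_ms[i + 1] - rr_intervals_ms[i] for i in range(n - 1)]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return sdnn, rmssd
```

Both features shrink as heart rhythm becomes more regular, which is why they are candidates for the drowsiness estimation mentioned above.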

Another important point is that many aspects of industrial work are physical in nature. On the one hand, many tasks require manual action of the operator, e.g. object manipulation, lifting, pushing, pulling, etc., which are among the major sources of work-related musculoskeletal disorders (MSDs). On the other hand, automation has reduced the need for operators to conduct these manual tasks; however, the need to handle the automated processes has not sufficiently reduced improper worker postures, which represent another source of work-related MSDs. Different methods and tools exist for the ergonomic assessment of workers’ manual tasks and postures, such as self-reports, observational measurements and direct methods [14]. However, all of these methods have drawbacks, the biggest being that the analyses need to be carried out offline. Therefore, postural evaluation that can be carried out in real time could provide benefits in practice.

Researchers are continuously working on supportive tools for identification and evaluation of potentially hazardous human motor tasks and postures, with the main goal of improving ergonomics in work processes. Currently, numerous tools are based on manual observation by experts or self-reporting, such as QEC, manTRA, RULA, REBA, HAL-TLV, OWAS, LUBA, OCRA, Strain Index, SNOOK tables and the NIOSH lifting equation [15]. Most approaches to analyzing human movement are based on pose estimation techniques, which refer to the process of estimating the configuration of the underlying kinematic skeletal articulation structure of a person [16]. This representation can be obtained using various sensor settings, from typical video cameras, through a variety of range cameras (structured light, time-of-flight, stereo triangulation, etc.), to combinations of wearable sensors. In industrial settings, researchers are working on applying this approach to defining work processes, preventing improper worker positions [17], and training and monitoring new workers [18].
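To give a concrete sense of how a kinematic skeleton feeds ergonomic assessment: once joint keypoints are estimated, posture scoring reduces largely to geometry on those points. The minimal sketch below computes the angle at a joint (e.g. elbow flexion from shoulder-elbow-wrist keypoints); it illustrates the kind of quantity that tools like RULA score against thresholds, but it is not the actual RULA procedure:

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by 3-D keypoints a-b-c
    (e.g. shoulder-elbow-wrist for elbow flexion)."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(v1[i] * v2[i] for i in range(3))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    return math.degrees(math.acos(dot / (n1 * n2)))
```

Tracking such angles over time is what makes real-time postural evaluation feasible once pose estimation is available.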

The ambition of this work lies in advancing the interaction between worker and workplace. In order to reach this goal, we started developing a truly unobtrusive sensing workplace environment. With unobtrusive motion sensing technology and a mobile physiological monitoring system, one is able to monitor work activities without interfering with the standard activities of an industry worker. This could enable the development of a human error detection and prevention system in a production workplace. Existing approaches are mostly limited to application in early stages of product design and workplace planning, and are confined to laboratory spaces. Contrary to existing systems, which mostly observe specific features considered relevant, the contribution of our proposed approach is to integrate a more complete physiological and motion parameter set. In essence, our approach should provide continuous, real-time monitoring of worker activities in a realistic production environment. The continuous input stream will be interpreted as implicit commands in line with the suggested human-computer interaction model. In comparison to existing systems that require workers to adapt to the designed workplace, our approach should enable continuous improvement of the work process according to the specific profile of the worker. Introduction of such a system into the workplace environment is aimed at reducing workplace injuries, reducing work-related errors, increasing productivity and improving overall job satisfaction. The approach is intended to provide an automated system for monitoring workers with different goals, such as improving ergonomics, detecting errors, or serving as a tool for worker training and workplace adaptation. It is based on the notion that continuous monitoring of the operator’s behavior through body movement and gestures, as well as mental state, e.g. vigilance state, mental fatigue, arousal, etc., in an operational environment could decrease the potential for serious errors and provide valuable information concerning the ergonomics of the tasks being performed [19].

3 State-of-the-Art

An overall survey of software and hardware available on the market for biomechanical analysis indicated a number of largely diverse solutions. Larger companies (especially automotive) have made considerable financial investments in Motion Capture (MoCap) devices in recent years, examples being: the Impulse X2 motion capture system (PhaseSpace, Inc.); the ART Motion Capture (Advanced Realtime Tracking, Inc.); MOTIONVIEWTM (AllSportSystems, Inc.), etc. These devices are well known, e.g. from the entertainment industry, where it is possible to animate a virtual character by capturing real actor movements. Thanks to these expensive MoCap devices, it is possible to acquire positions of points (called markers) on a character’s body in real time. Once the data has been acquired, it needs to be imported into 3D simulation software, e.g. JACK (Siemens, Inc.), 3DSSPP (developed at the University of Michigan, http://www.umich.edu/~ioe/3DSSPP/index.html), OpenSimulator (http://www.opensimulator.org), etc., in order to perform subsequent ergonomics analysis.

Although MoCap systems could offer highly precise ergonomic analysis, there are still certain bottlenecks in performing on-line measurements in real-life industrial environments. The first difficulty for industry, especially for small and medium enterprises (SMEs), is that technology for on-the-fly recording by MoCap systems is financially very demanding, and it is often necessary to devote an entire room to recording [20]. Further, the majority of MoCap systems use external markers (LEDs, depth-of-field targets, etc.) that have to be attached to the person being recorded, which could interfere with workers’ regular everyday operations in industry. Another drawback of the majority of recording systems is that there is no possibility of on-line recording of a person’s movements together with on-line analysis and feedback to the operator in case of a bad posture. To our knowledge, there are only two systems that could possibly be used for on-line recording and analysis: the Real-time Siemens JACK & PSH Ergonomics Driver (Synertial, Inc., http://www.synertial.com) and the Cognito system [21]. However, the first system is intended for addressing the ergonomic aspects of manual operations during early stages of product design and manufacturing planning, and it requires the use of Synertial’s motion capture suits. The Cognito system, on the other hand, uses an on-body sensor network, composed of a tri-axial accelerometer, a tri-axial gyroscope and a tri-axial magneto-inductive magnetic sensor [21]. Therefore, the Cognito system does not record the movements of the worker, but uses the sensor readings as input data to a computer-based RULA ergonomic assessment method and provides feedback when certain thresholds are reached.

Concerning current technologies for real-time tracking of operators’ mental fatigue and their ability to maintain a desired alertness level, research and the industrial market have mostly been oriented towards the transportation sector, mining industry, etc., while the production industry has been left aside. In 2008, Caterpillar, Inc. published “Operator Fatigue: Detection Technology Review”, the most recent critical review of technologies for real-time detection of operator fatigue. In summary, only three out of the 22 top-rated technologies were reported as immediately available: ASTiD (Pernix), HaulCheck (Accumine) and Optalert (Sleep Diagnostics). Of these, only ASTiD and Optalert can be considered fatigue detection technologies and are recommended for immediate use by Caterpillar, Inc. However, neither of these methods relies on physiological signals, but rather on vehicle dynamics (ASTiD) or on measuring the delay between the reopening of an eye after eye closure (Optalert).

Recently, two novel commercial fatigue detection systems emerged on the market, namely SmartCap (EdanSafe, Inc.) and the Driver State Sensor (DSS, SeeingMachines, Inc.). The DSS In-Vehicle System (DSS-IVS) uses a console-mounted camera to track the driver’s head and eyes, resulting in a continuous assessment of drowsiness and distraction, using proven eye-tracking algorithms and image processing techniques powerful enough to accommodate eyeglasses, safety glasses and sunglasses (http://www.dssmining.com).

SmartCap represents the only commercially available system for fatigue detection that is based on reliable EEG measurements. EdanSafe’s SmartCap solution incorporates electroencephalographic (EEG) monitoring technologies that provide a direct, physiological measurement of driver/operator fatigue in real time by sensing and analysing brainwaves (http://www.smartcap.com.au). The system is based on dry EEG electrodes; however, the desired signal quality has not yet been achieved, and dry electrodes are still unable to reduce movement artifacts, which are related to the relative movement of the electrodes against the head surface [22]. Further, this system is created only to monitor the operator’s fatigue, not vigilance correlates, and it is not suitable for on-line monitoring of the operator’s attention level.

4 Proposed Approach

This work aims at the development of an automated system for human error detection and prevention in a production workplace. The system will rely on a novel human-computer interaction model founded on implicit input. The underlying idea is to use unobtrusive motion tracking sensors to record worker body movement (BodyMovement), identify gestures (GestureRecognizer) and develop a model of optimal worker movement in a workplace (GestureAnalyser), Fig. 1. Using the depth-sensing technology embodied in the KinectTM and LeapMotionTM devices, we are able to capture body movements represented by an estimated stick figure of the body and hand pose estimations retrieved over time. This is achieved using the MMK recorder (for KinectTM) and the LeapMotion SDK, adopted and developed at the IT department of the Faculty of Organizational Sciences (University of Belgrade). Based on this input, we intend to develop a gesture recognizer able to recognize generic gesture patterns in a workplace. Output from this module will feed into the gesture analyzer application, which we plan to develop in order to specify models of worker behavior in a specific workplace.

Fig. 1.

Multimodal concept visualization
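As a first approximation of the GestureRecognizer module described above, recognizing a generic gesture pattern can be sketched as nearest-template matching over a stream of joint positions (e.g. a wrist trajectory from the KinectTM stick figure). All names and the matching scheme below are illustrative; the actual recognizer is still under development:

```python
def resample(seq, n):
    """Index-resample a sequence of (x, y, z) joint positions to length n."""
    m = len(seq)
    return [seq[min(int(i * m / n), m - 1)] for i in range(n)]

def distance(s1, s2):
    """Mean point-wise Euclidean distance between two equal-length sequences."""
    return sum(
        sum((p[i] - q[i]) ** 2 for i in range(3)) ** 0.5
        for p, q in zip(s1, s2)
    ) / len(s1)

def recognize(sequence, templates, n=16):
    """Return the label of the stored gesture template closest to `sequence`."""
    s = resample(sequence, n)
    return min(templates, key=lambda label: distance(s, resample(templates[label], n)))
```

In practice a more robust alignment (e.g. dynamic time warping) would replace the naive resampling, but the pipeline shape stays the same: trajectory in, gesture label out, feeding the gesture analyzer.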

On the second track, we will also use physiological sensors such as EEG, GSR and HRV to record the worker’s physiological signals (physiological signal), extract physiological features (physiological feature extractor) and attempt to detect the worker’s attention state, mental fatigue, vigilance, engagement and emotional state (physiological analyzer), Fig. 1. The intention is to acquire the worker’s physiological signals using current physiological sensor technologies. For EEG recording we will use the state-of-the-art SMARTING device (mBrainTrain, Serbia), a small and lightweight (80 × 50 × 12 mm, 55 g) system for recording brain activity in unrestricted environments. The SMARTING amplifier can be tightly connected at the occipital sites to a 24-channel EEG recording cap (Easycap, Germany). For GSR recording we opt for the technology developed at the University of Kragujevac, Department for Production and Industrial Engineering. Further, a commercial HRV sensor (Canyon) will be used, as it was previously validated as a good method for estimating sleepiness and predicting performance. All physiological sensors are connected to the recording computers via Bluetooth. Based on these recordings we will develop a physiological feature extractor able to identify the worker’s relevant mental states. Output from this module will feed into the physiological analyzer application, which will be developed in order to specify a model of worker attention, vigilance, mental fatigue, engagement and emotional state for the operational process. In order to improve the physiological analysis and reach more stable conclusions, we will investigate the possibility of including the output of the gesture analysis in the physiological analyzer’s decision-making process (Fig. 1). Since body movement represents a final result of cognitive effort, establishing a correlation between observed disturbances in worker gestures and the mental state of the worker (acquired through physiological signals) should enable early recognition and prevention of possible mental or physical fatigue.
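One candidate feature for the physiological feature extractor is the ratio of slow (theta, alpha) to fast (beta) EEG band power, a correlate of fatigue commonly reported in the literature. The band limits below follow conventional definitions, but the exact feature set of our extractor is not yet fixed; this is a minimal sketch:

```python
import numpy as np

def band_power(signal, fs, low, high):
    """Mean spectral power of `signal` within [low, high] Hz via the FFT."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    mask = (freqs >= low) & (freqs <= high)
    return psd[mask].mean()

def fatigue_index(eeg_channel, fs=250):
    """(theta + alpha) / beta power ratio, which rises with mental fatigue."""
    theta = band_power(eeg_channel, fs, 4, 8)
    alpha = band_power(eeg_channel, fs, 8, 13)
    beta = band_power(eeg_channel, fs, 13, 30)
    return (theta + alpha) / beta
```

A production extractor would add artifact rejection and windowed (e.g. Welch) spectral estimation, but the feature itself is this simple once clean EEG epochs are available.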

In order to provide adequate interoperability of the defined components, a communication layer that meets some specific demands will be defined. It is important to note that the interaction signal modalities need to be in almost perfect synchronization, both to obtain fine time-scale correlations between sensor observations and to reach the necessary conditions for proper segmentation of event-related potential (ERP) observations from EEG recordings. Recently, the Swartz Center for Computational Neuroscience (SCCN) developed the Lab Streaming Layer (LSL, available at https://code.google.com/p/labstreaminglayer/), a real-time data collection and distribution system that allows multiple continuous data streams as well as discrete marker timestamps to be acquired simultaneously in the eXtensible Data Format (XDF, available at https://code.google.com/p/xdf/). This data collection method provides synchronous, precise recording of multi-channel, multi-stream data that is heterogeneous in both type and sampling rate, obtained via a local area network (LAN). In order to use the available features of the LSL recorder, the recording software of all devices used in this study was optimized to allow real-time and synchronous data streaming.
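Because LSL stamps every stream on a common clock, a discrete marker from one stream can be mapped after recording onto the nearest sample of any continuous stream, regardless of sampling rate. A minimal sketch of that lookup (pure Python for illustration; this is not the LSL API itself):

```python
import bisect

def nearest_sample_index(timestamps, t):
    """Index of the sample whose (common-clock) timestamp is closest to t.

    `timestamps` is the sorted ascending list of a continuous stream's
    LSL timestamps; `t` is a marker timestamp from another stream.
    """
    i = bisect.bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # choose whichever neighbor is closer in time
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1
```

This is the operation that lets a SART stimulus marker be aligned with EEG samples for ERP segmentation and with MoCap frames for gesture analysis at the same instant.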

Upon creation of the optimal worker behavior (movement and physiological states) model, we can use the same set of sensors to perform real-time worker supervision. A comparison can then be made between the actual and expected models (Multimodal interaction app). The system should be able to detect model deviations and suggest adequate feedback (Multimodal feedback). The existing workplace can therefore be extended to serve in a fail-safe capacity, by providing feedback information (visual and audio cues) about a possibly emerging problem (Fig. 1). For implementation of the multimodal feedback, the initial idea is to use RGB (red, green, blue) LED (light-emitting diode) strips for visual feedback and a small-scale integrated speaker system for audio feedback. Special attention will be given to finding a user-friendly feedback system that does not increase stress levels and does not interfere with other workplaces. Further, the proposed feedback system should not interfere with the worker’s projected operations; on the contrary, it is designed to increase worker productivity, prevent lapses in attention, prevent bad postures and consequently decrease potential operating errors.
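The deviation-to-feedback mapping can be sketched as a simple statistical comparison of a monitored feature (e.g. cycle time or the fatigue index) against the worker's optimal-behavior model. The thresholds and cue names below are placeholders for values that will be tuned empirically:

```python
def feedback_cue(value, mean, std, k_warn=2.0, k_alert=3.0):
    """Map a monitored feature onto a traffic-light LED cue by its
    deviation from the worker's optimal-behavior model (mean, std)."""
    z = abs(value - mean) / std
    if z >= k_alert:
        return "red"    # strong deviation: visual plus audio alert
    if z >= k_warn:
        return "amber"  # mild deviation: unobtrusive visual cue only
    return "green"      # within the optimal-behavior model
```

Keeping the mapping this conservative (two thresholds, escalating cues) is one way to honor the requirement that feedback must not itself raise stress levels.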

5 Current Progress

In order to conduct our initial study of worker behavior, upon which the model should be created, we built at the Faculty of Engineering (University of Kragujevac) a full-scale replica (Fig. 2b) of the existing workplace of our industrial partner (Fig. 2a). All major aspects of the existing workplace, including spatial ratios and microclimate conditions, were completely replicated from industry. Further, a study to determine the placement of the sensors used in this study was conducted, and the positions of the recording devices are presented in Fig. 2b. Notably, the sensors do not pose any movement restrictions for the participants in this study, and therefore the simulation of a working process can be carried out without interfering with the process itself.

Fig. 2.

a – Authentic industrial workplace of our industrial partner; b – laboratory replica of the workplace with the set of sensors: 1 – KinectTM; 2 – wireless EEG sensor (SMARTING, mBrainTrain); 3 – heart rate sensor (Canyon); 4 – LeapMotionTM; 5 – galvanic skin response sensor (developed at our department) (Color figure online).

The workplace chosen for this study is replicated from the automotive sub-component manufacturing industry; in laboratory settings, we simulate the assembly of hoses used for hydraulic brake systems in vehicles. The process itself is simple, comprising six sub-actions which can be summarized as follows: (1) picking up the rubber hose (blue box, on the right-hand side of the participant, Fig. 2); (2) picking up the metal extension that should be crimped onto the hose (yellow box, on the left-hand side of the participant, Fig. 2); (3) placing the metal extension on the rubber hose; (4) placing the unassembled part in the improvised machine (white box in front of the participant, Fig. 2); (5) upon placement of the unassembled part, pressing the pedal with the right foot in order to initiate the simulated crimping process; (6) once the simulated crimping process is finalized, removing the assembled part from the machine and placing it inside the box with the assembled parts (grey box in front of the participant, Fig. 2). Although the assembly process comprises six sub-processes, it lasts roughly ten seconds, and in industrial settings one operator assembles approximately 2500–3000 parts in an 8-h working shift. This assembly operation therefore presents a typical repetitive and monotonous industrial task, which is known to induce vigilance decrement and mental fatigue. Moreover, the assembly process is carried out in a seated, static position requiring extended manual material handling, making it suitable for our study, since it is well known that such tasks also impose a physical load on operators when performed for prolonged periods of time.

Participants in this study are seated in a comfortable chair in front of the improvised machine (Fig. 2) while performing the simulated assembly task. In order to investigate the time-locked features of the physiological signals, ERPs from EEG and skin-conductance level (SCL) from GSR signals, one functional modification was made to the simulated work routine, in terms of information presentation to the participants. For example, evaluation of ERP latency and magnitude can provide important information about the worker’s cognitive state that can further be used as input for the physiological feature extractor, e.g. the P300 response can serve as an indicator of the amount of attention allocated to a task [23]. For that reason, instead of real information during the work process, including information on initiation of the assembly action, a validated psychological sustained attention to response task (SART) for estimating one’s cognitive ability was presented on a 24” screen from a distance of approximately 100 cm. The task specifications were programmed in the Simulation and Neuroscience Application Platform (SNAP, available at https://github.com/sccn/SNAP), developed by the SCCN. The reason for utilizing the SNAP software is its ability to send markers as strings to the above-explained LSL; therefore, satisfying synchronization between the physiological and MoCap sensors can be achieved. We believe that this modification does not alter the working routine, while it provides access to the cognitive states of the participants simultaneously with their pose estimation while performing the simulated operation.
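With the SART markers synchronized through LSL, extracting time-locked ERP features reduces to epoching the EEG around the marker samples, baseline-correcting, and averaging. A minimal sketch, assuming markers have already been converted to sample indices (the epoch window and baseline choice are illustrative, not our final analysis parameters):

```python
import numpy as np

def erp_average(eeg, srate, marker_indices, tmin=-0.1, tmax=0.6):
    """Average EEG epochs around stimulus markers (given as sample indices).

    eeg: 1-D array for one channel. Returns the ERP waveform; a positive
    deflection around ~300 ms post-stimulus would reflect the P300.
    """
    pre = int(-tmin * srate)    # baseline samples before the marker
    post = int(tmax * srate)    # samples after the marker
    epochs = []
    for m in marker_indices:
        if m - pre >= 0 and m + post <= len(eeg):
            epoch = eeg[m - pre:m + post].astype(float)
            epoch -= epoch[:pre].mean()   # baseline correction
            epochs.append(epoch)
    return np.mean(epochs, axis=0)
```

P300 latency and amplitude read off this averaged waveform are exactly the kind of attention indicators the physiological feature extractor is meant to deliver.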

Data acquired during these recording sessions will be used to develop the optimal worker behavior (movement and physiological states) model for each of the constructed workplaces and each specific worker. In the next steps we can develop an initial interaction model that will enable us to compare implicit inputs with the defined models of optimal worker behavior and provide corresponding outputs. Further, development of the interaction model will be data driven: input data acquired in empirical studies will affect the parameters of the implicit input section of the system. The output will also be adjusted according to conclusions gathered in empirical studies in order to find the best possible feedback to the worker. To summarize, the final interaction model implementation will be reached through iterative-incremental development.

6 Discussion and Concluding Remarks

The main aim of the presented work is to develop a new automated, computer-assisted observation method, improving the interaction of the worker with his work environment. The approach relies on implicit multimodal human-computer interaction. In a workplace setting, where the worker is mainly focused on performing defined work activities, it is difficult to base communication between the worker and his workplace on explicit interaction. In a world where people’s movements and transactions can be tracked—where individuals trigger non-deliberate events just by being at a certain place, physical or virtual, at a certain time—the notion of interaction itself is being fundamentally altered [24].

A worker’s activities in a workplace can be regarded as implicit interaction input to the system for error detection and prevention. The majority of production workplaces are constructed in accordance with guidelines given in ergonomic and other industrial standards, and the work process is designed to ensure an optimal level of productivity. Compared to computers, which are capable of executing a given algorithm, humans are unable to perform projected repetitive activities without variation in performance and deviation from the projected work plan. With time, people become subject to physical and mental fatigue, which, if not properly addressed, may lead to a drop in productivity or even induce work-related injuries. The purpose of the proposed system is to use a novel HCI model involving implicit interaction input to detect deviations from the projected work behavior and perform preventive actions by providing adequate feedback to the worker. The interaction modalities serving as input in our approach are human gestures, body motion and physiological signals, while audio and visual signals will be delivered as output feedback.

Currently, the system has reached the phase of initial laboratory installation and testing. System components for recording and monitoring of physiological signals and body movement have been developed and initially tested. The communication and synchronization platform, based on the open-source Lab Streaming Layer (LSL), has been developed and optimized. Acquisition modules are connected to the communication platform via specifically designed and developed drivers.

A full-scale replica of the workplace environment was built at the Faculty of Engineering (University of Kragujevac). The workplace represents the work process of assembling hoses used for hydraulic brake systems in vehicles. Sensing equipment and computer components are integrated into the workplace environment based on an iterative process of signal analysis and sensor placement.

Current activities of our research team are focused on developing and executing an initial round of experimental recordings in order to retrieve a first data set. This data set will be used in the realization of the first version of the implicit MMHCI model. Additionally, the results will be used in iterative improvement of the system components.

Following these initial steps, our plan is to take the system to our industrial partner and perform a small-scale experimental study, using the workplace that was replicated in building the simulated laboratory environment. Output from these experiments will be used for fine-tuning the system and models in preparation for a full-scale production environment, i.e. the intention is to create a generalized model that could be applied to the majority of existing industrial workplaces. Finally, the system will be evaluated in a relevant production environment, using workplaces not utilized in previous development. To achieve this, the entire process, starting from recording, through specific workplace model development, to enrolment of the system into existing production conditions, will be conducted. This requires a larger-scale empirical study.