
1 Introduction

Forms of interaction between humans and computers have evolved toward new and diverse objects and environments of our day-to-day life. Advances in technology, such as the miniaturization of devices and tools for wireless communication, have enabled the design of applications for smart homes, making everyday activities simpler, more comfortable and more intuitive. Residential automation comprises a set of sensors, equipment, services and diverse technological systems integrated with the aim of assisting basic needs of security, communication, energy management and housing comfort [1,2,3].

Such scenarios demand ways of interaction that are more dynamic, natural and easy to learn. Several works present novel interaction solutions based on voice commands, facial expressions, and the capture and interpretation of gestures, the last being the point of investigation of this work. Systems based on free gestures interpret actions that people naturally perform to communicate, offering users an easier and more intuitive way to interact and reducing the cognitive overload of information, training and learning [4]. This type of interaction has been perceived as relevant by researchers in Human-Computer Interaction since the first investigations [5].

Therefore, we propose an interaction model for smart homes that does not rely on special sensors or controllers, based on artificial intelligence and computer vision techniques. These methods are applied to recognize interaction actions over targets dynamically distributed in a residence, using images captured by an ordinary camera, in a manner similar to a touch area. This approach aims at a cheaper, more efficient and simpler way of interacting with a smart home and its electric devices, using representative symbols and artificial intelligence techniques to recognize the user's interactions. Our work connects these techniques with well-known computational platforms, which communicate with devices and other residential functions.

This work is organized as follows: in the next section, we present related work in residential automation. Our model, the techniques used and the methodology are explained in the Methodology section. Experiments and results are shown in the Experiments and Results section. Finally, we conclude by discussing the results and possible future improvements and research directions.

2 Related Work

The capture of gestures can be performed with special devices, such as sensors, accelerometers and gloves. These accessories facilitate interpretation; however, they increase the total cost of the system [6]. In this sense, the work of Bharambe et al. [7] presents a system composed of an accelerometer to capture the gestures, a microcontroller responsible for identifying the collected information, a transmitter and an infrared receiver. It is worth mentioning that this approach is limited in range and that the infrared transmitter and receiver must be fully aligned.

In contrast to the use of special accessories, methods based on computer vision require only a camera to capture images and automatically interpret complex scenes [8]. Along these lines, Dey et al. [9] proposed a system in which the capture and interpretation of gestures is performed using the camera of a smartphone. A binary signal is generated from the captured image, which is graphically analyzed and transmitted via Bluetooth to a control board responsible for activating the devices in the residence.

Still regarding methods based on computer vision, Pannase [10] developed a study focused on quickly detecting hand gestures. For this, algorithms for segmentation and detection of finger positioning were used, followed by classification of the gesture with a neural network. Limitations of that study are the small number of supported gestures and the requirement that the hand be positioned exactly in front of the camera.

In this work, we propose a model of interaction between a human and a residential environment using gestures, dispensing with special sensors or other devices, and providing a more natural and intuitive way to control the functions of a smart home, such as switching electronic devices and lamps on and off, or opening and closing doors and windows. The system must be able to operate in both daytime and nighttime conditions; for this, a webcam was adapted to see infrared light.

The next section presents the main characteristics of the proposed model.

3 Methodology

For the development of this project, techniques of artificial intelligence and computer vision, applied to images of complex contexts, were adopted to identify interaction targets and actions of the user in the environment.

Our model is organized as presented in Fig. 1. An application running on a main computer is responsible for executing all procedures involving the artificial intelligence and computer vision techniques. These procedures are applied to an image captured by a webcam located in the environment and directed to where the target icons are placed. The action icons are arranged in the environment as desired by the user. The application on the main computer identifies whether a user interaction occurred; if so, the corresponding action command is sent to the control board, which is connected to the devices and other interactive functions of the residence.
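As a minimal sketch of this flow (not the actual implementation), the fragment below captures frames from the webcam and, whenever an interaction is detected, writes a one-byte command to the control board over a serial link. The library choices (OpenCV for capture, pySerial for the board link), the port name, the command bytes and the detect_interaction helper are assumptions for illustration; the paper does not specify the communication protocol.

    import cv2
    import serial

    # Hypothetical mapping from recognized icon actions to one-byte commands
    # understood by the control board (assumed protocol, for illustration only).
    COMMANDS = {"light_on": b"L", "light_off": b"l", "open_door": b"D"}

    def detect_interaction(frame):
        """Placeholder for the computer vision model described in Sect. 3.1.
        Should return the name of the triggered action, or None."""
        return None  # stub

    def main_loop(port="/dev/ttyUSB0"):
        board = serial.Serial(port, 9600, timeout=1)   # control board link (assumed)
        cam = cv2.VideoCapture(0)                      # ordinary webcam
        try:
            while True:
                ok, frame = cam.read()
                if not ok:
                    break
                action = detect_interaction(frame)
                if action in COMMANDS:
                    board.write(COMMANDS[action])      # dispatch the action command
        finally:
            cam.release()
            board.close()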

Fig. 1. Scheme of the proposed model, with its components and flow of information

To illustrate the operation of the proposed model, a miniature of a residence was used. It is possible to notice the triggering of a light as the interaction with an object occurs, as seen in Fig. 2.

Fig. 2. Illustration of the model working in a miniature of a residence

3.1 The Computer Vision Model

The application on the main computer incorporates a model developed in previous research by our team. This model integrates different techniques of artificial intelligence, computer vision and image processing to detect and classify specific objects (in this case, the target interaction icons) in a complex, loosely controlled environment. From a captured image of the environment, a pre-processing step saturates red-colored pixels; then visual attention is applied to identify areas of interest in the image and discard any other visual information. In this step we modified the classic visual attention method proposed by Itti et al. [11], simplifying it to respond only to color stimuli. The areas of interest are marked as seeds, which are used as input to the segmentation process. For segmentation we used the watershed method as developed by Klava [12]. The segmentation step is responsible for generating the individual elements of the target icons from the input image. These elements are classified using neural networks, and an interaction action is attributed according to the type of each icon.
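A simplified sketch of this detection stage is shown below, assuming OpenCV: the red-pixel emphasis plays the role of the pre-processing step, the thresholded red-dominance map stands in for the color-only attention stage and produces seeds, and the seeds feed OpenCV's standard watershed. The thresholds, the background marker trick and the omission of the neural-network classifier are simplifications; the original model uses the modified Itti attention and Klava's watershed variant, which are not reproduced here.

    import cv2
    import numpy as np

    def find_icon_regions(image_bgr, red_thresh=60, min_area=200):
        """Sketch: red saturation -> color-only 'attention' map -> seeds ->
        watershed segmentation of candidate icon regions (bounding boxes)."""
        r = image_bgr[:, :, 2].astype(np.int16)
        g = image_bgr[:, :, 1].astype(np.int16)
        b = image_bgr[:, :, 0].astype(np.int16)
        # Pre-processing: emphasize pixels where red clearly dominates.
        redness = np.clip(r - np.maximum(g, b), 0, 255).astype(np.uint8)
        # Color-only "attention": salient red areas become watershed seeds.
        _, seed_mask = cv2.threshold(redness, red_thresh, 255, cv2.THRESH_BINARY)
        n_labels, markers = cv2.connectedComponents(seed_mask)
        markers = markers.astype(np.int32)
        # Mark clearly non-red pixels as background so icon regions stay compact.
        background = cv2.dilate(seed_mask, np.ones((15, 15), np.uint8)) == 0
        markers[background] = n_labels             # extra label used as background
        cv2.watershed(image_bgr, markers)          # region boundaries become -1
        regions = []
        for label in range(1, n_labels):
            ys, xs = np.where(markers == label)
            if xs.size >= min_area:                # discard tiny spurious regions
                regions.append((xs.min(), ys.min(), xs.max(), ys.max()))
        return regions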

Next, the interaction routines are started, continuously capturing images and searching for possible interaction actions performed by the user. Interactions are recognized as changes in the coefficient of dissimilarity between the histograms of the regions where the interaction icons were found. The dissimilarity \(dSim_{cos}\) is calculated using an adaptation of the cosine similarity, defined by:

$$\begin{aligned} dSim_{cos}(h_{ini},h_{acq}) = 1-\frac{\sum _{i=1}^{n}h_{ini_i} \cdot h_{acq_i}}{\sqrt{\sum _{i=1}^{n} (h_{ini_i})^{2}} \cdot \sqrt{\sum _{i=1}^{n} (h_{acq_i})^{2}}}, \end{aligned}$$
(1)

where \(n = 255\), \(h_{ini}\) corresponds to the histogram of the red channel obtained from an initial capture and \(h_{acq}\) is the histogram of the red channel acquired from subsequent captured images.
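Equation 1 can be written directly, for example as the NumPy function below; the red-channel histograms are passed as plain arrays (e.g., the 255 bins used in the text), and the handling of an all-zero histogram is an assumption made for the sketch.

    import numpy as np

    def dissimilarity(h_ini, h_acq):
        """Cosine-based dissimilarity between two red-channel histograms (Eq. 1)."""
        h_ini = np.asarray(h_ini, dtype=np.float64)
        h_acq = np.asarray(h_acq, dtype=np.float64)
        denom = np.linalg.norm(h_ini) * np.linalg.norm(h_acq)
        if denom == 0.0:
            return 1.0   # degenerate case: treat an empty histogram as fully dissimilar
        return 1.0 - float(np.dot(h_ini, h_acq) / denom)

In practice, the two histograms would be extracted from the same icon region in the initial and in the current frame: a value near 0 indicates that the region is unchanged, while a value near 1 indicates that the region has changed, for instance by being covered by the user's hand.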

The scheme of the computer vision model is shown in Fig. 3.

Fig. 3. Scheme of the computer vision model implementing artificial intelligence techniques (Color figure online)

3.2 The Human-House Interaction Model

The human-house interaction model is composed as previously shown in Fig. 1. The interactive icons placed in the environment were designed considering their function, simplicity and intuitive meaning. The icons were tested in diverse environments, with different lighting conditions, distances from the webcam and surrounding elements, to ensure their correct recognition by the computer vision model. The interactive functions were chosen according to their pertinence and utility. Figure 4 shows the five interaction icons used by our model, presented with their names and related functions.

Fig. 4. Set of icons for the interaction actions

It is pertinent to mention that our model is expansible: more icons can be introduced in the application.

4 Experiments and Results

The experiments were carried out in several residential environments, varying the positioning of the camera and the level of luminosity, including total absence of light. The process consisted of attempting interactions and recording the responses obtained. The environments where the tests were performed can be seen in Fig. 5.

Fig. 5. Environments used in the tests

In total, 530 interactions were executed in illuminated environments, divided into 106 interaction attempts per option. The results in these conditions are presented through a confusion matrix, unsuccessful interactions (UI) and individual accuracy, as seen in Table 1. In addition, the total accuracy was calculated, with a result of 96.04%.

Table 1. Confusion matrix and individual accuracy of interaction in environments with luminosity

In line with the objectives of this work regarding operation in nocturnal conditions, specific tests were carried out for these settings. A total of 205 interactions were executed, distributed in 41 attempts per option. As done previously, the results are shown through a confusion matrix, unsuccessful interactions (UI) and individual accuracy. Table 2 presents the results; in this case, the total accuracy obtained was 96.59%.

Table 2. Confusion matrix and individual accuracy of interaction in dark environments
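Taking the total accuracy as the fraction of recognized attempts over all attempts, these percentages correspond to approximately 509 of the 530 attempts in illuminated environments and 198 of the 205 nocturnal attempts being recognized:

$$\begin{aligned} \frac{509}{530} \approx 0.9604, \qquad \frac{198}{205} \approx 0.9659. \end{aligned}$$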

Figure 6 presents a graph in which interaction actions can be seen with their respective coefficients of dissimilarity, obtained using Eq. 1 and representing the human-house interaction value (\(\theta _{HHI}\)). Only \(\theta _{HHI}\) values above 0.8 are considered an interaction by the user, triggering the sending of the action command. In the test execution shown in Fig. 6, twenty-nine interactions with different function icons were intended, all of which were recognized correctly.
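A minimal sketch of this decision rule, reusing the hypothetical dissimilarity helper above and the 0.8 threshold, could look like the following; the per-icon histogram dictionaries are an assumed data layout, not part of the original implementation.

    THETA_HHI_THRESHOLD = 0.8   # theta_HHI values above this count as an interaction

    def triggered_actions(initial_hists, current_hists):
        """Return the icon actions whose region changed enough to be treated as an
        interaction, given per-icon red-channel histograms (hypothetical layout)."""
        actions = []
        for action, h_ini in initial_hists.items():
            theta_hhi = dissimilarity(h_ini, current_hists[action])
            if theta_hhi > THETA_HHI_THRESHOLD:
                actions.append(action)   # the corresponding command would be sent to the board
        return actions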

Fig. 6. Interaction graph relating each function action to its respective value of \(\theta _{HHI}\)

The tests presented consistent results across lighting settings and scenarios, reaching high accuracy values. It is remarkable that no action was ever triggered in place of another, corroborating the efficiency of the adopted computer vision model. It is also noteworthy that unsuccessful interactions are defined as user interactions not recognized by the model, and not as interaction actions executed arbitrarily, the latter being a more problematic situation.

Lastly, the interaction actions were detected with comfortable margins, with values of \(\theta _{HHI}\) close to one for the actions actually intended and close to zero for the others, demonstrating the robustness of the model.

5 Conclusions

In this work, an interaction model based on artificial intelligence and computer vision techniques was proposed for smart homes, without the use of special sensors or controllers. The proposed model was able to detect and recognize the interaction icons correctly in an uncontrolled environment, with different distances between the camera and the target icons and different luminosity variations in the environment, rarely requiring parameter changes or any type of calibration.

As presented in the previous section, the results showed that interactions made by the user were correctly identified in almost every interaction attempt, which reinforces confidence in the proposed model.

Additionally, our model works in both day and night conditions using only an ordinary camera, dispensing with special sensors or other controllers, which allows a low-cost, easy-to-learn and more natural form of interaction based only on gestures. These characteristics were made possible by the use of artificial intelligence and computer vision techniques, shifting the complexity and cognitive effort to the model rather than to the users.

In conclusion, the model demonstrated satisfactory results, concretely offering a new option for controlling a residence. As future work, improvements in the interaction detection method can be investigated, applying more intelligent approaches. Furthermore, a segmentation technique with lower computational cost and better segment quality could be developed.