1 Introduction

Non-trivial virtual reality (VR) environments usually require the implementation of complex forms of user interaction, including spatial navigation within the whole environment as well as various forms of interaction with particular components of the virtual world. The forms of interaction available in a given environment depend on the interaction channels provided by the particular VR system used. There is a great diversity of virtual reality systems – from simple 3D-enabled desktop environments providing interaction through a mouse and a keyboard, through different kinds of head-mounted displays, where interaction is typically implemented by head tracking, to high-end immersive systems featuring advanced interaction techniques, such as body tracking and gesture recognition. Specialized interaction devices are often used, but they create a barrier to the expansion of virtual reality into new areas. The vast diversity of available VR system configurations forces application designers to build VR applications for specific systems, which results in severe fragmentation of the market. Migrating an application from one environment to another is difficult and time-consuming. These issues severely limit the current use of VR applications.

In this work, the use of semantic techniques for modelling possible user interactions is proposed. On the VR system side, an ontology describes the interaction channels and techniques available in a particular VR system configuration. On the VR environment side, an ontology describes the interaction capabilities of the virtual environment. Based on these two ontologies, an automatic mapping can be performed to match the interaction techniques available in the VR system with the interaction capabilities of the environment. Because the system-side ontology includes a semantic taxonomy of typical uses for particular interaction techniques, intuitive use of interaction channels can be proposed automatically – for example, a “Cancel” action in a VR environment can be mapped to the “X” button on a game controller, a left-hand wave in a gesture-based system, or the back button of a head-mounted display (HMD).
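
As an illustration of this idea, the sketch below represents fragments of the two ontologies as plain Python data and derives the “Cancel” mapping from overlapping semantic tags. All channel, action, and tag names are hypothetical illustrations; the actual ontologies would be expressed in a dedicated ontology language such as OWL.

```python
# Simplified fragments of the two ontologies as plain data; all names
# are illustrative assumptions, not the actual ontology used here.

# VR-system-side ontology: interaction channels and their typical uses.
SYSTEM_CHANNELS = {
    "gamepad_button_x": {"sense": "touch", "typical_uses": ["cancel", "back"]},
    "left_hand_wave":   {"sense": "sight", "typical_uses": ["cancel", "dismiss"]},
    "hmd_back_button":  {"sense": "touch", "typical_uses": ["cancel", "back"]},
    "gamepad_stick":    {"sense": "touch", "typical_uses": ["move", "look"]},
}

# VR-environment-side ontology: actions the environment supports.
ENVIRONMENT_ACTIONS = {
    "cancel_dialog": {"semantic_tags": ["cancel"]},
    "walk":          {"semantic_tags": ["move"]},
}

def propose_mapping(actions, channels):
    """Match each environment action to the channels whose typical uses
    overlap with the action's semantic tags."""
    return {
        action: [ch for ch, c in channels.items()
                 if set(a["semantic_tags"]) & set(c["typical_uses"])]
        for action, a in actions.items()
    }

print(propose_mapping(ENVIRONMENT_ACTIONS, SYSTEM_CHANNELS))
# {'cancel_dialog': ['gamepad_button_x', 'left_hand_wave', 'hmd_back_button'],
#  'walk': ['gamepad_stick']}
```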

In the presented implementation, an authoring tool based on the Unity 3D editor is used. The tool enables describing interactive objects in a virtual environment with semantic metadata. At runtime, a user can control the virtual environment with a mobile application. Mobile devices are currently widely available and provide advanced user-interface features, including high-resolution touch-screen displays and various types of built-in sensors, such as gyroscopes and accelerometers. When the user enters an interaction context, interaction metadata are sent to the mobile application to automatically generate a contextual, personalized user interface. The interface depends on the collection of objects the user can interact with in the particular context. User interaction data are then sent from the mobile application back to the environment, enabling the user to conveniently use all interactive functions of the virtual environment. The system has been tested on a Powerwall VR system.

The remainder of this paper is structured as follows. Section 2 situates the work in the context of resilient systems. Section 3 provides an overview of the current state of the art in the domain of methods of implementing interaction in VR environments. Section 4 describes the proposed method of semantic modelling of interactions in VR. Section 5 presents a reference implementation of the proposed method. Finally, Sect. 6 concludes the paper and indicates the possible directions of future research.

2 Relationship to Resilient Systems

With constant progress in technology and hardware performance, virtual reality is becoming a part of our everyday life. Activities that have so far been undertaken in the real world are now often replaced by activities in a virtual world. VR techniques allow us to meet our needs – both the basic ones (such as shopping) and those of a higher order (such as contact with art). Moreover, there are no physical barriers in the virtual world – virtual reality offers the possibility of “travelling” to distant places in no time, at no cost, and without the dangers associated with travelling and staying in crowded places.

One of the main barriers to using virtual reality is the need for specialized presentation and interaction devices. There is a great variety of VR devices currently available. The offered systems differ significantly in their characteristics and capabilities, making it difficult to build applications that could be widely used. Even when multi-platform development tools (such as game engines) are used, the diversity of the available input and output channels and their characteristics undermines the cross-platform use of VR applications.

The approach proposed in this paper enables dynamic adaptation of VR applications to the specific characteristics of the interaction channels available in a particular VR system, enabling an application to run on multiple hardware setups. As a result, the proposed solution may contribute to easier deployment and, consequently, wider dissemination of VR applications.

3 State of the Art

In this section, the most common approaches to user navigation and interaction in VR environments are described. Beyond the use of standard user interfaces, methods based on the use of specialized interaction equipment, natural user interaction, contextual approaches, and dedicated remote interfaces are presented.

3.1 Generic Input Device Approach

Interaction and navigation in VR environments can be implemented with the use of standard input devices (i.e., a mouse and a keyboard). Indirect mapping of 2D mouse interaction into 3D space can be implemented with the use of mouse or keyboard buttons [1]. However, the use of a mouse and a keyboard is problematic in immersive environments, such as HMD- and CAVE-based systems. Moreover, although a keyboard enables relatively rich interaction, each key is limited to two states only (pressed or released), which may not be sufficient in modern virtual environments. In addition, these devices are generic, so the mapping of interface actions (mouse moves or button presses) is often not intuitive and has to be memorized by users. These devices also do not provide any meaningful form of reverse communication, which prevents implementing hints or facilitators for users. This makes the whole interaction complicated, inconvenient and, as a result, ineffective. An advantage of this approach is that users do not need additional devices. The generic input device approach has not recently been a popular subject of research; however, methods of adapting it to touch screens have been elaborated [2].

3.2 Specialized Input Device Approach

The specialized input device approach focuses on the use of specialized equipment – gaming input devices, such as joysticks and pads, or dedicated VR devices, such as haptic arms and flysticks – to navigate and interact in virtual environments. A significant advantage of this approach is higher user comfort and good control and accuracy in properly designed and configured environments [3]. A general disadvantage is the inherent limitation of the number of available buttons and other interaction elements, which, in addition, does not allow creating user-friendly interfaces in context-based applications. The device-based approach is often the basis for further research on virtual reality [4, 5].

3.3 Natural Interaction Approach

Natural interaction is a widely used method of interaction with virtual environments. This approach is based on projecting natural human behaviour into virtual reality; it includes techniques such as motion capture (using marker tracking [6] or marker-less tracking, e.g., with the Kinect sensor system [7]), gesture recognition [8], eye tracking [9], and verbal/vocal input [10]. All these techniques focus on providing an intuitive, natural interface, which is user-friendly even for non-experienced users. The main problems encountered when using natural interaction are lack of precision and user fatigue. This approach also requires specific sensor equipment that registers and analyses users’ behaviour.

3.4 Context-Based Approach

The context-based approach is an interaction technique with a long history [11]. It is popular in computer games, in particular simulations (e.g., the “The Sims” and “SimCity” series) and adventure games. In practice, after the initialization of interaction with a specific game object, the current context (e.g., time, position, current object state) is analysed and a proper user interface is dynamically generated and presented within the virtual scene. In modern virtual environments, the standard context-based approach is often used [12]. However, this approach is uncomfortable due to the difficulty of navigation and the mismatch between classical UI elements (buttons, menus, charts) and the 3D virtual environment.

3.5 Dedicated Control Interface Approach

The dedicated control interface approach uses separate devices with their own CPUs to navigate and interact within virtual environments. Because data must be exchanged between the control device (the client) and the environment (the server), communication becomes an essential aspect. The two most popular technologies used for such communication are Bluetooth [13] and WiFi [14]. This approach is used not only for VR interaction, but also for controlling vehicles (e.g., drones) and robots [14].

In the dedicated control interface approach, a predefined user interface is implemented on a control device. A user can manipulate the interface, which sends events (such as button presses) to a server, where appropriate actions are performed. Currently, applying this approach no longer requires implementing a dedicated client application, due to the availability of generic software packages (e.g., the PC Remote application by Monect [15]) that allow users to configure interfaces depending on their requirements or to choose one of the predefined interfaces (e.g., a TV remote control).

3.6 Discussion of Existing Approaches

Five common approaches to navigation and interaction within virtual environments have been described above. Mixing different techniques is also a common solution. For example, the specialized device approach and the natural interaction approach can be combined, so that the comfort and functionality of a specialized device are paired with the intuitiveness of natural interaction [16]. However, this type of combination requires even more devices, which makes it highly specific, expensive, and difficult to implement in practice.

There are also specific techniques that cannot be simply assigned to any of the above categories – e.g., the use of physical props for object manipulation within a CAVE-type system [17] – which, however, are out of the scope of this paper.

4 Semantic Modelling of User Interactions

Currently, interaction channels are strongly related to the specific VR system configuration. Applications are usually created for a specific setup, which has a predefined set of user interface channels. This reduces the applicability of VR technology and makes users dependent on hardware.

However, the use of semantic techniques allows building resilient applications that can be dynamically adapted to the available input and output devices. This requires the creation of several ontologies. The first one describes VR systems and their configurations. This ontology classifies interface elements according to the senses associated with them – such as touch, sight, hearing, smell and balance – and the particular actions they support. Next, the interaction channels available in the current system configuration are recognized. For example, if a user has an HMD, all data input techniques offered by the device will be recognized, such as reading the position of the gyroscope and the user’s physical contact with the button located on the device.

The second ontology describes the interactions foreseen in a VR environment. It includes all kinds of activities related to the user’s navigation and interaction in the environment. The basis of this ontology is a general model that takes into account navigation in many degrees of freedom and any possible interaction with virtual objects. Subsequently, the ontology is narrowed by application-specific constraints. These may concern both a reduced number of degrees of freedom (e.g., if the application does not allow a user to “fly”) and limited interaction with objects (e.g., when the user cannot influence the arrangement of objects in the scene or change their size). The two ontologies are independent of each other (Fig. 1).

Fig. 1. Ontologies of interaction channels – VR system (A), VR environment (B)
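
A minimal sketch of how the environment-side ontology (B in Fig. 1) could be represented: a general navigation and interaction model that is subsequently narrowed by application-specific constraints. The class and field names are illustrative assumptions, not the actual model used in this work.

```python
from dataclasses import dataclass, field

@dataclass
class NavigationModel:
    # General model: translation along and rotation around three axes (6 DOF).
    translation_axes: set = field(default_factory=lambda: {"x", "y", "z"})
    rotation_axes: set = field(default_factory=lambda: {"pitch", "yaw", "roll"})

@dataclass
class ObjectInteractions:
    movable: bool = True      # the user may rearrange objects in the scene
    resizable: bool = True    # the user may change object sizes

# Application-specific constraints: walking-only navigation (no "flying")
# and a static scene whose objects cannot be moved or resized.
walking_navigation = NavigationModel(
    translation_axes={"x", "z"},     # no vertical translation
    rotation_axes={"pitch", "yaw"},  # no roll
)
static_objects = ObjectInteractions(movable=False, resizable=False)
```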

Additional ontologies and taxonomies can be used to represent typical uses of interaction elements (e.g., an escape or return button representing cancel actions) and specific user preferences (e.g., a user prefers an interaction device over body movements). Taking this information into consideration enables the creation of more user-friendly, personalized interfaces.

Based on the available ontologies, an appropriate mapping of actions in the VR environment to the interaction channels available in a particular VR system can be generated. The mapping is performed in such a way that at least one interaction option is assigned to each possible action in the VR environment. If the number of interaction channels exceeds the number of actions, some interaction channels may be left unused. Channels can also be combined to make the application interface more intuitive. For example, moving the analogue stick on a gamepad alone can be used to move a virtual character, while moving the stick while holding down an action button can be used to look around the scene.

The semantic description of possible actions within a VR environment enables the mapping algorithm to assign intuitive interaction methods to actions. This enables faster adaptation of non-expert users to a VR environment, thanks to their experience with similar applications, and allows smooth transfer of an application from one VR system to another. Intuitive use of interaction channels can be proposed automatically – for example, a “Jump” action in a VR environment can be mapped to the spacebar on a keyboard, a sudden lifting of the head with an HMD, or literally a jump in the case of a tracking system.
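
The following sketch illustrates one possible form of this mapping step, under simplifying assumptions: each action receives exactly one channel, and intuitiveness is approximated by the overlap between an action’s semantic tags and a channel’s typical uses. The scoring function and all identifiers are hypothetical, not the algorithm actually used.

```python
def overlap(action_tags, channel_uses):
    """Score a channel for an action by counting shared semantic tags."""
    return len(set(action_tags) & set(channel_uses))

def map_actions(actions, channels):
    """Assign each action one channel, preferring channels whose typical
    uses match the action's semantic tags; leftover channels stay free
    and may later be combined into richer controls (e.g., stick+button)."""
    free = dict(channels)
    mapping = {}
    for action, tags in actions.items():
        if not free:
            raise RuntimeError("not enough interaction channels")
        best = max(free, key=lambda ch: overlap(tags, free[ch]))
        mapping[action] = best
        del free[best]
    return mapping, free

actions = {"jump": ["jump", "up"], "cancel": ["cancel", "back"]}
channels = {"spacebar": ["jump"],
            "escape_key": ["cancel", "back"],
            "shift_key": []}
mapping, unused = map_actions(actions, channels)
print(mapping)   # {'jump': 'spacebar', 'cancel': 'escape_key'}
print(unused)    # {'shift_key': []}
```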

An important element of the proposed approach is the selection of classification criteria for the ontology describing the capabilities of VR systems. As new input/output VR devices are constantly emerging, the method cannot be based on a classification of the currently available equipment. The criteria should enable classification of each new device. The solution proposed in this article is a classification based on natural human senses, which limits the number of classification groups.

VR devices often engage many senses simultaneously to interact with a user. In addition, most devices can act as both input and output devices. For example, a large proportion of HMD devices offer input interaction through touch (pressing buttons), balance (using the built-in gyroscope), and sometimes also sight (eye tracking) and hearing (voice control). They also provide output stimuli through sight (image), hearing (sound), and touch (vibrations). The proposed classification method forces the division of the functions of each device into a limited number of categories. This makes the functions comparable across different devices, which allows interaction channels to substitute for one another.
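
For illustration, an HMD could be described in the system-side ontology as a set of channels, each classified by the sense involved and by its direction (input or output). All names below are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum

class Sense(Enum):
    TOUCH = "touch"
    SIGHT = "sight"
    HEARING = "hearing"
    SMELL = "smell"
    BALANCE = "balance"

@dataclass(frozen=True)
class Channel:
    name: str
    sense: Sense
    direction: str  # "input" or "output"

# One HMD, divided into channels falling into a small, fixed set of
# sense/direction categories; channels in the same category on other
# devices can substitute for these.
HMD_CHANNELS = [
    Channel("side_button", Sense.TOUCH,   "input"),
    Channel("gyroscope",   Sense.BALANCE, "input"),
    Channel("eye_tracker", Sense.SIGHT,   "input"),
    Channel("voice",       Sense.HEARING, "input"),
    Channel("display",     Sense.SIGHT,   "output"),
    Channel("headphones",  Sense.HEARING, "output"),
    Channel("vibration",   Sense.TOUCH,   "output"),
]
```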

Figure 2 depicts the overall architecture of a system implementing the presented method. The architecture consists of three main elements. The first is a group of interaction devices. Since many devices act as both input and output devices at the same time (e.g., with a mobile device it is possible both to enter data into the system and to display application content), all these devices are treated as one element of the architecture. The second element is a VR environment represented by a specific application. Such applications contain sets of foreseen interactions and – depending on their nature – enable a user to interact with the environment in a variety of ways. The third element is the VR system. Using a mapping ontology, the system assigns the available interaction channels to the interaction capabilities of the VR environment. After the mapping process is complete, a user is able to interact with the application in real time using the available equipment.

Fig. 2. System architecture

5 Implementation

A prototype of the system implementing the proposed method has been developed using the Unity 3D engine. An interface that allows describing objects in virtual environments with semantic metadata directly in the Unity editor is presented in Fig. 3 (top right). The described objects are presented in a virtual scene (Fig. 3, bottom). A user can navigate through the scene and initiate interaction with objects by turning towards them. When interaction with a specific object becomes possible, the semantic metadata related to this object are sent to the mobile device. Since in the current version of the prototype the only input device is a mobile device, all interaction options associated with the currently active object are presented on its screen (Fig. 3, top left).

Fig. 3. Prototype implementation

Selecting one of the interaction options triggers the appropriate action on the server side. In this way, a two-dimensional user interface is extracted from the three-dimensional presentation environment. A WiFi connection has been used for communication between the devices due to its speed, security and sufficient range [14]. The popular JSON format has been used for structuring the exchanged messages. The system has been tested on a Powerwall setup with a 3.5-m screen and 4 projectors providing stereoscopic projection at 4K resolution and 120 Hz (Fig. 3, middle right).
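
The exact message schema is not given in the paper; the following sketch shows what the exchanged JSON messages might look like, with hypothetical field names:

```python
import json

# Server -> mobile device: metadata of the currently active object, from
# which the contextual user interface is generated on the phone.
# Object and interaction identifiers are illustrative.
context_message = json.dumps({
    "object_id": "door_01",
    "label": "Wooden door",
    "interactions": [
        {"id": "open",  "label": "Open"},
        {"id": "knock", "label": "Knock"},
    ],
})

# Mobile device -> server: the interaction option selected by the user.
action_message = json.dumps({"object_id": "door_01", "interaction": "open"})
```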

6 Conclusion

In this paper, a technique for semantic modelling of user interactions has been proposed. Ontologies related to the VR system configuration and the VR environment are used in connection with additional mapping ontologies describing possible matchings and user preferences. Based on the semantic descriptions, automatic mapping of interaction channels to VR environment actions is possible. The presented approach enables building resilient VR applications, in which the interface of a virtual environment can be adapted to any VR system. This may lead to simplification and acceleration of the development of VR applications.

This paper provides a foundation for future work on creating a general model for the semantic description of user interaction in VR. The prototype application is limited to interaction with one device – a smartphone or a tablet. Extending it to other devices requires an extensive and well-structured ontology on the VR system side. In addition, not only the interaction channels but also the presentation channels should be taken into consideration in a general semantic description of user interactions in VR.