
1 Introduction

Several studies point out the importance of interaction to the information visualization (InfoVis) field, particularly the interplay between interaction and cognition [14]. This interplay, combined with the demands of increasingly large and complex datasets, has driven the InfoVis field to seek alternatives to the traditional mouse and keyboard.

Interaction based on the Natural User Interface (NUI) paradigm [6] appears as a promising alternative to conventional input. NUIs build on how people interact in the physical world, for instance through gestures, touch, sketching, and speech. This paper addresses NUI through a Voice User Interface (VUI) in a 3D InfoVis environment.

Thus, this work presents aspects of the design, development, and evaluation of a voice-command interface for interacting with 3D InfoVis applications. To test the proposed interaction, an InfoVis tool (IVOrpheus) was developed based on the 3D scatterplot technique; to represent data, it uses three spatial dimensions (the x, y, and z axes) and three visual channels (color, shape, and size). The tool allows interaction by mouse and keyboard or by voice commands in Brazilian Portuguese, with speech recognition performed by the Coruja software [14]. Good VUI practices were adopted in the application design, such as the automatic adaptation of voice commands to the loaded dataset and the dynamic generation of the grammars used by the speech recognition system.

To test the efficacy of the proposal, usability tests with 10 users were performed to evaluate the application interface with and without voice interaction. The tests measured the time each user spent to complete the tasks and the user's workload using the NASA Task Load Index [12]. The test tasks cover 4 of the 12 interaction tasks proposed by Heer and Shneiderman [3]: Visualize (specify a visualization of the data by setting the axes), Filter (reduce the data set under analysis), Select (point to an item or region of interest), and Navigate (zoom and rotate the visualization).

Thus, this paper presents the results obtained in the usability test as good practices to assist researchers developing InfoVis tools that adopt a command-and-control Voice User Interface. In addition, the role of and difficulties faced by VUIs are addressed, since VUIs remain underexplored in the InfoVis field.

2 Application

IVOrpheus [18] is an information visualization tool based on the 3D scatterplot technique, using three spatial dimensions (X, Y, and Z) and three visual channels (color, shape, and size) to represent data. Moreover, the tool accepts voice commands in Brazilian Portuguese as input.

2.1 Conceptual Aspects

IVOrpheus follows the basic guidelines of a good InfoVis tool, defined by [4] and commonly referred to as the InfoVis mantra: overview of the data (the user should get a general idea of the data under analysis); semantic zoom (focus on a subset of the data); filters (reduce the data set under analysis); and details on demand (present data that are not visually represented, i.e., hidden data).

Interface

The first guideline for building the IVOrpheus environment is that keyboard-and-mouse interaction and voice-command interaction share the same interface. Six main guidelines [5, 11] were considered in the development of the interface:

  • Home equivalent: a command/button that returns users to a known starting point;

  • Back equivalent: a command/button that allows the user to back up one step at a time;

  • Meaningful communication: the user should easily identify the commands available for interaction and their meaning, and should be able to get help about the commands available on the screen;

  • Minimal user action: the commands on each screen should be simple, and the required input data should be visible on the screen;

  • Consistency and standardization in interaction and screens: forms of interaction and screen layouts are kept consistent, as are the commands used for similar operations in different contexts;

  • Speaker independence: the interface should work for any speaker, without per-user voice training.

The IVOrpheus interface is divided into four main areas, as shown in Fig. 1: the Options bar (1), the Preview area (2), the Legend area (3), and the Menu bar (4). In all areas, each option presented on the screen can be triggered by voice command or by mouse click.

Fig. 1. IVOrpheus Interface.

The labels of buttons and menus are the commands available for voice interaction. For example, the filter option can be accessed by speaking the command "Filter". The following voice commands are available and visually displayed on the Options bar: "open", "save", "screenshot", "help", "legend", and "details". The Menu bar is enabled only after a database is loaded into the tool and offers the following initial commands: "configure", "filter", and "interact". All these commands are available in Brazilian Portuguese.
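To illustrate how such a command-and-control mapping can be organized, the sketch below ties each spoken label to a UI action and an enabling condition (so that, for example, "filter" only fires after a base is loaded). This is a minimal Python illustration; the class and method names are assumptions, not the actual IVOrpheus implementation.

```python
# Hypothetical sketch of a command-and-control dispatch table; the names
# (open_base, show_help, ...) are illustrative, not IVOrpheus's API.

class CommandDispatcher:
    def __init__(self):
        self.base_loaded = False
        # Global commands: always available on the Options bar.
        self.commands = {
            "open": (self.open_base, lambda: True),
            "help": (self.show_help, lambda: True),
            # Menu bar commands: enabled only after a base is loaded.
            "configure": (self.show_config, lambda: self.base_loaded),
            "filter": (self.show_filters, lambda: self.base_loaded),
        }

    def on_recognized(self, label: str) -> None:
        """Callback fired by the speech recognizer with the recognized label."""
        entry = self.commands.get(label.lower())
        if entry is None:
            return  # utterance outside the active grammar: ignore it
        action, enabled = entry
        if enabled():
            action()

    def open_base(self):
        self.base_loaded = True
        print("base loaded")

    def show_help(self): print("help panel")
    def show_config(self): print("configure panel")
    def show_filters(self): print("filter panel")
```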

Features

The IVOrpheus features follow the main characteristics of a good visualization tool and are presented below:

Configure/Filter: the user can configure or filter the x, y, and z axes and the visual channels color, shape, and size. Figure 2 presents an overview of the feature set. Configuring or filtering the axes can be applied both to categorical data (discrete values) and to continuous data (floating-point values), whereas the visual channels color, shape, and size can be applied only to categorical data.

Fig. 2. Interaction by voice/mouse in the IVOrpheus software.
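The categorical-only restriction on the visual channels can be made concrete with a small sketch. The function below is illustrative (the names and the categorical flag are assumptions); it simply refuses to bind a continuous attribute to color, shape, or size while allowing it on the spatial axes.

```python
# Hypothetical sketch of the configure/filter type constraint described above.

SPATIAL_AXES = {"x", "y", "z"}                 # categorical or continuous data
VISUAL_CHANNELS = {"color", "shape", "size"}   # categorical data only

def assign(mapping: dict, target: str, attribute: str, categorical: bool) -> None:
    """Bind a dataset attribute to an axis or visual channel, enforcing types."""
    if target not in SPATIAL_AXES | VISUAL_CHANNELS:
        raise ValueError(f"unknown target: {target}")
    if target in VISUAL_CHANNELS and not categorical:
        raise ValueError(f"{target} accepts only categorical attributes")
    mapping[target] = attribute

mapping = {}
assign(mapping, "x", "brand", categorical=True)
assign(mapping, "y", "value", categorical=False)
assign(mapping, "color", "cylinders", categorical=True)
# assign(mapping, "size", "value", categorical=False)  # would raise ValueError
```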

Interaction: to manipulate the visualization, the interaction can rotate, translate, and increase or decrease the size of the chart. In addition, the interaction offers the "initial state" and "stop" functionalities, as presented in Fig. 2, because IVOrpheus continuously increments or decrements zoom, rotation, and translation. The "stop" functionality pauses the increment or decrement of these values, while the "initial state" button returns the visualization to its initial zoom, rotation, and translation values.
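A minimal sketch of this continuous-increment behavior is shown below, assuming a per-frame update loop; the field names and step sizes are illustrative, not IVOrpheus's actual values.

```python
# Hypothetical sketch: once a command such as "rotate" is issued, the value
# keeps changing every frame until "stop" pauses it or "initial state" resets.

from dataclasses import dataclass

@dataclass
class Camera:
    zoom: float = 1.0
    rotation: float = 0.0
    d_zoom: float = 0.0       # per-frame increments; zero means paused
    d_rotation: float = 0.0

    def command(self, name: str) -> None:
        if name == "rotate":
            self.d_rotation = 1.0      # degrees per frame (illustrative)
        elif name == "zoom":
            self.d_zoom = 0.01
        elif name == "stop":           # pause all running increments
            self.d_zoom = self.d_rotation = 0.0
        elif name == "initial state":  # restore the starting view
            self.zoom, self.rotation = 1.0, 0.0
            self.d_zoom = self.d_rotation = 0.0

    def on_frame(self) -> None:
        """Called once per rendered frame to apply the active increments."""
        self.zoom += self.d_zoom
        self.rotation = (self.rotation + self.d_rotation) % 360.0
```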

Options bar: the features contained in this bar are present throughout the execution of the tool, and the user can speak these global commands at any time. For instance, the command "open" can be used at any time to open and load a new base. The "screenshot", "legend", and "details" commands are only enabled after a loaded base has been configured. The "screenshot" option allows the user to capture the current screen and share discoveries via email or other means of communication, while the "details" option opens the details panel, where the user can choose which attributes should provide extra information and make those details visible when selecting a point.

Like every information visualization tool, IVOrpheus must meet the basic functionality established by Shneiderman's definition [13], "Overview first, zoom and filter, then details on demand", and all of these capabilities are covered by the IVOrpheus feature set.

3 Usability Test

This section presents the test with users: the participants and their profiles, the test procedure, and the analysis of the results.

3.1 Procedure

Participants were introduced to the IVOrpheus tool through a training video, after which a questionnaire was applied to identify the user profile. The tests then started by giving the users the tasks while their performance time was recorded. Upon completion of the tasks, a questionnaire was administered to identify the level of difficulty of each scenario. Finally, the NASA-TLX post-task questionnaire was presented to the user so that the user's workload could be measured.

3.2 User’s Profile

To identify the different user profiles, a pre-task questionnaire was applied, with the following questions:

  1. Do you know the Cartesian plane/space in 2 and 3 dimensions?

  2. Have you ever used any software to analyze a data table, for example Excel® (Microsoft), Numbers® (Apple), or another?

  3. Do you use, or have you used, any application on your phone or computer that takes voice as input, for example a personal voice assistant such as Apple's Siri® or Samsung's S Voice®? If so, how often (rarely, occasionally, or frequently)?

  4. Are you familiar with applications that use the three spatial dimensions with a movable user viewpoint, i.e. a camera, such as Blender®, Zbrush®, Autodesk 3D Max®, among others?

Table 1 below shows the results of the questionnaire, where the first five columns represent the mouse users and the other five columns the speech interface users.

Table 1. Questionnaire to identify the users' profiles. Legend: R = rarely, OC = occasionally, OF = often.

It can be observed that, due to the popularization of personal assistants in various embedded systems, users have had previous contact with voice interaction technology, even though voice software is used only as a secondary means of interaction: as Table 1 shows, most users rarely make use of this technology. Three-dimensional environments are likewise underused, since most software and applications rely on two dimensions.

3.3 Training Video

All volunteers who performed the tests watched a 7-minute training video covering the following points: an introduction to the IVOrpheus visualization tool and a presentation of its interface and functionalities. The video then showed an example of how to open and load a base and how to configure the spatial dimensions (x, y, and z) and visual channels (color, shape, and size). Next, the users were introduced to the use of categorical and continuous filters. Finally, the video showed how to use the details-on-demand function.

3.4 Tasks

Each user had 18 min to complete the tasks. If a user did not finish in time, the remaining tasks were left unanswered and the volunteer was asked to complete the NASA-TLX post-task survey and the questionnaire identifying the level of difficulty of the scenarios. The tests used a database of cars from the 1980s containing 789 records and 16 attributes (7 continuous and 9 discrete). Using this base, the following tasks were presented to the users:

  • Configure the X-axis to brand, the Y-axis to value, and the Z-axis to year. Find the most valuable 1986 car among the BMW, Isuzu, and Dodge brands. After finding it, zoom in on it and center the point representing that car on the screen.

  • The IVOrpheus tool starts with a loaded base set with the following attributes: X-axis (brand), Y-axis (value), and Z-axis (year). Configure the visual channels: color to cylinders, shape to type, and size to traction. After that, find the most valuable car in the value range between $22,965 and $34,875 and, using the legend, write down the number of cylinders, the type, and the traction of the most valuable car in that range.

  • Select the car with the following dimensions: X-axis (brand: Toyota), Y-axis (value: $11,248), Z-axis (year: 1981), color (cylinders: 4), shape (type: hatch), and size (traction: 4 × 4). Use the details-on-demand functionality to select the desired point and find the number of doors and the fuel of the desired car.

3.5 Time

Each user had 18 min at their disposal to complete all the tasks in the list. To measure user performance, the tasks were timed, as shown in Table 2, which reports the times of the group of mouse users; Table 3 shows the times of the group that used the speech interface to complete the tasks.

Table 2. Task and subtask times (mouse group)
Table 3. Task and subtask times (speech interface group)

3.6 Questionnaire to Identify the Level of Subjective Difficulty During the Execution of Tasks and Subtasks (Scenarios)

After completing the tasks, users received the post-task questionnaire to identify the level of subjective difficulty during the execution of the tasks and subtasks (scenarios). This questionnaire consisted of a five-level Likert scale [15]: very easy, easy, medium, hard, and very hard. Tables 4 and 5 present the results of the questionnaire for the mouse group and the voice group, respectively.

Table 4. Post-task questionnaire results for mouse users.
Table 5. Post-task questionnaire results for speech interface users.

3.7 NASA-TLX

The users' workload was evaluated with the NASA Task Load Index (NASA-TLX) [12], which is used to identify the overall workload of the different tasks and its main sources. Workload is defined by Hart [12] as a hypothetical construct representing the cost incurred by a human operator to achieve a given level of performance. NASA-TLX is estimated from six subscales (mental demand, physical demand, temporal demand, effort, frustration, and performance), each with twenty discrete levels on which the user marks the perceived workload. The smaller the value of a scale, the smaller its weight in the final workload, and vice versa. After the user rates each scale, a sequence of 15 pairwise comparisons is used to weight the scales and better approximate the overall workload felt by the volunteer. Table 6 shows the NASA-TLX results for the users who used the mouse, Table 7 shows the results for the users who used the natural interaction interface, and Table 8 shows the averages of the two groups.

Table 6. NASA-TLX results (Mouse)
Table 7. NASA-TLX results (Speech Interface)
Table 8. Average results for the mouse and speech interface groups.
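For reference, the weighted NASA-TLX score behind Tables 6, 7, and 8 is computed from the six subscale ratings and the 15 pairwise comparisons. The sketch below shows the standard computation; the ratings and choices in the example are made up purely for illustration and are not taken from the study data.

```python
# Sketch of the standard weighted NASA-TLX score: each subscale rating is
# weighted by how often the volunteer picked that dimension across the 15
# pairwise comparisons; the weights therefore sum to 15.

from itertools import combinations

DIMENSIONS = ["mental", "physical", "temporal", "performance", "effort", "frustration"]

def tlx_score(ratings: dict, chosen: list) -> float:
    """ratings: dimension -> value on the 0-100 scale (20 steps of 5).
    chosen: the dimension picked in each of the 15 pairwise comparisons."""
    assert len(chosen) == len(list(combinations(DIMENSIONS, 2)))  # 15 pairs
    weights = {d: chosen.count(d) for d in DIMENSIONS}            # sum to 15
    return sum(ratings[d] * weights[d] for d in DIMENSIONS) / 15.0

# Example with fabricated illustrative values (not study data):
ratings = {"mental": 55, "physical": 20, "temporal": 40,
           "performance": 35, "effort": 50, "frustration": 60}
chosen = (["mental"] * 4 + ["effort"] * 4 + ["frustration"] * 4 +
          ["temporal"] * 2 + ["performance"])
print(round(tlx_score(ratings, chosen), 1))  # -> 51.7
```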

3.8 Results Analysis

The analysis verifies the requisites of functionality, efficacy, efficiency, utility, and usability pointed out by Mazza [15], which serve as indicators of good usability in information visualization software.

To measure functionality, it is necessary to verify whether, through the natural user interface, all the actions achievable by conventional means (mouse and keyboard) can be performed. This is the proposal of this work and, to achieve it, the IVOrpheus information visualization tool was created. The tool includes all the functionalities prescribed by the visualization mantra, which are:

  • Overview: view of a configured base in its entirety;

  • Zoom: allows the user to zoom in/out of the chart, so that clusters and trends in the data can be observed;

  • Filters: allow the user to remove information from the visualization, leaving only the relevant information to be analyzed;

  • Details on demand: lets the user obtain additional details about a point beyond the axis values and visual channels.

All these features were included for both the mouse and speech interfaces, making it possible, through the natural interaction interface, to perform all the actions available through the conventional means of interaction.

To measure efficacy, it is necessary to verify whether the proposed interaction allows the users to complete all tasks.

During the test, all participants who used the mouse completed all the tasks. Among those who used voice, four out of five (4/5) users completed all the tasks; one volunteer had difficulties in the third task, specifically in the subtask of applying details on demand, and exceeded the time limit, so that task was recorded as incomplete. Therefore, the voice interaction can be considered effective in 80% of the studied cases.

To measure efficiency, it is necessary to verify whether voice input allows a shorter time and a smaller workload when performing the tasks.

As can be seen in Tables 2 and 3, the first task had an average time of 02:57 (mm:ss) for the mouse and 04:04 for the speech interface, meaning the mouse was 28% faster than voice. The other subtasks of task 1 follow the same trend: when configuring the x, y, and z axes, the mouse averaged 44% less time than voice, while applying a categorical filter was only 23% faster and zooming and rotating 33% faster.

The pattern repeats in the second task, with the mouse 25% faster on average; configuring color, shape, and size also took 16% less time with the mouse.

The same goes for filtering continuous values, which took 27% less time with the mouse. In reading the visualization and interacting, the mouse remained faster, with a 27% advantage over voice, and in reading the legend the mouse again excelled, 29% faster than voice.

In the third task, the mouse held an average advantage of 11%. The most significant case in which the mouse was much faster than voice was details on demand, where it averaged 81% faster than the speech interface. This happens because selecting a point by voice is considerably more complex than simply clicking a point with the mouse.

Overall, the total time to perform all the tasks differed by 12% between mouse and voice.

To measure usability, it is necessary to verify whether the speech interface was simple and intuitive enough for completing the tasks.

To assess the software's usability, users were asked to answer two post-task questionnaires. The first identified the level of subjective difficulty during the execution of tasks and subtasks; its results can be seen in Tables 4 and 5. As shown in those tables, the frequency of difficulty ratings for the mouse was: very easy (15), easy (10), medium (20), hard (4), and very hard (1); for the speech interface it was: very easy (2), easy (26), medium (15), hard (7), and very hard (0).

In general, most ratings fall between the very easy, easy, and medium levels. The averages therefore show that interaction by mouse was rated easy, while voice interaction, whose ratings concentrate between the easy, medium, and hard levels, was of medium difficulty on average.

After the questionnaire measuring the level of difficulty of the tasks, users were asked to fill in the NASA-TLX questionnaire, which measures the workload of the tasks, as can be seen in Tables 6, 7, and 8.

It can be observed that physical demand was 68% lower when using the mouse, likely because users are far more accustomed to interacting with the computer by mouse than by voice. This also led to a temporal demand 36% lower in the mouse group than in the speech interface group and, consequently, 32% less effort when using the mouse.

One of the greatest weighting factors for the natural user interface was frustration, which was 90% higher than with the mouse. This factor influences all the measured scales, mainly the user's subjective performance, which was 51% better with the mouse than with voice.

Frustration is relevant in the test because it identifies the user's discomfort during task execution: the higher the frustration, the higher the workload experienced by the user, as shown by the TLX score, which was 26% better for the mouse than for the speech interface.

Regarding time, workload, and difficulty level, the mouse proved to be efficient, while voice proved to be effective.

To measure utility, it is necessary to verify in which contexts the speech interface best benefits the user.

Voice input has the role of serving a wider range of users. For instance, users with motor impairments or without sensation in the hands can use the IVOrpheus system to interact with a dataset, fulfilling the objective presented by Alan Kay [17]: the more "friendly" the interaction between human and machine, the greater the range of people reached.

4 Final Considerations and Future Works

In this work, an information visualization tool for a three-dimensional environment was developed using the 3D scatterplot visualization technique with a voice-input interface. During the tool's implementation, good practices were adopted regarding the management of the grammars used in the Coruja software. These practices were:

  • Sort the voice commands into two types: voice commands independent of the selected base (global commands) and commands related to the attributes of each database (local commands);

  • Take the limitations of the ASR (Automatic Speech Recognition) tool into account: when defining the voice commands, consider possible restrictions of the ASR in use, for example, difficulties in recognizing certain phonemes;

  • Choose commands close to the user's natural language: for example, the tool should allow the user to say a number such as 34427 the way it is spoken in everyday life, i.e. "thirty-four thousand, four hundred and twenty-seven", instead of making the user dictate digit by digit ("three, four, four, two, seven");

  • Local commands: perform preprocessing on the database in order to render its attributes in a form readable by the ASR system (a minimal sketch of this preprocessing follows this list).
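The sketch below illustrates this preprocessing, covering both the number-to-words practice and the flattening of attribute values into candidate utterances. It assumes the num2words library and a simple list-of-phrases grammar; the actual Coruja/Julius grammar format is not reproduced here, and all names are illustrative.

```python
# Hypothetical preprocessing for "local commands": dataset attribute values
# are converted into phrases an ASR grammar can recognize.

from num2words import num2words

def value_to_phrase(value) -> str:
    """Turn a raw attribute value into a speakable phrase."""
    if isinstance(value, (int, float)):
        # 34427 -> "thirty-four thousand, four hundred and twenty-seven"
        # (lang="pt_BR" would be used for Brazilian Portuguese commands)
        return num2words(value, lang="en")
    return str(value).lower().replace("_", " ")

def build_local_grammar(column_values: dict) -> list:
    """Flatten every attribute name and value into candidate utterances."""
    phrases = []
    for column, values in column_values.items():
        phrases.append(column.lower())
        phrases.extend(value_to_phrase(v) for v in values)
    return sorted(set(phrases))

# Example with a toy slice of the cars dataset used in the tests:
print(build_local_grammar({"brand": ["BMW", "Isuzu"], "value": [34427]}))
```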

Based on the results of the usability testing, it was observed that most users who used voice as input experienced high frustration and workload when completing the tasks. This may have been amplified because the voice recognition process sometimes generates erroneous output, mainly due to ambient noise or the user's speech pace, among other issues.

These observations directed the research to the following question: "Which approach to using voice in the 3D scatterplot information visualization technique is more efficient in reducing the user's cognitive load and time?". As the usability test pointed out, using voice with the aim of mimicking a mouse is less efficient, in both time and workload, than the standard interaction interface (keyboard and mouse). However, the natural user interface proved effective, since it allowed most users to finish all the tasks presented.

This observation guides the future of the IVOrpheus application towards a voice interaction proposal that is more efficient and demands less cognitive effort from the user.

With that in mind, for future work the IVOrpheus system will address voice interaction through dialogue, following the principle presented in [8–10], making the interaction interface invisible to the user, who would no longer be tied to interface commands such as buttons and panels: the user would simply pose the desired question and the system would present a visualization as the response.

Moreover, certain improvements and additions to the current tool's features are proposed, among them:

  • Implement the configuration of visual channels for continuous values, for example letting color assume a range of colors based on continuous values;

  • Add functionality to sort the data on the axes in ascending or descending order according to the user's choice;

  • Enable the derivation of data from the selected base, for example generating tables of averaged continuous values, thereby creating new tables within the selected base;

  • Implement functionality to save the user's state, so users can continue their progress from where they left off or share their findings with others;

  • Create a section for user notes, or allow the user to comment by voice on discoveries in the database;

  • Finally, extend the user tests to a two-dimensional scatterplot visualization and compare them with the current tests, ascertaining whether adding or removing a dimension increases or decreases the time, difficulty, and workload experienced by the user.