1 Introduction

In recent years, the amount of data available for analysis has grown rapidly, as a result not only of process automation but also of increased data storage capacity [1]. There is therefore a frequent need to deal with large, multidimensional datasets that are often too complex to be interpreted in their raw form, and to execute analytical and exploratory tasks to extract interesting patterns from them. Some of that information is easily understood by humans when the right presentation is used. For this reason, many Information Visualization (IV) techniques have been developed to facilitate human analysis of and interaction with data through graphical abstractions and interaction metaphors [2]. Given the need to give end users the autonomy to generate and edit visualizations according to their needs, independently of the nature of the information, this work aims to underscore the importance of end user participation in the creation and support of these graphical abstractions of data by providing a tool that supports the interactive creation of visualizations.

Although information visualization tools are often indispensable for data interpretation, there are still deterrents to their use. One is the difficulty of visualization development, which often requires advanced knowledge of both programming and mathematics to construct dynamic visualizations and, ideally, knowledge of the data’s domain [3]. Another frequent problem is finding a model that can be reused across application domains, since each subject demands different presentation layouts and interaction techniques to create successful visualizations [4]. Furthermore, in many cases it is important to preserve the privacy of part of the data, which may include information that is sensitive in the application’s domain, such as personal or financial information [5].

To address these problems, we have specified a graphical interactive system, InterVis, for the creation of interactive Information Visualizations based on dynamic data. Ideally, it requires no technical knowledge from the user of either information visualization techniques or code production, and is therefore well suited for use by domain experts. The system is tested to verify whether this interactive creation of visualizations without programming, combined with the user’s knowledge of the application domain, is more efficient from the perspective of usability without significant loss of flexibility. To perform the usability test of InterVis, we use a public collection of population statistics from the Brazilian Institute of Geography and Statistics (IBGE) [6]. Volunteers are invited to execute visualization construction tasks, and a qualitative evaluation of their experience is made by applying the USE Questionnaire [7].

This paper is organized into sections on Fundamentals of IV and Related Work, followed by the description of InterVis and its Experimental Tests and Results.

2 Fundamentals of Information Visualization

An average person can interpret only about one hundred items presented as text, which is impractical when dealing with data collections of millions of items [8]. The field of Information Visualization aims to make use of natural human visual perception and pattern recognition abilities to find and interpret information [9] by communicating abstract data through interactive visual interfaces [10].

The way data is represented depends directly on the problems users are trying to solve, so the visualization can vary according to data types and their relationships [11]. For example, timeline visualizations [12] are well suited to describing personal history, while graphs can be very effective in representing relationships [13].

Based on that, Shneiderman [11] proposes one of the most important concepts in Information Visualization, the mantra “Overview first, zoom and filter, then details-on-demand”. Thus, in visualization, it is important not only to present information well at first but also to let users interact with it in effective ways to find what is necessary for them to execute a given task.

Another concept is the principle of transparency, which states that when users focus their energy on the task being executed, the tool seems to disappear [14]. In this context, the visualization should be perceived as an abstraction of the data rather than as a tool: the data is the focus of the user’s attention, and the visualization helps them execute tasks and make decisions based on it.

2.1 Components of a Visualization

When creating a visualization, the user has to choose among the various visual items and characteristics that may be used to represent the data. Each tuple is represented by a visual item, and the tuple’s variables are mapped to the visual item’s appearance and behavior, such as shape, color, size and interaction function. A visualization is the space where related visual items are positioned according to a layout that abstracts their relationship. Lastly, the display combines one or more visualizations with other visual components and enables the user to create interaction functions between them.
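The three levels described above (visual item, visualization and display) can be sketched as a small data model. The following Python sketch is illustrative only: the class names, the field names and the `encode` helper are assumptions made for this example and are not taken from InterVis or from any toolkit cited here.

```python
from dataclasses import dataclass, field


@dataclass
class VisualItem:
    """One tuple of the dataset, drawn as a single mark."""
    shape: str
    color: str
    size: float


@dataclass
class Visualization:
    """A space where related visual items are positioned by a layout."""
    layout: str
    items: list = field(default_factory=list)


@dataclass
class Display:
    """Combines one or more visualizations with other visual components."""
    visualizations: list = field(default_factory=list)
    components: list = field(default_factory=list)  # e.g. legends, search boxes


def encode(record, color_by, size_by, palette):
    """Map the variables of one tuple to the appearance of its visual item."""
    return VisualItem(shape="circle",
                      color=palette[record[color_by]],
                      size=float(record[size_by]))


# Hypothetical records, used only to exercise the sketch.
records = [
    {"region": "North", "population": 10},
    {"region": "South", "population": 30},
]
palette = {"North": "blue", "South": "orange"}

chart = Visualization(
    layout="tabular",
    items=[encode(r, "region", "population", palette) for r in records],
)
display = Display(visualizations=[chart], components=["legend"])
```

In this sketch, changing which variable drives color or size only changes the arguments passed to `encode`; the layout and the display are untouched, which mirrors the separation between the three levels.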

Keim [15] also classifies visualizations according to three criteria: data to be shown, the way information is laid out and the way users can interact with it.

Data Type.

According to Shneiderman [11], data can be organized according to the problems the user wants to solve. He therefore proposes a taxonomy of tasks by data type, in which groups of data are divided into categories that reflect their dimension and relationships: uni-, bi-, tri- and multi-dimensional data, temporal data, hierarchical data (trees) and relationships (graphs). In addition, data is made up of a number of items, each corresponding to an observation that can be described along several dimensions [8]. Analyzed individually, these variables can be classified as nominal, ordinal, quantitative and interval [16], or simply as ordinal and quantitative, with nominal and interval variables treated as special cases where necessary [17].
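As an illustration of the per-variable classification, the sketch below guesses the measurement level of a single variable from its values. It is a simple heuristic written for this example, not a method from the cited works: the `Level` enum, the `classify` function and the `ordered_categories` parameter are hypothetical, and the interval level is omitted for brevity.

```python
from enum import Enum


class Level(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    QUANTITATIVE = "quantitative"


def classify(values, ordered_categories=None):
    """Guess the measurement level of one variable (simple heuristic)."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool)
           for v in values):
        return Level.QUANTITATIVE
    # Categories are ordinal only if the caller declares an ordering for them.
    if ordered_categories is not None and set(values) <= set(ordered_categories):
        return Level.ORDINAL
    return Level.NOMINAL


# Hypothetical columns, used only to exercise the sketch.
population = [1200, 450.5, 98]
income_band = ["low", "high", "medium"]
state = ["PA", "SP", "RJ"]

levels = {
    "population": classify(population),
    "income_band": classify(income_band, ["low", "medium", "high"]),
    "state": classify(state),
}
```

The ordering of categories cannot be inferred from the values alone, which is why the sketch asks for it explicitly; in an interactive tool this is the kind of information a domain expert would supply.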

Layout.

Beyond the dimension and format of the data, another concept considered in the creation of a visualization is its visual dimension, which is limited to the physical dimensions plus the dimension of time. For this reason, one of the challenges in Information Visualization is to use them in the best possible way to represent abstract data. The resulting image is generally bi- or tri-dimensional. It may be animated and allow interaction, and it may contain other visual components, such as legends, icons, menus and selection boxes [18], that give meaning to characteristics of the abstraction such as colors and shapes. Harger and Crossno [19] classify the layouts of visualizations in a way similar to the data classification given by Shneiderman [11], dividing the positioning of items into Graph Layouts, Tree Layouts, Tabular Layouts and Georeferenced Layouts.

Interaction.

In the context of HCI, interaction can be described as the communication between user and system [20]. In the IV context, we therefore consider any means that enables the user to interact with the visualization and manipulate data through a graphical interface with visual components, such as icons and figures, and textual components, such as search boxes, labels and filters [18]. Shneiderman also proposes the essential forms of interaction for any IV tool, based on seven tasks: overview, zoom, filter, details-on-demand, relate, history and extract [11]. He further describes models that combine interaction functions, such as zoom and pan, with display space, which later evolved into focus+context and overview+detail [21]. These models aim to classify visualization techniques by the way different levels of information granularity are shown at the same time, providing a system that can answer user questions and allow the progressive refinement of data, known as hierarchical decision-making [11, 18, 22–24].

2.2 Architecture of Graphical User Interfaces

As the demand for application development has grown, so has the number of high-level GUI toolkits, which raises the question of how to use them more effectively in order to avoid code replication and provide better integrated systems. Toolkits that help with GUI generation are effective for creating traditional interfaces, but they are insufficient when interfaces with novel components are needed. In that case a large amount of code must be written to customize the solution, depending on the toolkit’s limitations. This is the case with Java Swing [25] and AWT [26], which support the creation of simple components but are of little help when a new kind of component is necessary. Adobe Flash Builder [27] is a popular example that does not even support different external integrations [28].

In this context, Bederson et al. [28] observed that bi-dimensional toolkits are usually implemented with a more concrete structure, in which objects tend to resemble their real-life counterparts, whereas tri-dimensional toolkits generally base their architectures on specification trees that generate scene graphs. They exemplified the two architectures with the toolkits Piccolo and Jazz. The architectures were called, respectively, Monolithic, defined as those that primarily use compile-time inheritance to extend functionality, as in Pad++ [29] and Swing [25]; and Polylithic, defined as those that primarily use runtime composition to extend functionality, as in Jazz [30, 31], VTK [32], Java 3D [33] and Open Inventor [34]. Fig. 1 shows the example of a fade rectangle, where (a) is the class hierarchy at compile time and (b) is the runtime scene graph [28].

Fig. 1. Class and runtime hierarchies in polylithic and monolithic architectures proposed by Bederson et al. [28].
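The contrast between the two styles can be sketched for the fade rectangle example. The Python code below is only an illustration under stated assumptions: all class names are hypothetical and do not reproduce the API of any toolkit cited here, and Python class definitions stand in for compile-time inheritance.

```python
# Monolithic style: behavior is added by subclassing, so the combination
# "rectangle that fades" is fixed when the class is written.
class Node:
    def render(self):
        return []


class Rectangle(Node):
    def render(self):
        return ["rect"]


class FadeRectangle(Rectangle):
    def __init__(self, alpha):
        self.alpha = alpha

    def render(self):
        return [f"alpha={self.alpha}"] + super().render()


# Polylithic style: small single-purpose nodes are linked into a scene graph
# at run time, so fading is a separate node wrapped around any child.
class SceneNode:
    def __init__(self, *children):
        self.children = list(children)

    def render(self):
        return [op for child in self.children for op in child.render()]


class RectangleNode(SceneNode):
    def render(self):
        return ["rect"]


class FadeNode(SceneNode):
    def __init__(self, alpha, *children):
        super().__init__(*children)
        self.alpha = alpha

    def render(self):
        return [f"alpha={self.alpha}"] + super().render()


monolithic = FadeRectangle(0.5)
polylithic = FadeNode(0.5, RectangleNode())
```

Both objects produce the same drawing operations. The difference is where the extension happens: the monolithic version needs a new subclass for every shape that should fade, while the polylithic `FadeNode` can wrap any existing node without new classes.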

2.3 Usability

Another important question in IV that frequently does not receive enough attention is usability, both of the tool used to create the visualizations and of the visualizations themselves and the ways of interacting with them. ISO defines usability as the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use [35]. One possible reason for this lack of attention is the particularity of each visualization, which, to be properly assessed, may require specific knowledge and tests based on the type of task to be executed and the data to be explored [36].

Information Visualization aims to make use of the cognitive capabilities of human beings and to support decision making by means of interactive visual interfaces [10]. Consequently, usability, not only in the interactive creation of visualizations but also in their exploration, is indispensable to the development of the proposed solution.

3 Related Work

Diverse IV techniques have been created to address the analysis of massive, high-dimensional data. In general, however, those who create a visualization are not those who use it. The creators should know how to abstract the data and how to implement the visualization, but they often lack knowledge of the data or of the application domain. This makes abstracting the data and finding its best representation a more complex problem, since the creator of the visualization does not necessarily know either the problem or the data.

In addition, there are two kinds of tools and libraries for creating visualizations: those that demand technical knowledge to implement visualizations, and those that offer a visual, easy-to-use interface and require only knowledge of the application domain. The first kind generally provides several reusable components, predefined visualizations and forms of interaction, but requires considerable coding and customization to fulfill the IV requirements of a single application. Examples are the toolkits VTK [32], InfoVis [37] and prefuse [3], which are rich in 2D and 3D visual resources such as different layouts and interactions, but cannot be used to explore data without first being customized.

On the other hand, tools such as Tableau [38] and its predecessor Polaris [17], ASK-GraphView [39] and OpenedEyes [40] offer an interactive interface that does not demand coding or customization, but they restrict the visualization to the application’s resources and data types. They usually limit users to the same static graphs or charts instead of allowing the creation of dynamic Information Visualizations, and they provide solutions only for a predetermined application domain. In both cases, user autonomy to manipulate data is limited by the tool or application used and by the user’s knowledge of it.

In addition, with the growing number of IV techniques, many patterns, guidelines and criteria have been formulated specifically to evaluate information visualization tools [41–44]. In general, these techniques are part of a user interface and include interaction patterns. However, since the role of information visualization is to serve as a tool for data exploration, its evaluation involves broader issues, such as the tasks to be accomplished, the users and the data context.

Moreover, since IV aims to support exploration by users, the usability of the IV tool is one of the major points that requires evaluation, in order to guarantee that the tool provides not only an effective exploration technique but also a satisfying and easy-to-use interface in the context of each task [43]. Along these lines, works such as [45–49] include usability questionnaires, personal user feedback, logs and observation of tasks.

4 InterVis

The purpose of InterVis is to facilitate the creation of interactive Information Visualizations by users who do not necessarily have programming experience or technical knowledge of Information Visualization, but who do have good domain knowledge. It also aims to guarantee the privacy of raw data by providing a textual abstraction of the data, and to make use of the user’s knowledge of the application domain during the creation of visualizations.

InterVis is based on a monolithic architecture, which relies on compile-time inheritance, in order to facilitate the future extension of techniques. It allows the user to work with static or dynamic data. The system’s visualization and interaction techniques are mainly supported by prefuse [3].

4.1 Interface

The interface of InterVis works primarily in one of two modes: edition and exhibition (visualization). Besides these two modes, it is possible to dynamically configure the data sources and manage the created reports.

Edition Mode.

The edition mode displays graphical tools to select data and associate it with aspects of the visualization (such as type, style and additional components like search boxes and legends) and with interaction techniques (such as filters, zoom and pan, and connections between two charts). Fig. 2 shows the edition mode of a report. It is composed of a dataset panel, which shows the datasets from the data sources available to create the report and allows the user to drag and drop information onto the visualizations; a preview panel, which generates a preview of how the report will look; and a tools panel, where the report and its visualizations and interactions can be designed and configured.

Fig. 2. Edition perspective of InterVis

Exhibition Mode.

The exhibition mode, on the other hand, simply allows interaction with the generated visualization. The data and interaction tools are those predetermined in the edition mode, and the report takes the form of a slide presentation, as shown in Fig. 3.

Fig. 3. Exhibition perspective of InterVis

4.2 Data Sources

Users can load their own data into the tool or configure a source from which to import data, so that it can be dynamically explored according to its metadata. It is possible to use static data, which is saved with the reports, or dynamic data, which is reloaded as the data source is updated. The data source can be a local tabular file, a database configuration or a web service, according to the user’s choice during report creation or edition.
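The distinction between static and dynamic data can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the InterVis implementation: the `DataSource`, `TabularTextSource` and `ReportData` names are hypothetical, and an in-memory text stands in for a local tabular file (a database or web service source would implement the same `load` method).

```python
import csv
import io


class DataSource:
    """Hypothetical base interface: every source returns a list of row dicts."""

    def load(self):
        raise NotImplementedError


class TabularTextSource(DataSource):
    """Reads delimited text; stands in for a local tabular file."""

    def __init__(self, text):
        self.text = text

    def load(self):
        return list(csv.DictReader(io.StringIO(self.text)))


class ReportData:
    """Holds the rows shown by a report, either static or dynamic."""

    def __init__(self, source, dynamic):
        self.source = source
        self.dynamic = dynamic
        # Static data is a snapshot taken when the report is created.
        self._snapshot = source.load()

    def rows(self):
        # Dynamic reports re-read the source; static ones keep the snapshot.
        return self.source.load() if self.dynamic else self._snapshot


# Hypothetical data, used only to exercise the sketch.
source = TabularTextSource("state,population\nA,1\n")
static_report = ReportData(source, dynamic=False)
dynamic_report = ReportData(source, dynamic=True)

# Simulate an update of the underlying data source.
source.text += "B,2\n"
```

After the simulated update, `static_report.rows()` still returns the single row captured at creation, while `dynamic_report.rows()` returns both rows.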

5 Experimental Methodology

Usability is an important factor in software quality. For this reason, different patterns, guidelines and criteria have been created over the years in response to the need for more usable interfaces [50]. However, IV tools face some problems when validating their interaction. One reason is that visualizations depend directly on the context, the task to be executed and who is executing it [41], so traditional user interface evaluation techniques may not be sufficient. Users need to look at data from different perspectives, which may take a long time. Moreover, during the exploration of visualizations it is common for new questions to be formulated and answered dynamically and collaboratively, which makes this process of discovery difficult to observe and measure [44].

Therefore, to validate InterVis, we describe a test in which the process of visualization creation is validated as a task instead of testing only data exploration and interface components per se. The principle is to validate not only the visualization generated, but also the system’s tools to create it dynamically from scratch. That way, participants have a compound task, to validate the usability of the creation interface and of the visualization that results from it.

To validate the system’s usability, we apply the Usefulness, Satisfaction and Ease of Use (USE) Questionnaire [7] at the end of the test and collect the users’ comments during the test, in order to map details of interaction and tasks from the users’ viewpoint. The questionnaire results are evaluated together with these comments and with task completion, with the intention of measuring the user experience of creating visualizations and not only of exploring them.
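One way to summarize questionnaire answers of this kind is to average the item ratings per dimension. The Python sketch below assumes ratings on a 7-point agreement scale grouped by dimension; the `dimension_scores` function, the dimension keys and the ratings are placeholders invented for illustration and are not the data or the analysis procedure of this study.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 7  # assumed 7-point agreement scale


def dimension_scores(responses):
    """Average each participant's ratings per dimension, then across participants."""
    per_dimension = {}
    for participant in responses:
        for dimension, ratings in participant.items():
            if not all(SCALE_MIN <= r <= SCALE_MAX for r in ratings):
                raise ValueError(f"rating out of range in {dimension}")
            per_dimension.setdefault(dimension, []).append(mean(ratings))
    return {dimension: mean(values)
            for dimension, values in per_dimension.items()}


# Placeholder ratings for two imaginary participants; not data from this study.
responses = [
    {"usefulness": [6, 7], "ease_of_use": [4, 5]},
    {"usefulness": [7, 6], "ease_of_use": [5, 6]},
]
scores = dimension_scores(responses)
```

Averaging per participant first gives every participant the same weight even if someone skips an item; comparing the resulting dimension means against the scale midpoint is one simple way to read them alongside the qualitative comments.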

6 Tests and Results

In order to validate InterVis, five participants were selected to represent the target population of likely users. The volunteer group included system analysts who deal with diverse data contexts and have no specific knowledge of visualization development. The evaluation consisted of a short test description followed by an adaptation period in which participants could freely explore the system and ask questions. The participants were then encouraged to interact with the system interface and complete a number of compound tasks while describing their preferences and opinions about the experience. Each task took approximately 15 min and consisted of creating a new visualization from scratch based on a given dataset (public domain statistical information about the Brazilian population provided by IBGE [6]). At the end of the tasks, participants answered the USE questionnaire [7] in order to record the experience from a usability point of view.

Users were asked to explore the InterVis tools and use their intuition in choosing which resource would work best to complete each task. The resulting reports differed in presentation from user to user, and it was observed that once a user had employed a resource in one task, the same resource tended to be used again in the following tasks. Nevertheless, all users accomplished the task objectives in the expected time.

In our experiments, we noticed that, despite users’ initial difficulties in the first tasks, their proficiency grew steadily even within the short time allotted to the experiment, suggesting that the system has good learnability. For instance, during the exploration phase users were unsure where to start the visualization and where to drop the dragged data, as well as where to find the buttons to edit a report and to return to the initial page. In the case of the buttons, users’ default behavior was to look for them at the top of the window. However, these doubts were usually resolved quickly and did not recur in later tasks.

Another characteristic mentioned is the visualization update when some configuration is changed. According to user reports, depending on what property is changed, it is not obvious what was updated in the visualization, or sometimes this update was not automatic and required saving the report to be applied.

Furthermore, we could observe some user difficulty in finding the data series and adding it to the visualization. There were frequent questions during the tasks about how to classify the data and create a legend. In addition, the section responsible for those resources was frequently the last area in the tools panel to be explored.

Based on the questionnaire results, we may conclude that although the ease of use of and satisfaction with InterVis did not receive the most favorable evaluation, they were still above expectations. Usefulness and learnability were rated by users as close to ideal. We also received some interesting suggestions for implementation, such as a filter based on intervals and a horizontal scroll on timelines to facilitate zooming, resources that could make the visualization richer and more useful.

7 Conclusion

In this work, we describe a GUI for the interactive creation of Information Visualizations from scratch and for their dynamic exploration. The purpose is to provide a tool that allows the user to create dynamic report presentations without technical knowledge of programming and without direct contact with the raw data. In addition, the system’s architecture is designed to support easy extension of functionality and connection to new data sources.

A usability study was conducted using the USE Questionnaire [7] and users’ descriptions. The results show that the system was considered above average in satisfaction and ease of use, but that there is certainly room for further improvement in our tool, particularly regarding highlighting automatic updates, finding data series and creating legends. InterVis was evaluated as highly useful and easy to learn.

As a next step, a deeper analysis could reveal further gaps that were not explicit in this usability evaluation, such as performance and technical issues. In addition, a richer set of techniques and resources, such as geographic and graph layouts and different interaction and animation techniques, may improve the capacity of InterVis to serve different contexts. Notwithstanding these possible improvements, InterVis has succeeded in accomplishing the proposed objectives and in supporting the easy manipulation of information visualizations, as expected.