1 Introduction

Followers of the quantified self trend assume that collecting biological, physical, behavioral, or environmental information can help to improve one’s well-being and performance [1]. In line with this idea, self-monitoring is a frequently used method in behavior change interventions [2]. One risk of this approach, however, is that self-monitoring applications do not help people to reach their actual personal goals (e.g., reducing their subjective stress level), but that the collection of data itself becomes the goal [3]. Therefore, the reflective stage [4, 5] is crucial before the intended behavior change can occur [6]. To facilitate this stage, it is essential to provide suitable visualizations that allow users to answer health-related questions and to decide whether a certain behavior should be maintained or adapted [7]. Some of these questions about the user’s behavior have already been identified by Li and colleagues [8]. In the context of the visualization of long-term health data, we suggest that the following three types of questions are of special interest:

  1. Progress over time (Has my consumption of e.g. alcohol changed over time?)

  2. Correlations between different health behaviors (Is there a correlation between e.g. my subjective stress level and sleep behavior?)

  3. Health consciousness (Am I health conscious with regard to e.g. my consumption of coffee?)

2 Data Visualization in Health Apps

Some data visualization heuristics that facilitate reflection on personal health data are summarized by Cuttone et al. [7]. Aside from avatars [9, 10], notifications [11], and abstract art [12], charts [5, 13, 14] are the most common form of data visualization in modern health technology.

The common representation for exploring patterns in time series data is the line plot [7]. The bar chart with time on the x-axis is also often used for long-term visualization (e.g., Sony LifeLog, Fitbit). Superficially, these visualizations seem to be perfect for the first two question types. However, they do not take into consideration how the respective health behavior should be appraised and therefore do not support decisions for or against behavior change [15]. Fitbit closes this gap, e.g., by providing additional goal fulfillment charts, which depict in a colored ring to which degree the personal goal of the user has been achieved. To date, however, effective visualizations that combine time series data and the appraisal of these data in one chart are rare.

3 Methods

3.1 Types of Visualization

To solve this issue, we compared two alternative long-term visualizations of health behavior: an accumulated bar chart and a point chart (see Fig. 1). In the accumulated bar chart, each bar represents the appraisals of data entries for one week (red = not recommendable, yellow = might still be recommendable, green = recommendable). In the point chart, time is displayed on the x-axis and the data values on the y-axis. The background is colored according to current health recommendations by the World Health Organization [16], the European Food Information Council [17], the German Nutrition Society [18], and the National Sleep Foundation [19, 20].

Fig. 1. (a): Accumulated bar chart based on appraisals of data entries for one week for each bar. (b): Point chart with time on the x-axis and value on the y-axis; the background is colored according to the respective health recommendations (red = not recommendable, yellow = might still be recommendable, green = recommendable) (Color figure online)
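The difference between the two chart types can be made concrete with a minimal sketch. It assumes daily entries and uses hypothetical cut-offs (`GREEN_MAX`, `YELLOW_MAX`) for cups of caffeinated drinks; these thresholds, the function names, and the sample data are illustrative assumptions and are not the values or the implementation used in the study.

```python
from collections import Counter

# Hypothetical appraisal bands for daily cups of caffeinated drinks.
# These cut-offs are illustrative only, not the recommendations used in the study.
GREEN_MAX = 3   # up to 3 cups: recommendable
YELLOW_MAX = 5  # 4 to 5 cups: might still be recommendable


def appraise(value):
    """Map one data entry to a traffic-light appraisal."""
    if value <= GREEN_MAX:
        return "green"
    if value <= YELLOW_MAX:
        return "yellow"
    return "red"


def weekly_appraisal_counts(daily_values, days_per_week=7):
    """Count appraisals per week, i.e. the segments of one accumulated bar."""
    weeks = []
    for start in range(0, len(daily_values), days_per_week):
        week = daily_values[start:start + days_per_week]
        counts = Counter(appraise(v) for v in week)
        weeks.append({c: counts.get(c, 0) for c in ("green", "yellow", "red")})
    return weeks


def point_chart_series(daily_values):
    """Keep day, raw value, and the color band the value falls into."""
    return [(day, v, appraise(v)) for day, v in enumerate(daily_values, start=1)]


daily_cups = [2, 3, 4, 2, 6, 3, 1, 4, 4, 5, 3, 4, 6, 2]  # two fictional weeks
bars = weekly_appraisal_counts(daily_cups)
points = point_chart_series(daily_cups)
print(bars)       # one dict of color counts per week
print(points[4])  # day 5: raw value plus its color band
```

In this sketch the bar data retain only the weekly color counts, whereas the point data retain every raw value together with its color band.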

In order to compare these two designs with respect to their suitability to answer the presented question types, four fictional datasets covering the following ten health behaviors and stress-related factors were constructed: sleep [21, 22], exercise [16], portions of unsweetened drinks [23], fruits and vegetables [24], caffeinated drinks [17], alcoholic drinks [25], positive [26] and negative events [27,28,29], mood [30], and subjective stress level. The data sets covered periods of 8 and 16 weeks.

3.2 Participants

Twenty young adults participated. All except one (a trainee) were students of the University of Kaiserslautern. The mean age was 23.1 years (age range = 20–27 years, standard deviation = 1.80 years). Participants were randomly assigned to two groups, which did not differ with respect to gender distribution (5 males and 5 females per group). The mean age was, however, higher in group A (mean age = 24 years) than in group B (mean age = 22 years), t(18) = 2.53, p = .02. All participants had prior experience with smartphones or tablets.

3.3 Procedure

All participants answered three questions for each of the three question types (progress over time, correlations between different health behaviors and stress-related factors, and health consciousness) for all four datasets (3 × 3 × 4 = 36 questions). The sequence of visualization for the data sets was counterbalanced across participants (group A: bar chart for dataset 1 + 2, point chart for dataset 3 + 4; group B: point chart for dataset 1 + 2, bar chart for dataset 3 + 4). After completing the questions for each type of visualization, participants were asked to rate the usability of each type of diagram based on the following subscales of the Mobile Application Rating Scale (MARS) [31]: performance, ease of use, navigation, layout, visual information, graphics, and visual appeal. The whole procedure was embedded in an interview to allow general comments explaining why the respective answers were chosen. The interview took about 45 min.

3.4 Data Analysis

The two types of visualization were compared with regard to differences in response pattern for each question, the number of questions for which participants were unable to pick an option, perceived difficulty to answer the question, as well as usability aspects based on MARS [31].

4 Results

4.1 Response Patterns for the Two Types of Visualization

Differences in response patterns between the two types of visualization were found for all question types, but not in each trial. Pearson’s χ² tests revealed significant differences for the following questions and data sets.

Progress Over Time

  • “How has the consumption of fruits and vegetables developed over time?” resulted in the following response patterns: bar chart with 90% “improved” and 10% “stayed the same” answers, point chart with 50% “improved” and 50% “fluctuating” answers, χ²(2) = 7.14, p = .03 (see Fig. 2).

    Fig. 2. Both versions of visualization for the consumption of fruits and vegetables in data set 2 (left bar chart and upper point chart) and the consumption of caffeinated drinks in data set 3 (right bar chart and lower point chart).

  • “How has the amount of caffeinated drinks developed over time?” resulted in the following response patterns: bar chart with 90% “increased” and 10% “stayed the same” answers, point chart with 10% “increased” and 90% “stayed the same” answers, χ²(1) = 12.80, p ≤ .01 (see Fig. 2).
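The two test statistics for the progress-over-time questions can be reproduced from the reported percentages. The following is a minimal sketch that assumes each visualization was rated by ten participants for a given data set (one group of ten per chart type), so that percentages convert directly to counts; the function name `pearson_chi_square` is introduced here for illustration only.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a table of observed counts.

    Rows are visualization types, columns are response options.
    """
    # Drop response options that nobody chose in either condition.
    columns = [col for col in zip(*table) if sum(col) > 0]
    rows = [list(r) for r in zip(*columns)]
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in columns]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(rows) - 1) * (len(columns) - 1)
    return statistic, dof


# Counts assume 10 raters per visualization (an assumption derived from the group sizes).
# Columns: improved, stayed the same, fluctuating
fruit_stat, fruit_dof = pearson_chi_square([[9, 1, 0],   # bar chart
                                            [5, 0, 5]])  # point chart
# Columns: increased, stayed the same
caffeine_stat, caffeine_dof = pearson_chi_square([[9, 1],   # bar chart
                                                  [1, 9]])  # point chart

print(round(fruit_stat, 2), fruit_dof)        # 7.14 2
print(round(caffeine_stat, 2), caffeine_dof)  # 12.8 1
```

Under this assumption, the computed values match the reported χ²(2) = 7.14 and χ²(1) = 12.80.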

Correlations

  • “Is there a correlation between the amount of exercises and the consumption of water?” resulted in the following response patterns: bar chart with 20% “unable to pick an option” and 80% “no correlation” answers, point chart with 50% “positive correlation” and 50% “no correlation” answers, χ²(2) = 7.69, p = .02 (actual correlation of scales: r = .51; see Fig. 3).

    Fig. 3. Both versions of visualization for the amount of exercise in data set 4 (left bar chart and upper point chart) and the consumption of water in data set 4 (right bar chart and lower point chart).

  • “Is there a correlation between the amount of sleep and the consumption of alcohol?” resulted in the following response patterns: bar chart with 100% “no correlation” answers, point chart with 10% “positive correlation”, 50% “no correlation”, and 40% “unable to pick an option” answers, χ²(2) = 6.67, p ≤ .04 (actual correlation of scales: r = .09; see Fig. 4).

    Fig. 4. Both versions of visualization for the amount of sleep in data set 4 (left bar chart and upper point chart) and the consumption of alcohol in data set 4 (right bar chart and lower point chart).
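The "actual correlation of scales" reported for these questions is a Pearson correlation coefficient between two data series. As a minimal sketch of how such a reference value can be obtained, the following function computes r for two equally long series; the sample values are invented for illustration and are not taken from the study's data sets.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(var_x * var_y)


# Invented example values (not from the study): minutes of exercise and glasses of water.
exercise_minutes = [20, 35, 10, 50, 40, 15, 30, 45]
water_glasses = [5, 6, 4, 8, 5, 6, 5, 7]
r = pearson_r(exercise_minutes, water_glasses)
print(round(r, 2))
```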

Health Consciousness

  • “Is the person who inserted these data health conscious with regard to the amount of exercises?” resulted in the following response patterns: bar chart with 90% “yes” and 10% “no” answers, point chart with 60% “yes” and 40% “no” answers, χ²(1) = 5.50, p = .02 (see Fig. 5).

    Fig. 5. Both versions of visualization for the amount of exercise in data set 2 (left bar chart and upper point chart) and the consumption of fruits and vegetables in data set 2 (right bar chart and lower point chart).

  • “Is the person who inserted these data health conscious with regard to his or her consumption of fruits and vegetables?” resulted in the following response patterns: bar chart with 20% “yes” and 80% “no” answers, point chart with 100% “yes” and 0% “no” answers, χ²(2) = 20.00, p ≤ .01 (see Fig. 5).

4.2 Inability to Pick an Option

Concerning the inability to pick an option, an analysis of variance (ANOVA) was conducted with sequence of visualization types as between-subjects factor (group A: bar chart for dataset 1 + 2, point chart for dataset 3 + 4; group B: point chart for dataset 1 + 2, bar chart for dataset 3 + 4) and with question type (type 1 = change over time, type 2 = correlations, type 3 = health consciousness) and data set (data set 1, data set 2, data set 3, data set 4) as within-subjects factors. The analysis revealed a main effect of question type, \( \text{F}(2\text{,}36) = 14.28, \text{p}<.01, \eta_\text{p}^{2} = .44 \), which can be explained by the fact that there was not a single trial for question type 1 (progress over time) in which participants were unable to choose an option. The other two question types (correlations and health consciousness) did not differ from each other, t(19) = 1.44, p = .17. Moreover, the bar chart (group A data set 1 + 2 and group B data set 3 + 4) resulted in more trials in which participants could not pick an option than the point chart (group A data set 3 + 4 and group B data set 1 + 2), t(19) = 2.46, p = .02, as indicated by a significant interaction between sequence of visualization types and data sets, \( \text{F}(1\text{,}18) = 5.76, \text{p}<.03, \eta_\text{p}^{2} = .24 \) (see Fig. 6).

Fig. 6. Mean sum of “I don’t know” answers for all three question types, illustrated for the two groups and the different datasets. Trials in which data sets were presented in the bar chart (group A data set 1 + 2 and group B data set 3 + 4) resulted in more “I don’t know” answers than trials in which the data sets were presented in the point chart (group A data set 3 + 4 and group B data set 1 + 2).

4.3 Perceived Difficulty to Answer

The ANOVA for perceived difficulty to answer revealed no systematic difference between the two versions of visualization. There was a main effect of question type, \( \text{F}(2\text{,}36) = 23.97, \text{p}<.01, \eta_\text{p}^{2} = .57 \), indicating that question type 2 (correlations) was perceived as more difficult than question type 1 (change over time), t(19) = 4.64, p < .01, and question type 3 (health consciousness), t(19) = 7.45, p < .01, which did not differ from each other, t(19) = .81, p = .43.

4.4 MARS Ratings

Finally, one ANOVA for each MARS subscale was conducted. The results for the usability aspects were mixed, with a preference for the point chart with regard to performance, \( \text{F}(1\text{,}18) = 6.79, \text{p} = .02, \eta_\text{p}^{2} = .27 \), and a preference for the bar chart with regard to graphics, \( \text{F}(1\text{,}18) = 9.97, \text{p}<.01, \eta_\text{p}^{2} = .36 \), and visual appeal, \( \text{F}(1\text{,}18) = 7.13, \text{p} = .02, \eta_\text{p}^{2} = .28 \). No preferences were found for the remaining scales (ease of use, navigation, layout, and visual information). The overall usability scores were medium to high for both types of visualizations.
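The partial eta squared values reported in Sects. 4.2 to 4.4 can be cross-checked against the F statistics and their degrees of freedom, because partial eta squared equals F · df_effect / (F · df_effect + df_error). The following sketch applies this identity to the reported values; the function name and the labels in the list are introduced here for illustration only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported partial eta squared)
reported = [
    ("inability: question type", 14.28, 2, 36, 0.44),
    ("inability: sequence x data sets", 5.76, 1, 18, 0.24),
    ("difficulty: question type", 23.97, 2, 36, 0.57),
    ("MARS performance", 6.79, 1, 18, 0.27),
    ("MARS graphics", 9.97, 1, 18, 0.36),
    ("MARS visual appeal", 7.13, 1, 18, 0.28),
]

recomputed = [round(partial_eta_squared(f, df1, df2), 2)
              for _, f, df1, df2, _ in reported]

for (label, *_rest, stated), value in zip(reported, recomputed):
    print(f"{label}: recomputed {value:.2f}, reported {stated:.2f}")
```

All six recomputed values agree with the reported effect sizes after rounding to two decimals.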

5 Discussion

Based on these results, we will discuss the advantages and disadvantages of both versions of visualization and refer to additional tools for the effective visualization of correlations in health apps.

5.1 Comparison of Both Visualization Types

Based on the pattern of results, we identified three main differences between the two types of visualization:

  1. Detection of fluctuations

  2. Interpretation of the raw data

  3. Interpretation of the full color spectrum

Besides the identification of trends within time series data, the detection of periodic patterns has been pointed out to be fundamental [7, 32]. The fluctuations in data set 2 (see Fig. 2) were, however, more frequently detected when presented in the point chart, indicating that the accumulation algorithm of the bar chart covers up some of the periodic variance in the original data.

Moreover, our results support the assumption that participants used the information from the raw data and the y-axis when it was available. Passing the border from green to yellow in data set 3 was interpreted as an increasing trend for coffee intake in the accumulated bar chart, whereas participants who rated the point chart mostly did not observe this trend, as they probably considered the total amount of cups (see Fig. 2). This means that people do not seem to be overloaded by too much information when both the raw data and the corresponding appraisals are provided. This is also supported by the fact that ratings of perceived difficulty to answer did not differ between the two types of visualization and that the point chart resulted in fewer trials in which participants were unable to pick an option.

Finally, the presentation of the full color spectrum of appraisals also seems to play a role. Participants more frequently rated the person behind the exercise data of data set 2 as health conscious when the data were presented in the bar chart (see Fig. 5), probably because, in contrast to the point chart, it was not obvious that the full color spectrum also includes green ratings. This bias might be stronger for inexperienced users of such a system, as the traffic light feedback system should be adopted easily over time [33].

Taken together, although both charts resulted in satisfactory overall usability scores, our findings are in favor of the point chart, as it allows fluctuations to be detected more easily and does not distort the original data. There were no indications of information overload when both raw data and appraisal were presented within one chart.

5.2 Visualization of Correlations

The analysis of the participants’ perceived difficulty to answer revealed that the questions regarding correlations were rated as the most difficult. This was found for both types of visualization. However, the number of trials in which participants were unable to pick an option was higher for the bar chart. As a result, we recommend using the point chart instead of an accumulated bar chart.

Some participants also suggested using trend lines instead of single points only to facilitate the observation of correlations in the point charts, as this approach reduces unavoidable noise [7]. The most critical factor that complicated the observation of correlations, however, was that the two diagrams were not displayed simultaneously on the screen. Therefore, we recommend using an additional tool in which two point charts or trend lines can be displayed at the same time. Other approaches to visualize correlations have been summarized in Cuttone et al. [7], including scatterplots, scatterplot matrices, and corrgrams [34].
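One simple way to obtain such a trend line is a moving average over the daily values. The sketch below uses a trailing window; the participants did not specify a smoothing method, so the choice of a moving average, the window length, and the function name are assumptions made for illustration.

```python
def moving_average(values, window=7):
    """Trailing moving average; early points use the shorter window available so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Invented example values (not from the study): hours of sleep over two weeks.
sleep_hours = [7, 6, 8, 5, 7, 9, 8, 6, 6, 7, 5, 8, 9, 7]
trend = moving_average(sleep_hours, window=7)
print([round(t, 2) for t in trend])
```

Plotting two such smoothed series in the same view would correspond to the side-by-side trend lines recommended above.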

5.3 Conclusion

The visualization of long-term health data is a challenging task. We suggest that by coupling quantified-self data with appropriate feedback, users can decide more easily whether they are reaching their goals and, if not, how they can adapt their behavior to achieve them. This work provides first insights into how appraisals of the respective health data can be integrated by means of a traffic light feedback system in a bar or point chart.