1 The Purpose of the Study

Conventional usability evaluation techniques such as Think Aloud [7, 20], Heuristic Evaluation [21] and Cognitive Walkthrough [16, 27] focus on the users’ ability to interact conveniently and effectively with the application. But in complex application domains this is usually not sufficient. Here users’ understanding of the information content is often the main challenge, and understanding is by nature difficult to uncover and evaluate.

Many examples of the importance of evaluating users’ understanding can be found in the eHealth domain [2, 3, 8, 18], an application domain which certainly is complex. This is a result of the complexity of understanding the health care issue at stake combined with the challenges inherent in adapting possible treatments to the condition of the individual client. The complexity has to be understood both from the clients’ perspective and the health professionals’ perspective in order to successfully design and make use of new applications. To contribute effectively the clients must understand their role in the treatment when using the new application; the health professionals must learn the new possibilities offered by the application, and perhaps reconsider existing work practices.

In the current paper we propose a new usability evaluation technique targeting users’ content understanding. The evaluation technique is developed and illustrated in a case study within eHealth, specifically in the setting of a rehabilitation clinic with the participation of physiotherapists and clients.

2 CUT with QU: The Proposed Usability Evaluation Technique

In order to uncover users’ content understanding during usability testing, it is necessary for the evaluators and users to collaborate and discuss issues of understanding and potential misunderstanding. Therefore we have chosen to build upon a usability evaluation technique which has direct collaboration between evaluators and users as its core approach, namely CUT: Cooperative Usability Testing [9], a technique that we are familiar with and have used in many different application domains.

In CUT, users and evaluators are brought together in a constructive dialogue in order to understand usability problems. This happens through a video-recorded interaction session (IAS), where the test user performs relevant tasks with the application to uncover usability problems following a standard procedure, e.g. Think Aloud [7, 20] or Contextual Inquiry [4].

Then follows a cooperative interpretation session (IPS) based upon the video of the IAS. The IAS video serves as a medium that helps the test user and the evaluators recall situations of interest raised during the IAS. In the IPS the evaluators and the test user identify and discuss the most important usability problems. The aim is not to reach a complete description of the usability problems but to establish a clear understanding of the most important issues. The full descriptions are reached afterwards through analyses of the documentation in the form of video recordings and evaluators’ notes. Here affinity diagramming [12] is highly useful.

In “CUT with QU”, the IPS is modified compared to CUT. The IPS is expanded with a set of key questions addressing the issues of content understanding (QU). Through the questions the user is invited to describe how the information from the application makes sense. We have approached this through a dialog with What, Why, and How questions adapted to the situation at hand.

How does the user interpret this information? Does the information fit with the user’s prior understanding? Can the user make use of the information in new settings, et cetera? A great variety of questions can be raised; and the evaluator’s experience, insight and creativity will be highly challenged in order to guide and manage such “questions of understanding”.

Further, it must be realized that the evaluators will need to have solid domain knowledge at a level that goes beyond what can usually be expected of human-computer interaction experts. This proposal of bringing more domain knowledge into usability testing, also through the evaluators, will more generally contribute to the value and impact of the evaluation results as shown by Følstad [10, 11].

Table 1 lists the questions of understanding, QU, related to the case study. These questions were developed during the interpretation sessions with our participants, but our key experience is that the most effective questions have to be seized situationally and adjusted to the individual participant. This is a very demanding evaluator capability, which must be refined through training. The video-recorded IPS will here be highly useful for the evaluators to analyze and evaluate their own performance.

Table 1. Six key questions challenging the issues of content understanding (QU).

We have strived to build our questions of understanding on Gilbert Ryle’s description of “knowing how and knowing that”, as well as his ideas about understanding and theory building [25]; and on William James’ work about “knowing” [15]. As expressed by James: “…the relation of knowing is the most mysterious thing in the world. If we ask how one thing can know another we are led into the heart of Erkenntnisstheorie and metaphysics. … There are two kinds of knowledge broadly speaking and practically distinguishable: we may call them respectively knowledge by acquaintance and knowledge-about … I am acquainted with many people and things, which I know very little about …” [15, pp. 216–222].

For a thorough discussion of these complicated matters we recommend Peter Naur’s book “Knowing and the Mystique of Logic and Rules” [19]. Here the reader can also find two metaphors, which are at the core of the proposed usability evaluation technique. These two metaphors describe key aspects of (1) human understanding and insight, and (2) how we as humans are able only to a very limited degree to express our insights directly, warning us how understanding and sharing insights are very difficult and time consuming. The metaphors go like this:

The Metaphor of a Person’s Insight.

“A person’s insight is like a site of buildings in incomplete state of construction. This metaphor is meant to indicate the mixture of order and inconsistency characterizing any person’s insights. These insights group themselves in many ways, the groups being mutually dependent by many degrees, some closely, some slightly. As an incomplete building may be employed as shelter, so the insights had by a person in any particular field may be useful even if restricted in scope. And as the unfinished buildings of a site may conform to no plan, so a person may go through life having incoherent insights.” [19, p. 215]

The Metaphor of a Person’s Utterances.

“A person’s utterances relate to the person’s insights as the splashes over the waves to the rolling sea below. This metaphor is meant to indicate the ephemeral character of our verbal utterances, their being formed, not as a copy of insight already in verbal form, but as a result of an activity of formulation taking place at the moment of the utterance.” [19, p. 215]

Further, and down to earth, we have been inspired by Lavra Enevoldsen’s classical series of textbooks titled “Read and Understand” (in Danish) [6]. These textbooks and her approach towards learning how to read have been in widespread use in Danish primary schools for decades. The textbooks are based on a set of small essays often supported by illustrations. After reading an essay and interpreting the illustration, the child is asked a number of questions. For some of the questions the answers are not explicitly to be found in the essay, but these questions can be answered clearly if the child has succeeded in reaching a coherent understanding from the reading and interpretation activity. The textbooks are typically used in a way where the child gives answers in written form. This approach of uncovering levels of understanding through a written dialog has been used in two comprehensive usability studies within HCI [13, 14]. The two studies are also concerned with complex application domains, namely information retrieval within programming [13], and reading activity and visualization [14]. These studies, however, did not directly—as this paper—emphasize the importance of focusing more generally within usability testing on the issue of content understanding.

3 The Context of the Case Study

In this section we briefly present the rehabilitation clinic and the application selected for the case study.

3.1 The Rehabilitation Clinic

The study took place in a Danish municipal rehabilitation clinic employing 25 therapists. The starting point at the clinic was that the management of the municipality wanted to find out whether a new rehabilitation application intended for clients’ home-training could increase productivity.

This created a situation where we as researchers could support and engage the management of the clinic and the employees to participate in a pilot project uncovering the effects of the rehabilitation application. The first challenge was to establish an adequate understanding of the tasks of the therapists, and to find therapists who were interested in participating. Here we succeeded in engaging four early adopters [24] with authority among colleagues, and they were involved in the planning of the pilot project, which had a time span of 3.5 months. These therapists recruited four of their clients with training needs fitting the rehabilitation application.

3.2 The Rehabilitation Application (RA)

The purpose of the rehabilitation application (RA) selected in the pilot project is to motivate the clients to follow their home-training program as defined by their therapist and stay compliant [17, 22, 23].

RA was studied in the version available February 2016. RA consists of a stretch sensor transmitting data to an app. The sensor consists of two parts, mounted on both sides of a latex free elastic band, held together by magnets. The app supplies the client with real-time biomechanical feedback of the training, and supplies the therapist with data tracking their clients’ progress.

Figure 1 shows the training exercise setup with RA placed on a latex-free elastic band. Figure 2 shows a screenshot of the therapist’s training administration interface with facilities to define and specify exercises and therapy exercise dosage for a client. Figure 3 shows a screenshot presenting feedback data to the client about training performance. The feedback data consist of these measures: (1) compliance with therapy exercise dosage, (2) the number of repetitions performed, (3) time under tension (TUT), and (4) the force used to stretch the elastic band (pulling force).

Fig. 1. Training exercise setup with RA placement on latex free elastic band.

Fig. 2. The therapist administration interface. The top menu shows adjustable training parameters. The left menu shows a selection of the predefined training exercises. The right menu shows training exercises selected for a client’s home-training program.

Fig. 3. The client feedback interface. The left menu shows the client’s training exercises. The right window shows a visualization of training performance, i.e. number of repetitions, time under tension, and pulling force. Also a video of the training exercise can be activated.

4 Planning and Implementation of the Pilot Project

The pilot project was planned in three major stages, see Fig. 4.

Fig. 4. The pilot project planned and implemented in three stages: Stage 1: Knowing the Clinic, Stage 2: Preparation of the Pilot Project, Stage 3: Research data/Analysis. Empirical data from the eight sessions of CUT with QU were collected, four therapist sessions (4T) and four client sessions (4C). The project period spanned 3.5 months.

The aim of Stage 1 was “Knowing the Clinic”. In order to understand the therapists’ current work practices and their interaction with their clients we performed a number of Contextual Inquiries (CI) [4].

Stage 2, “Preparation of the Pilot Project”, covers the action plan. Here we used the information from the contextual inquiries to make a detailed project description, and to prepare and communicate project activities and practical formalities to the participants. This included recruitment of participants, who received an information letter describing their role along with consent forms. User manuals for RA and a plan for a workshop to introduce the therapists to RA were prepared. Based on this, a rehearsal was conducted in order to refine the experimental protocol.

During Stage 3, “Research data/Analysis”, empirical data from the evaluation sessions with the four therapists and the four clients were collected. The sessions were video-recorded. Both therapists and clients assessed the usability of RA using the System Usability Scale (SUS) questionnaire [5]. The empirical data were consolidated through affinity diagramming [12] and SWOT analyses [26], and summarized in a final evaluation report [1].
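The SUS questionnaire mentioned above is scored with a fixed arithmetic rule: ten items are answered on a 1 to 5 scale, odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a score between 0 and 100. The following is a minimal sketch of that standard rule only. The function name and the example responses are hypothetical and are not data from this study.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for ten item responses on a 1-5 scale.

    Standard SUS scoring: odd-numbered items contribute (response - 1),
    even-numbered items contribute (5 - response), and the sum is
    multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly ten item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 (positively worded)
            total += response - 1
        else:
            # Items 2, 4, 6, 8, 10 (negatively worded)
            total += 5 - response
    return total * 2.5


# Hypothetical respondent who answers "3" (neutral) on every item.
neutral_respondent = [3] * 10
neutral_score = sus_score(neutral_respondent)
```

A neutral respondent ends up at the midpoint of the scale, 50.0, which illustrates that a SUS score is not a percentage of "satisfied" answers but a rescaled sum.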

5 Results

Through the evaluation sessions, 36 usability issues were identified with the therapists and 27 with the clients. The usability issues were consolidated through affinity diagramming of the statements from the eight participants during their CUT with QU sessions. The usability issues were categorized according to the facilities of the RA, e.g. sensor design, client profile definition, training program specification, training adjustment, and training performance. Some of the statements uncover aspects of content understanding. Examples are extracted and discussed in the following sections.

5.1 Content Understanding

In Tables 2 and 3, we have extracted empirical results illustrating findings concerning content understanding.

Table 2. Content understanding examples of therapist statements selected from the affinity diagrams. The first column specifies the statement number, e.g. TS04 indicating therapist statement no. 04. The Statement column describes the actual statement, followed by a quotation example. In the Participants column an “X” marks which of the therapists expressed the particular statement. TS04 was expressed by three therapists, T1, T2 and T4.

Table 3. Content understanding examples of client statements selected from the affinity diagrams. The first column specifies the statement number, e.g. CS09 indicating client statement no. 09. The Statement column describes the actual statement, followed by a quotation example. In the Participants column an “X” marks which of the clients expressed the particular statement. CS09 was expressed by two clients, C1 and C3.

To illustrate the relation between the statements in Table 2 and the QU in Table 1, let us take for instance the therapist statement TS30: ‘TUT works as a quality assurance of the training exercises in RA: “It is a cool thing, that you can regulate the pace at which the clients perform the training exercises” ’. The statement indicates that the therapists T1 and T3 have an understanding of an important training concept, called Time Under Tension (TUT). The strong focus on TUT in training was new to the therapists, but they were able to build TUT coherently into their current understanding and work practices. Their answers match QU 1 and 5 (Table 1): “How do you perceive this activity/facility?” and “Is this activity/facility relevant for you?”.

Similarly for the clients, we will explain CS26 in Table 3: ‘There does not exist a controlling mechanism for the actual performance of the training exercises in RA: “RA cannot tell you if you are performing the training exercises right or wrong” ’. The statement indicates that the clients C1, C2, C3 and C4 have a comprehensive understanding, which extends far beyond the technical use of RA. The statement shows that the clients have approached the essence of the main challenges in RA and in rehabilitation in general. Their answers match QU 1, 3, 5 and 6 (Table 1): “How do you perceive this activity/facility?”, “What are the consequences?”, “Is this activity/facility relevant for you?” and “Could you suggest another way to do this activity/facility with a similar or improved effect?”.

5.2 Co-monitoring Versus Self-monitoring

Through the focus on content understanding in CUT with QU it became visible that the concept of self-monitoring should be questioned. Self-monitoring was initially an important goal of introducing the RA at the clinic in order to increase productivity without losing quality. But both therapists and clients emphasized the importance of collaboration. Thus the idea of self-monitoring was partly misleading and should be balanced by a concept that we call “co-monitoring”, see Tables 4 and 5. The fact that the aspect of co-monitoring became clear might be an effect of the explorative and collaborative evaluation approach.

Table 4. Co-monitoring examples of therapist statements selected from the affinity diagrams. The first column specifies the statement number, e.g. TS03 indicating therapist statement no. 03. The Statement column describes the actual statement, followed by a quotation example. In the Participants column an “X” marks which of the therapists expressed the particular statement. TS03 was expressed by four therapists, T1, T2, T3 and T4.

Table 5. Co-monitoring examples of client statements selected from the affinity diagrams. The first column specifies the statement number, e.g. CS07 indicating client statement no. 07. The Statement column describes the actual statement, followed by a quotation example. In the Participants column an “X” marks which of the clients expressed the particular statement. CS07 was expressed by three clients, C1, C3 and C4.

In Tables 4 and 5, we have extracted empirical results illustrating findings concerning co-monitoring. To illustrate the relation between the statements in Table 4 and the QU in Table 1, let us take for instance the therapist statement TS34: ‘The client has tried to train after 12 a.m. in RA: “The training day stops at 12 a.m. in RA, this means that you cannot cheat by performing yesterday’s training today. RA has a compliance cutoff, default at 12 a.m., which is not always appropriate, therefore we have to help them remember” ’. The statement indicates that the therapists T1 and T3 understand the content information behind RA’s default settings concerning the principle of automatic compliance monitoring. It also illustrates that the therapists are aware of the importance of collaboration during a co-monitoring treatment process in RA. Their answers match QU 1, 3 and 5 (Table 1): “How do you perceive this activity/facility?”, “What are the consequences?” and “Is this activity/facility relevant for you?”.
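The compliance cutoff described in TS34 can be pictured as a rule that assigns each training session to a "training day". The sketch below is purely illustrative and is not RA's implementation: the function name, the idea that the cutoff is a configurable time of day, and the example timestamps are all assumptions made here to show why a midnight default prevents a late session from counting toward the previous day.

```python
from datetime import date, datetime, time, timedelta


def training_day(session_start: datetime, cutoff: time = time(0, 0)) -> date:
    """Return the calendar date a session is counted toward.

    Hypothetical rule: a session that starts before the cutoff time
    counts toward the previous calendar day; otherwise it counts toward
    the day on which it starts. With the midnight default, no session
    ever counts toward the previous day.
    """
    if session_start.time() < cutoff:
        return session_start.date() - timedelta(days=1)
    return session_start.date()


# Hypothetical session started ten minutes past midnight.
late_session = datetime(2016, 2, 2, 0, 10)

# Midnight cutoff (the default named in TS34): counted toward 2 February.
counted_default = training_day(late_session)

# Assumed later cutoff of 03:00: the same session counts toward 1 February.
counted_relaxed = training_day(late_session, cutoff=time(3, 0))
```

Under this reading, the therapists' remark that the default "is not always appropriate" corresponds to the first case: a client who trains just after midnight is recorded as non-compliant for the day before, which is where a reminder from the therapist comes in.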

In addition to being coherently related to the QU in Table 1, the client statements in Table 5 also match the therapist statements in Table 4 with respect to co-monitoring and compliance. To illustrate the relation between the statements in Table 5 and the QU in Table 1, let us take for instance the client statement CS15: ‘The application can increase compliance in home-training: “I feel monitored and committed to do my daily training, kind of a watchdog effect” ’. The statement indicates that the clients C2 and C3 expect RA to increase compliance in home-training. The “watchdog effect” was expressed with a twinkle in the eye, and the gossiping RA seemed to be a motivating factor illustrating the importance of co-monitoring. Their answers match QU 1, 3 and 5 (Table 1): “How do you perceive this activity/facility?”, “What are the consequences?” and “Is this activity/facility relevant for you?”. The message of therapist statement TS34 matches the client statement CS15, and so do the others.
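The mapping between statements, participants and QU described in Sects. 5.1 and 5.2 can be recorded in a small data structure, which makes it easy to see which questions the example statements touch. The sketch below encodes only the four statements discussed in the text (TS30, CS26, TS34, CS15) with the participants and QU numbers given there. The class and function names are hypothetical, and the resulting counts describe these four examples only, not the full affinity diagrams.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    code: str             # statement number, e.g. "TS30"
    participants: tuple   # participants who expressed the statement
    matched_qu: tuple     # QU numbers (Table 1) the answers match


# The four example statements as described in the text.
EXAMPLES = [
    Statement("TS30", ("T1", "T3"), (1, 5)),
    Statement("CS26", ("C1", "C2", "C3", "C4"), (1, 3, 5, 6)),
    Statement("TS34", ("T1", "T3"), (1, 3, 5)),
    Statement("CS15", ("C2", "C3"), (1, 3, 5)),
]


def qu_coverage(statements):
    """Count how many statements match each QU number."""
    counts = Counter()
    for statement in statements:
        counts.update(statement.matched_qu)
    return dict(sorted(counts.items()))


def statements_for(participant, statements):
    """List the statement codes expressed by one participant."""
    return [s.code for s in statements if participant in s.participants]


coverage = qu_coverage(EXAMPLES)
```

Among these four examples, QU 1 and QU 5 are matched by every statement, QU 3 by three and QU 6 by one. A tally of this kind could help evaluators notice which questions rarely produce statements, although with so few examples it says nothing about the technique in general.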

6 Discussion

The case study, which took place at a rehabilitation clinic over a period of 3.5 months, has demonstrated how a combination of cooperative usability testing (CUT) and key questions concerning users’ content understanding (QU) can complement conventional usability testing and provide insight into users’ challenges with understanding and making use of a new complex eHealth application. This was established by extracting the statements concerning content understanding from our dialog with therapists and clients. These statements were consolidated through affinity diagramming. We did not validate our results by having other researchers independently extract the relevant statements from our transcripts of key statements and perform their own data consolidation.

Using the QU in direct dialog between the evaluators and the clients and therapists was far from easy and straightforward. We had to modify the QU and gain experience through multiple iterations. Our performance as evaluators improved along with an increased understanding of the application domain, our ability to grab the interesting situations illuminating content understanding, and an increased familiarity with the vocabulary used by therapists and users. This experience matches what was to be expected, see Sect. 2 where the CUT with QU technique is described.

As a general result, both clients and therapists emphasized the importance of collaboration between client and therapist. The collaboration seemed important for the clients’ motivation and compliance, which coheres with findings by other researchers concerning the evaluated rehabilitation application (RA) [23]. Thus, the widely used term “self-monitoring” for home-based training applications like RA is misleading. As a more appropriate term we propose “co-monitoring”.

The usability evaluation technique, CUT with QU, has until November 2016 only been empirically studied in this case study, and in a few small laboratory-style exercises with health informatics students. Thus the applicability of the technique is still an open question. It has to be studied in other complex application domains involving more users and evaluators with different levels of domain knowledge before the technique can be claimed to be widely useful and effective. It would also be interesting to experiment with users giving written feedback to QU, as this would reduce the highly demanding task of the evaluator to interpret and respond in direct dialog to the user’s utterances concerning understanding issues.

7 Conclusion

We have proposed a new usability evaluation technique, called CUT with QU, which is targeting users’ content understanding. The evaluation technique is developed from an understanding of human understanding and insight as described by William James [15], Gilbert Ryle [25] and Peter Naur [19], and from creative work by Danish teacher and textbook author, Lavra Enevoldsen [6]. These four authors agree that grabbing and extracting understanding as held by people is very complex. The understanding cannot be expressed fully, but can be approached effectively through a situationally focused dialog involving an interviewer/examiner with solid domain insight. This idea of how to illuminate content understanding has been integrated as a set of questions (QU) into the usability technique, CUT, Cooperative Usability Testing [9].

In a case study within eHealth, specifically the setting of a rehabilitation clinic involving the participation of four physiotherapists and four clients, it has been demonstrated how CUT with QU can complement conventional usability testing and provide insight into users’ challenges with understanding and making use of a new complex eHealth application. Before claiming anything about the general applicability of CUT with QU, we need more experiments in other complex application domains and involving different kinds of users and evaluators.

Performing CUT with QU is very demanding, as it draws heavily on the evaluators’ ability to respond effectively to openings and potential shortcomings in the users’ content understanding. Evaluators need to train this interview/examination process in order to be able to reach a proper insight into the test user’s content understanding.

Even if CUT with QU, after more research, proves to be inadequate, the motivating research question behind this experimental study remains important: How can “content understanding” effectively be brought into usability testing in complex application domains?