
1 Introduction

The number of methods for evaluating User Experience (UX) and usability in all phases of the software development process is growing [2]. We know that experiences are influenced not only by system features (e.g. complexity, usability and functionality), but also by the user's psychological state (e.g. motivation, expectations, needs, mood, among others) and the context in which the interaction takes place [1]. Due to the increasing attention that the HCI field pays to usability engineering, aspects such as UX, user emotions, influences, motivations and values are receiving as much attention as ease of use, learnability and subjective satisfaction [10].

While usability evaluation emphasizes effectiveness and efficiency, UX evaluation explores hedonic aspects: how a person feels when using an application, as well as the affective, significant and valuable aspects of such use. Therefore, through UX and usability evaluations it is possible to measure both how users feel about the application and their satisfaction with it (UX), and the time required to carry out certain activities in the application and the success rate of those activities (usability) [15].

This paper presents the results of applying several methods for evaluating UX and usability. These methods were employed during the development of a distributed mobile application that helps a team of caregivers in a home care setting (involving both family members and professionals) to plan and organize the day-to-day tasks necessary to care for a senior citizen.

We discuss the types of UX and usability problems encountered by each method, which were analyzed and classified as hedonic or pragmatic. This classification was chosen because, according to Hassenzahl [4], interactive systems are perceived through these two dimensions. We also verified which methods contributed most to improving the evaluated prototype, and what difficulties we encountered when applying each method and when analyzing its results. Finally, we documented how some of the modifications we made to the UX and usability methods contributed to enhancing the evaluation results. Through the results of these assessments, this paper provides information that may encourage software development teams to carry out cost-effective UX and usability evaluations. We also hope to support the choice of the most appropriate methods according to the needs of each software development project, improving software quality in the process.

In the next section, we present the concepts of User Experience and Usability, describe some existing evaluation methods, and motivate our selection. Section 3 presents the project context in which the methods were used and how the project was executed. In Sect. 4 we describe how the usability and User Experience evaluations were performed. Next, in Sect. 5 we present the analysis of the results from the methods used during the evaluation. Finally, in Sect. 6 we conclude the paper and present possible future work.

2 Background

2.1 User Experience and Usability

The international standard ISO 9241-210 [6] defines User Experience as "a person's perceptions and responses that result from the use or anticipated use of a product, system or service". People perceive UX through two dimensions: pragmatic and hedonic quality [4]. Pragmatic quality refers to the perception of the product with regard to the fulfillment of its purpose [5]. It focuses on aspects of usefulness and usability regarding the tasks to be carried out. Hedonic quality focuses more on the human needs and expectations a person has when using a particular product [5].

According to the ISO/IEC 25010 standard [7], usability is "the capability of the software product to be understood, learned, operated, attractive to the user, and compliant to standards/guidelines, when used under specific conditions". Thus, usability subsumes aspects of how easy the system is to use, such as learnability, operability and aesthetics, as well as the extent to which usability affects the user's decision to accept a product or not [1].

As Vermeeren et al. [16] point out, usability and UX are intertwined. While usability focuses on task performance (e.g. measuring task execution time, number of clicks or errors), UX focuses on experiences, analyzing people's emotions while they interact with the software product. Furthermore, because UX is subjective, objective usability measures are not sufficient for measuring it. For a complete evaluation, it is also necessary to analyze how the user feels about the software application while performing tasks with it.

2.2 UX and Usability Evaluation Methods

Usability and UX evaluation is a fundamental activity in any development process that seeks to produce an interactive system with high-quality use. It helps the evaluator make a value judgment about the quality of use of the proposed solution, and identify problems that affect the user experience while using the system [11].

The UX evaluation methods employed during the project were 3E (Expressing Experiences and Emotions) [13], MAX (Method for the Assessment of eXperience) [3] and SAM (Self-Assessment Manikin) [9]. These methods can gather insightful information on the UX of a product in sessions lasting less than an hour per participating user [3]. Additionally, the designer does not need to be a UX expert, since the selected methods are easy to apply. We also employed two modified methods to evaluate UX and collect information that the original methods did not allow us to obtain: 3E* (a modified 3E) and an adapted Empathy Map (EM). Finally, to evaluate the pragmatic aspects, we employed the Think Aloud [14] and Observation [8] methods. We briefly summarize each method below.

SAM.

SAM allows the evaluation of the affective quality of an interface. Through the SAM scales, it is possible to assess three dimensions: (a) pleasure (pleasure/displeasure); (b) dominance (in control of the situation/dominated by the situation); and (c) arousal (calm/excited). The method was designed to collect information on subjective feelings. When applying SAM, the user marks, on each of the scales, the image that corresponds to his/her emotional response after using the application.
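To illustrate how SAM ratings might be recorded for later analysis, consider the following minimal Python sketch. It is not part of the original, paper-based method; the 9-point scale is the commonly used variant of SAM, and all identifiers are our own:

from dataclasses import dataclass

@dataclass
class SAMResponse:
    """One participant's SAM ratings, assuming 9-point pictorial scales."""
    participant_id: str
    pleasure: int   # 1 = displeasure ... 9 = pleasure
    arousal: int    # 1 = calm ... 9 = excited
    dominance: int  # 1 = dominated by the situation ... 9 = in control

    def __post_init__(self):
        # Reject ratings outside the assumed 9-point range.
        for name in ("pleasure", "arousal", "dominance"):
            value = getattr(self, name)
            if not 1 <= value <= 9:
                raise ValueError(f"{name} must be in 1..9, got {value}")

# Example: a participant who felt pleased, fairly calm, and in control.
response = SAMResponse(participant_id="P01", pleasure=8, arousal=3, dominance=7)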

3E.

3E is a self-report method in which the user can express his/her emotions and experiences through drawings or writing. The method's template contains a human body outline; the user can draw a face on it to express his/her emotional state. Beside the human figure there are two balloons: one represents inner thoughts and the other oral expression.

MAX.

The MAX method [3] uses a board and cards. During the use of the prototype of the software under development, the evoked emotions and experiences are collected through answers to the questions written on the MAX board. The board has four questions: (a) "What did you feel when using it?"; (b) "Was it easy to use it?"; (c) "Was it useful?"; and (d) "Do you intend to use it?". The MAX cards are used to answer the questions. The cards contain words that allow the user to express his/her opinion regarding the application. As the users choose a card for each of the questions on the board, they report the reasons for choosing that card.
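For illustration only, the following Python sketch records one answer from a MAX session. The four board questions are taken from the description above, while the card vocabulary and all identifiers are hypothetical (the actual MAX deck defines its own word set):

# The four questions on the MAX board.
MAX_QUESTIONS = (
    "What did you feel when using it?",
    "Was it easy to use it?",
    "Was it useful?",
    "Do you intend to use it?",
)

# Hypothetical card words; the actual MAX deck defines its own vocabulary.
EXAMPLE_CARDS = {"satisfied", "confused", "easy", "hard", "useful", "useless"}

def record_max_answer(question: str, card: str, justification: str) -> dict:
    """Store a board question, the chosen card, and the reported reason."""
    assert question in MAX_QUESTIONS and card in EXAMPLE_CARDS
    return {"question": question, "card": card, "justification": justification}

answer = record_max_answer(
    MAX_QUESTIONS[0], "satisfied", "The task flow matched what I expected.")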

3E*.

We modified the 3E method to allow a more in-depth evaluation of the different activities in the application; we name this modification 3E*. Using the original 3E method, a subject can only describe a general view of the application's use, whereas our modification allows a subject to describe his/her experiences for each performed activity. In this modification, the method is applied to each of the tasks performed by the subjects. We also added bubbles in which the user describes his/her thoughts and opinions regarding the application.

EM.

Another method that we modified was the Empathy Map (EM) [12]. This method was not originally designed for evaluating UX; it is a method that helps to design business models according to the clients' perspectives. The original template is divided into fields in which the client describes what he/she thinks, feels, hears, does and says, together with his/her problems and needs. For the UX evaluation, we modified the EM template so that the subject can describe which tasks he/she managed to complete successfully, what he/she thought of the application, and whether the application met his/her needs.
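As a small illustration, one filled-in response to the adapted EM template could be represented as follows. This is a hypothetical Python sketch: the field names paraphrase the description above, and the example tasks are invented for the home care domain:

# Fields of the Empathy Map template as adapted for UX evaluation (hypothetical names).
em_response = {
    "participant_id": "P07",  # hypothetical identifier
    "completed_tasks": ["create a care task", "assign it to a caregiver"],
    "thoughts_on_application": "The calendar was clear; some icons were not.",
    "application_met_needs": True,
}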

Think Aloud.

In the Think Aloud (TA) method, users are asked to literally think out loud, reporting their interaction with the application, the tasks they are performing and the difficulties they encounter. Thus, it is possible to obtain data about the users' reasoning while they perform the tasks.

Observation.

In this method, moderators observe the users interacting with the software and take notes about the observed difficulties. Unlike the other methods, Observation requires an observer with experience in UX evaluation.

3 The Home Care Development Project

The Home Care development project addresses a growing worldwide problem, motivated by the aging population in many countries. In the scenario we study, a senior citizen is cared for by a team composed of family members and professional caregivers. The problem we address is how to coordinate the many tasks that caring for a senior citizen entails. In this context, the scope of this work is to evaluate the UX and usability of the application being developed.

Two distributed teams worked on this project. One team was responsible for the technical details of the underlying multi-agent task allocation system. Our team was responsible for the design and evaluation of the system's user interface, and was composed of six members: a team manager and five UX designers and evaluators. The evaluators had previous experience in applying UX and usability evaluation methods, and they all applied the methods listed in Sect. 2.2. Our team also had access to 8 mobile devices, on which users performed the UX and usability evaluations. In total, the project lasted 6 months and adopted an iterative process lifecycle, with one iteration per month.

During that six-month period, we developed and evaluated more than a hundred low-fidelity prototypes, which were derived from 10 use cases implementing more than 20 functionalities. The overall process is shown in Fig. 1.

In Iteration 1, we developed a total of eight personas in order to identify the different user profiles, their features and their needs. After that, we specified the functionalities and use cases that would meet the needs of these users through activity diagrams. We also created 10 detailed scenarios describing each use case and its relationship with the personas and devices of the project. In Iteration 2, we developed the first set of prototypes of the application, along with interaction models to represent the interaction possibilities between the application and the users. In Iterations 3, 4 and 5, while developing the prototypes of the other prioritized use cases, we also evaluated and redesigned the prototypes that had been developed in previous iterations. Finally, in Iteration 6, we carried out a complete UX evaluation of all the redesigned prototypes to assess the impact of our modifications on the overall UX of the software system.

Fig. 1. Process for the design, evaluation and redesign of the prototypes

4 Project Execution: UX and Usability Evaluations

During the development of the project's interface, four UX and usability evaluations were performed. In each of iterations 2-5, prototypes were developed to fulfill the functionality of a specific use case, according to the project's initial prioritization in iteration 1. Thus, each evaluation in iterations 3-5 was performed on the prototypes developed in the previous iteration. In the fourth evaluation, in iteration 6, the complete prototype was evaluated, since this was the last evaluation and no further mockups were developed.

The four evaluations were conducted with students as users. Most of the study participants lived with elderly relatives or thought that the application could be useful to look after their own parents when they become elderly. In each assessment we used different methods (see Table 1). For the evaluations, the participants had to carry out tasks using the prototype application on a mobile device. The evaluation was guided by three moderators, who explained to the subjects which tasks they needed to perform and how to employ the UX methods to express their opinion regarding their interaction with the prototype. Table 1 shows an overview of the participants and methods for each evaluation.

After each evaluation, the project team classified the identified defects. The classification process was composed of two activities: (1) removal of duplicates and (2) a meeting for the classification of problems. After removing the duplicated problems, the project team held a meeting to review the remaining problems and discuss which classification best fit each one (cosmetic defect, relevant defect, or not a defect). Based on this classification, we determined which problems required changes to the application's interface.
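A minimal Python sketch of this two-step process follows. The duplicate-detection criterion (verbatim text match after normalization) and all names are our assumptions; the paper does not specify how duplicates were detected:

from enum import Enum

class Triage(Enum):
    COSMETIC = "cosmetic defect"
    RELEVANT = "relevant defect"
    NOT_A_DEFECT = "not a defect"

def remove_duplicates(reports: list[str]) -> list[str]:
    """Activity 1: drop duplicate reports, preserving their order."""
    seen, unique = set(), []
    for report in reports:
        key = report.strip().lower()  # assumed normalization
        if key not in seen:
            seen.add(key)
            unique.append(report)
    return unique

# Activity 2 is a team meeting; here only its outcome is recorded.
triaged = {report: Triage.RELEVANT
           for report in remove_duplicates([
               "Save button is hard to find",
               "save button is hard to find",  # duplicate, removed
               "Icon colors feel inconsistent",
           ])}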

Table 1. Overview of the performed evaluations

5 Analysis of the Evaluation Methods

5.1 Type of Identified Problems (Hedonic or Pragmatic)

We revisited the defects that had been classified after the evaluations (as described in Sect. 4). In this analysis, we counted how many defects were identified by each method. All the defects were further classified as Hedonic (H) or Pragmatic (P). This classification differs from the one presented in Sect. 4: the goal of the first classification was to determine what needed to be modified in the user interface in order to improve the application, whereas this new classification aimed at analyzing which UX and usability methods are best employed to find hedonic or pragmatic defects.

For this defect classification, the team based its decisions on Hassenzahl's [4] notion of hedonic and pragmatic quality (see Sect. 2). From this classification, we verified how many defects of each type the UX methods identified. We also analyzed the type of evaluation each method performs, i.e. whether the method evaluates specific tasks or provides an overview of the application. Some methods allowed the users to evaluate specific functionalities of the application (SF), while others allowed the users to express their general view of the application (GV). Table 2 presents the analysis of the UX and usability methods.

Table 2. Analysis of the evaluation methods
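The per-method counts underlying such an analysis can be tallied with a few lines of code. The Python sketch below uses invented example data; the project's actual defect log is not reproduced here:

from collections import Counter

# Hypothetical defect log: (method, classification) pairs,
# where "H" = hedonic and "P" = pragmatic.
defect_log = [
    ("3E*", "P"), ("3E*", "H"), ("3E", "H"), ("SAM", "H"), ("TA", "P"),
]

# Tally how many hedonic and pragmatic defects each method identified.
counts = Counter(defect_log)
for (method, kind), n in sorted(counts.items()):
    print(f"{method}: {n} {'hedonic' if kind == 'H' else 'pragmatic'} problem(s)")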

In the first evaluation, regarding the classification of hedonic and pragmatic problems, both methods identified 9 hedonic problems. However, 3E* found more pragmatic problems than the 3E method. These results can be explained by the fact that 3E* allows the subject to express his/her opinion about each individually performed task, whereas 3E only allows the subjects to provide a general view of the application.

In the second evaluation, the SAM method found more hedonic problems than the other methods. Think Aloud identified a higher number of pragmatic problems. A possible reason for these results could be that the SAM method is more focused on the hedonic attributes of UX, evaluating features such as emotions. On the other hand, the TA method focuses more on the functionalities of the application. In this evaluation context, the TA method was employed in order to support a usability test. Similar to SAM, MAX identified a high number of hedonic problems, but very few pragmatic ones. We suspect the same cause: MAX focuses more on evaluating aspects related to experience and emotions and less on evaluating usability.

In the third evaluation, we employed the Think Aloud and 3E methods. The TA method found more hedonic and pragmatic problems than the 3E method. TA found 6 hedonic and 9 pragmatic problems. The 3E method found 2 hedonic problems and 3 pragmatic problems. In the last evaluation regarding the pragmatic and hedonic defect classification, the EM method identified a higher number of hedonic (12 problems) and pragmatic (21 problems) problems than the observation method.

5.2 How the Methods Contributed to Improve the Evaluated Prototypes

Table 3 presents some examples of problems identified by the methods employed in each of the four evaluations of this project. In the following paragraphs, we explain how each method contributed to identifying improvement opportunities in the evaluated prototype.

Table 3. Examples of problems found in each evaluation

In the first evaluation, through the 3E method, it was possible to gather a general view of what the users thought about the application. By using the 3E* method, it was possible to gather a specific view of each of the performed tasks during the interaction with the application's prototype. The modified method identified 24 defects, compared to only 14 defects identified by the original 3E method.

In the second evaluation, through the MAX method, it was possible to gain a general view of what the users thought about the developed prototypes. Through the SAM method, it was possible to gather the users' opinions regarding the screens and messages of the prototypes. We also managed to collect further information about usability and the users' opinions through the Think Aloud method. In that context, the MAX method identified 12 problems, the SAM method identified 22 problems, and the Think Aloud method identified 28 problems for improving the application. The MAX method was employed similarly to 3E in the previous evaluation, gathering the subjects' general view of the application. The SAM and Think Aloud methods collected information on specific tasks of the application: with SAM, several screens and messages of the application were evaluated, while with Think Aloud, the user's entire interaction with the application was evaluated.

In the third evaluation, we employed 3E and Think Aloud. During the execution of the tasks, the Think Aloud method was employed in the same way as in the second evaluation. Through this method, it was possible to identify usability problems and the users' opinions during the use of the application, since the subjects could express themselves orally. Furthermore, we employed the 3E method again, because it had provided relevant information for improving the application in the first evaluation. Through the 3E technique, it was possible to obtain a general view of what the users thought about the developed application. When employed together in the third evaluation, the 3E method found 5 problems while the Think Aloud method identified 15. As before, the 3E method allowed the subjects to express a general view of the application, while the Think Aloud method allowed them to express their opinion on each performed task.

In the fourth evaluation we employed two methods. EM was employed for the UX evaluation. Also, observation (OBS) was employed as a way of supporting the usability evaluation during the execution of specific tasks. In this evaluation, the UX evaluation method identified a higher number of problems in the application than the usability evaluation method.

5.3 Difficulties in the Application of the Methods

Although the methods were easy to apply, the participants encountered some difficulties. 3E, 3E* and the Empathy Map were tiring, as they required a lot of writing for participants to express their opinions. As these methods involve drawing a face to express the user's emotion, some participants found them difficult to use or disliked them because they did not like to draw.

The difficulties regarding the application of Think Aloud and Observation were related to data collection. These methods require notes or recordings of the participant's interaction with the application, and some information can be missed in this process. Furthermore, some participants do not like to speak while interacting with the application, which can be distracting but is mandatory in the Think Aloud method. In the Observation method, some participants felt uncomfortable with the presence of the evaluator during their interaction with the application.

The simplest methods were SAM and MAX. In the SAM method, participants had some difficulty understanding the emotions they could choose on the scale. In the MAX method, participants had some difficulty understanding the meaning of some cards. Still, MAX was the most fun method to apply because of its entertaining features and the dynamics between the cards and the board.

5.4 Difficulties When Analyzing the Results Generated by the Methods

Most of the employed methods collect qualitative data, which makes the analysis more complex. The obtained information is relevant to the application, but a lot of time is spent on its analysis: it is necessary to organize the data, classify the problems and prepare a report with the results.

In the first evaluation, we spent about 8 h analyzing the data collected with the 3E and 3E* methods, because these methods obtained a large amount of feedback from the participants. 3E* obtained more data than 3E and therefore required more analysis time. In the second evaluation, the Think Aloud method demanded more analysis time than SAM or MAX: we spent 6 h analyzing the Think Aloud data, while the data collected with SAM and MAX was faster to analyze, taking only about 5 h.

In the third evaluation, we spent about 4 h analyzing the data and preparing the report, because the application was used by only six participants. In the last evaluation, we spent about 9 h analyzing the data, since the Empathy Map method collects more information and this evaluation was performed with 18 participants. The Observation method used in this evaluation did not return much data, and its analysis took less time (about 2 h).

The main difficulties in carrying out such an analysis lie in the categorization of the problems: removing duplicated problems, classifying problems as hedonic or pragmatic, and identifying their correction priority. Sometimes a problem refers to an interface component, at other times to an interaction step, or to specific features or looks that the user would like the application to provide. Because users have difficulty explaining the cause of a problem, and different users may have different ideas about when an issue affects their experience, it is necessary to carefully verify the cause of each problem and its effect in order to make the correct decisions when dealing with it. Thus, it is necessary to consider the context of use of the application when analyzing the information gathered in an evaluation.

5.5 Benefits of the Modifications in the Application of the Methods

3E* was one of the modified methods; it allowed the user to provide his/her opinion on each executed task, in addition to expressing him/herself in writing, and we were able to verify that the written form finds more problems. It was also possible to verify whether a method that evaluates each of the performed tasks identifies a higher number of problems than a method that evaluates the application as a whole. This modified method was used in the first evaluation and uncovered more issues than the original 3E method: it found more problems because it evaluated specific functionalities of the application, while the original method evaluated the application in general.

The Empathy Map was adapted to evaluate UX and was used in the last evaluation. It allowed the subject to express his/her opinion both in a specific way (i.e., for each performed task) and in a general way. The method found more problems than the Observation method: we collected problems concerning specific functionalities of the application as well as the participants' opinions about its general characteristics.

6 Conclusions

In this paper, we applied six different methods for evaluating UX and usability. We employed three existing UX evaluation methods, 3E, SAM and MAX, and we proposed changes to existing methods, EM and 3E*, to gain further information on the users' needs. Also, to complement the results of the UX evaluation methods, we employed OBS and TA to find features that could be improved in order to enhance the quality of the application in terms of ease of use.

We identified that SAM, MAX and EM are better at identifying hedonic problems, while Think Aloud, EM and 3E* are better at identifying pragmatic problems. According to the evaluators who applied the methods, when applying UX methods that focus on the hedonic aspects of UX, such as SAM or MAX, it is also useful to apply methods that support the evaluation of usability features (e.g. Observation and Think Aloud), which are more related to the pragmatic aspects of UX. Another point to consider is whether a method evaluates specific tasks of the application or allows describing a general view of it. Methods that evaluate specific tasks (i.e. 3E*, SAM, TA, OBS) may identify a higher number of problems; however, they can be more demanding, as they require users to spend more time and effort evaluating each task. Through these lessons learned, we intend to encourage software companies to carry out cost-effective UX evaluations.

The categorization of the methods in terms of the type of evaluated tasks can guide software development teams in choosing a method that suits their needs. However, we still need to verify how these methods perform in other conditions, such as when employed by users with different profiles, in the evaluation of other types of applications, and in different environments. As future work, we intend to extend this research by testing these and other UX and usability evaluation methods under different circumstances to enhance the generalizability of our results. That way, we can provide further information on the scenarios in which each method is most suitable, advancing research in UX and usability evaluation and providing practitioners with a guide on when and why to apply existing and proposed UX and usability evaluation methods.