Abstract
It is generally agreed that usability is a basic attribute of software quality. User eXperience (UX) extends the usability concept beyond its traditional dimensions (effectiveness, efficiency and satisfaction). UX refers to all of a user’s perceptions resulting from the use (or even the anticipated use) of a product, system or service. For more than two decades, heuristic evaluation has proven to be one of the most popular usability inspection methods. When performing a heuristic evaluation, generic or specific heuristics may be used. Nielsen’s ten usability heuristics are well known, but many other sets of heuristics have been proposed. Based on proper heuristics, heuristic evaluation may also assess other UX aspects besides usability. Usability heuristic sets are specific artifacts, so the heuristics’ own “usability” may also be evaluated. If we consider evaluators to be particular “users” of particular “products” (the set of usability/UX heuristics and the heuristic evaluation method), we may also analyze Evaluator eXperience as a particular case of UX. We systematically conduct studies on evaluators’ perception of generic and specific usability/UX heuristics. The paper presents a follow-up study on novice evaluators’ perception of Nielsen’s heuristics, using three online travel agencies as case studies (Atrapalo, TripAdvisor and Expedia). The experiments involved Chilean and Spanish students. We compare the new results with our previous findings. Based on the empirical results, we think the methodology used when teaching the heuristic evaluation method is highly important.
1 Introduction
The ISO 9241 standard, updated in 2018, defines usability as the “extent to which a system, product or service can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use” [1]. As the usability concept is too general, the standard also indicates that “the specified users, goals and context of use refer to the particular combination of users, goals and context of use for which usability is being considered”. As the standard highlights, the term “usability” is “also used as a qualifier to refer to the design knowledge, competencies, activities and design attributes that contribute to usability, such as usability expertise, usability professional, usability engineering, usability method, usability evaluation, usability heuristic”. However, we think that a clear distinction should be made between “usability” as (software quality) attribute, usability evaluation and design methods, usability-related process (usability engineering) and usability professionals.
It is largely agreed that User eXperience (UX) extends the usability concept, beyond its traditional dimensions (effectiveness, efficiency and satisfaction). The same ISO 9241 standard defines UX as “user’s perceptions and responses that result from the use and/or anticipated use of a system, product or service” [1]. It also specifies that “users’ perceptions and responses include the users’ emotions, beliefs, preferences, perceptions, comfort, behaviors, and accomplishments that occur before, during and after use”.
Proposed in the early ’90s, heuristic evaluation is one of the most popular usability evaluation methods [2]. A heuristic evaluation is performed by a small group of experts (usually 3 to 5), based on a set of principles/rules/guidelines called heuristics. Nielsen’s ten usability heuristics [3] are well known, but they are often considered too general and unable to detect domain-related usability problems. That is why many other sets of heuristics have been proposed [4, 5]. Heuristic evaluation may be used to assess several UX aspects, not only usability [6].
Teaching the heuristic evaluation method and forming evaluators is challenging. We think practice is the best way to understand the heuristic evaluation protocol and the nature of usability heuristics [7, 8]. We performed a comparative study on novice evaluators’ perception of Nielsen’s heuristics, involving Computer Science students from a Chilean and a Spanish university [9, 10]. This paper presents a follow-up study, including experimental results from two new case studies.
The paper is structured as follows. Section 2 introduces the “Evaluator eXperience” concept and describes the questionnaire that we developed and have used for several years to assess (novice) evaluators’ perception. Section 3 presents the experiments that we conducted from 2016 to 2018 on three major online travel agency websites: Atrapalo.com [11], TripAdvisor.com [12] and Expedia.com [13]. Section 4 discusses the experimental results. Section 5 highlights conclusions and future work.
2 Evaluator EXperience
Heuristic evaluators are a particular kind of “users” of particular “products” (artifacts): (1) the set of usability/UX heuristics and (2) the heuristic evaluation method. Both artifacts may be evaluated in terms of their “usability”. We may thus think of Evaluator eXperience as a particular case of UX, which may also be assessed.
We have conducted studies on evaluators’ perception of generic and specific usability heuristics for several years [14,15,16,17]. All participants are asked to perform a heuristic evaluation of the same case study. They are then asked to participate in a post-experiment survey.
Heuristics quality is an important topic, as it highly influences the results of a heuristic evaluation. At least one heuristic quality scale has been proposed [18]. We developed our own scale, a questionnaire that assesses evaluators’ perception of a set of usability heuristics, based on 4 dimensions and 3 questions:
- D1 – Utility: How useful the heuristic is.
- D2 – Clarity: How clear the heuristic is.
- D3 – Ease of use: How easy it was to associate identified problems with the heuristic.
- D4 – Necessity of additional checklist: How necessary it would be to complement the heuristic with a checklist.
- Q1 – Easiness: How easy was it to perform the heuristic evaluation, based on the given set of heuristics?
- Q2 – Intention: Would you use the same set of heuristics when evaluating a similar software product in the future?
- Q3 – Completeness: Do you think the set of heuristics covers all usability aspects for this kind of software product?
Each heuristic is rated individually on the 4 dimensions (D1 – Utility, D2 – Clarity, D3 – Ease of use, D4 – Necessity of additional checklist), while the set of heuristics is also rated globally, through the 3 questions (Q1 – Easiness, Q2 – Intention, Q3 – Completeness). In all cases we use a 5-point Likert scale (from 1 – worst, to 5 – best).
Additionally, two open questions are asked, to collect qualitative aspects of evaluators’ experience:
- OQ1: What did you perceive as most difficult to perform during the heuristic evaluation?
- OQ2: What domain-related aspects do you think the set of heuristics does not cover?
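The ratings collected through D1–D4 and Q1–Q3 can be aggregated in a few lines of code. The sketch below is purely illustrative (the response values are invented, not taken from the experiments) and shows how average scores per dimension and question, of the kind reported in the results tables, might be computed:

```python
# Illustrative aggregation of questionnaire responses.
# Each evaluator rates every dimension/question on a 1-5 Likert scale;
# the values below are hypothetical, not the experimental data.
from statistics import mean

responses = [
    {"D1": 4, "D2": 3, "D3": 3, "D4": 4, "Q1": 3, "Q2": 4, "Q3": 3},
    {"D1": 5, "D2": 4, "D3": 3, "D4": 5, "Q1": 4, "Q2": 4, "Q3": 2},
    {"D1": 4, "D2": 4, "D3": 2, "D4": 4, "Q1": 3, "Q2": 3, "Q3": 3},
]

# Average score per dimension/question, rounded to two decimals.
averages = {
    key: round(mean(r[key] for r in responses), 2)
    for key in ["D1", "D2", "D3", "D4", "Q1", "Q2", "Q3"]
}
print(averages)
```

In the actual studies, D1–D4 are averaged per heuristic and then across the set, while Q1–Q3 are single global ratings per evaluator.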
3 Experiments
We conducted several experiments on the perception of Nielsen’s heuristics when evaluating online travel agencies, from 2016 to 2018. The experiments involved novice evaluators, Computer Science students from Chile and Spain:
- Graduate and undergraduate students in Informatics Engineering at Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile, and
- Undergraduate students of the Bachelor in Computer Engineering in Information Technologies at Universidad Miguel Hernandez de Elche, Elche, Spain.
All students were enrolled in Usability/UX-oriented introductory Human-Computer Interaction courses. In all cases they were asked to perform a heuristic evaluation based on Nielsen’s heuristics, following Nielsen’s protocol. With few exceptions, it was the first time they performed a heuristic evaluation; it was also their first contact with Nielsen’s heuristics and his evaluation protocol. After performing the heuristic evaluation, the students were asked to answer the questionnaire described in Sect. 2. All students participated voluntarily in the survey; there was no sample selection.
Experiments involved 112 Chilean and 31 Spanish students, as follows:
- Atrapalo.com was evaluated by 31 Spanish undergraduate students, 17 Chilean undergraduate students, and 33 Chilean graduate students;
- TripAdvisor.com was evaluated by 27 Chilean undergraduate students and 22 Chilean graduate students;
- Expedia.com was evaluated by 13 Chilean undergraduate students.
The results obtained when evaluating Atrapalo.com were presented in detail in previous work [9, 10]. Section 4 synthesizes these results, describes the results obtained when evaluating TripAdvisor.com and Expedia.com, and compares them with the Atrapalo.com results.
The observations are on an ordinal scale, and no assumption of normality could be made. Therefore the survey results were analyzed using nonparametric statistical tests (Kruskal-Wallis, Mann-Whitney U and Spearman ρ). In all tests, a p-value ≤ 0.05 was used as the decision rule.
As three groups of students (with different backgrounds) evaluated the same set of heuristics (Nielsen’s), the Kruskal-Wallis test was performed to check the hypotheses:

- H0: there are no significant differences between the perceptions of the three groups of students,
- H1: there are significant differences between the perceptions of the three groups of students.

Mann-Whitney U tests were performed to check the hypotheses:

- H0: there are no significant differences between the perceptions of two groups of students,
- H1: there are significant differences between the perceptions of two groups of students.

Spearman ρ tests were performed to check the hypotheses:

- H0: ρ = 0, two dimensions/questions are independent,
- H1: ρ ≠ 0, two dimensions/questions are dependent.
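As a rough illustration, the three tests and the decision rule above can be sketched with SciPy. The ratings below are invented, not the experimental data; `scipy.stats` provides `kruskal`, `mannwhitneyu` and `spearmanr` for exactly these comparisons:

```python
# Sketch of the nonparametric tests used in the study, on hypothetical
# 1-5 Likert ratings from three made-up groups of evaluators.
from scipy import stats

group_a = [4, 5, 4, 3, 4, 5, 4]   # e.g. graduate students
group_b = [4, 4, 3, 4, 5, 4, 4]   # e.g. undergraduate students
group_c = [3, 4, 4, 3, 4, 4, 3]   # e.g. a third group

# Kruskal-Wallis: do the three groups' ratings differ significantly?
h_stat, p_kw = stats.kruskal(group_a, group_b, group_c)

# Mann-Whitney U: pairwise comparison of two groups.
u_stat, p_mw = stats.mannwhitneyu(group_a, group_c, alternative="two-sided")

# Spearman rho: correlation between two dimensions (e.g. D1 and D2).
d1 = [4, 5, 4, 3, 4, 5, 4]
d2 = [4, 4, 4, 3, 4, 5, 3]
rho, p_sp = stats.spearmanr(d1, d2)

# Decision rule from the paper: reject H0 when p <= 0.05.
for name, p in [("Kruskal-Wallis", p_kw),
                ("Mann-Whitney U", p_mw),
                ("Spearman", p_sp)]:
    print(f"{name}: p = {p:.3f} -> {'reject H0' if p <= 0.05 else 'keep H0'}")
```

With real Likert data the tests are applied per dimension (D1–D4) and per question (Q1–Q3), one test family per comparison reported in the tables.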
4 Results and Discussion
The Atrapalo.com experiments were presented in two previous papers [9, 10]. In summary, the experimental results show significant differences between the perceptions of Spanish and Chilean students in several dimensions and questions (as presented in Table 1).
In contrast, as described in our previous papers [9, 10], there are no significant differences between the two groups of Spanish students (participants in the 2016 and 2017 experiments) in any of the dimensions and questions. The perceptions of Chilean undergraduate and graduate students are also similar; there are significant differences between the undergraduate and graduate groups only regarding question Q2 (intention of future use). It seems that the level of studies (graduate/undergraduate) does not influence students’ opinion, at least in our experiment. So, there are significant differences between the Spanish and Chilean groups, but not really among the members of the same group.
We also noticed that Chilean students have a better opinion than their Spanish counterparts on all dimensions and questions (Table 2). It is especially notable that even though the Chilean students have a better perception of the heuristics’ utility, clarity and ease of use, they still feel the need for additional evaluation criteria (a checklist).
We did not have evidence to suspect that the differences between Spanish and Chilean students are due to their background or to cultural aspects. Based on some of the Spanish students’ comments, we identified as a possible cause the methodology used when introducing Nielsen’s heuristics. In the case of the Chilean students, each heuristic is first explained through examples, and then the students have to identify usability problems related to each heuristic in several case studies. The problems they identify are debated in the classroom.
As we could not repeat the experiment in Spain using the same methodology as in Chile, we decided to repeat it in Chile in 2018, in three courses, using two other online travel agencies as case studies: TripAdvisor and Expedia. We therefore made new experiments with three groups of students:
- A first group of 22 Chilean graduate students evaluated TripAdvisor.com;
- A second group of 27 Chilean undergraduate students also evaluated TripAdvisor.com;
- Finally, a third group of 13 Chilean undergraduate students evaluated Expedia.com.
All three groups used Nielsen’s usability heuristics. The way we introduced Nielsen’s heuristics and performed the experiments was identical to the experiments made in Chile using Atrapalo.com as a case study.
The Kruskal-Wallis test indicates no significant differences between the three groups of students concerning dimensions D1, D2, D3 and D4, even though their background (undergraduate/graduate level) and/or the case study differ (Table 3). Significant differences occur only in the overall perception of the heuristic evaluation method (Q1), the intention of future use (Q2) and the completeness of Nielsen’s set of heuristics (Q3).
We then applied the Mann-Whitney U test for each pair of groups (Table 4). Results show very few significant differences:
- One between the undergraduate and graduate students that evaluated TripAdvisor, concerning the easiness of the heuristic evaluation (Q1);
- Two between the undergraduate students that evaluated Expedia and those that evaluated TripAdvisor, concerning the ease of use of Nielsen’s heuristics (D3) and the intention of future use of Nielsen’s heuristics when evaluating online travel agencies (Q2);
- Two between the undergraduate students that evaluated Expedia and the graduate students that evaluated TripAdvisor, concerning the ease of use of Nielsen’s heuristics (D3) and the completeness of Nielsen’s heuristics (Q3).
Table 5 presents the average scores for dimensions and questions for the three groups of Chilean students that participated in the 2018 experiment. It also includes the results of the 2017 group of students. As the opinions of all groups of Chilean students are similar, it also shows the average scores for all Chilean students and, for comparison purposes, the average scores for the Spanish students.
The four groups of Chilean students have a better perception than their Spanish counterparts in all dimensions. They perceive Nielsen’s heuristics as more useful (D1), clearer (D2) and easier to use (D3). But they also feel a higher necessity for additional evaluation criteria (a checklist, D4). They perceive the heuristic evaluation as easier to perform compared to the Spanish students, except for the group of undergraduate students that evaluated TripAdvisor. Chilean students also express a higher intention of future use of Nielsen’s heuristics (with one exception, the undergraduate students that evaluated Expedia). Concerning the completeness of Nielsen’s heuristics when evaluating online travel agencies, Chilean students have divided opinions; two groups have a better perception than the Spanish students, but the other two groups have a less favorable perception. However, when comparing the opinion of all 112 Chilean students with the opinion of the 31 Spanish students, the Chilean students have a better perception in all dimensions and questions. So, the new results are consistent with our previous findings [9, 10].
Table 6 shows the correlations between dimensions/questions when considering the three groups of Chilean students that participated in the 2018 experiment.
Few correlations occur when analyzing each group of Chilean students that participated in the 2018 experiments (Tables 7, 8, and 9).
As in our previous studies, few correlations occur in relatively small groups of students. When considering the three groups of students together, more correlations occur, and most of them are consistent with our previous studies. The D1 – D2 correlation is particularly frequent: when a heuristic’s specification is perceived as clear, the heuristic is also perceived as useful.
Open questions OQ1 and OQ2 evaluate some qualitative aspects of evaluators’ perception. What the three groups of students pointed out is similar to what students of previous generations expressed [9].
According to the students’ comments, the use of Nielsen’s heuristics seems to require positioning themselves in a new paradigm of thinking, to perceive and evaluate a website from an evaluation perspective to which they are not accustomed. The comprehension of each heuristic, and its identification, adaptation and mode of application to different products, are aspects that the evaluators identify as difficult for their work.
Based on this, the students highlight the importance of having elements that help them familiarize themselves both with the artifacts they are using (Nielsen’s heuristics) and with the services offered by the evaluated products (the TripAdvisor and Expedia websites in this case). In this sense, the evaluators emphasize the need for technical reports that would provide them with examples of heuristic evaluations previously carried out (either by them or by others). On the other hand, the evaluators point out that the websites they evaluated should provide strategies that facilitate their understanding by the people who use them (for example, through tutorials), as well as a good organization, distribution and precision of the information. They consider that novice users also face the challenge of adjusting their way of thinking and operating to what the websites they use offer and allow.
It is also interesting that the evaluators point out that although their work consists in evaluating products, they experience difficulties in taking a critical look, especially in detecting problems that are not major, evident or common. It seems that the evaluators are guided mostly by functionality and effectiveness criteria (based on the achievement of final results); they expect problems to be detected while the ongoing actions are carried out. Following that direction, however, aspects of the subjective and personal experience of real users may be underestimated and unattended. It seems that the evaluators have difficulties identifying problems until these become complications for themselves, according to their own way of using the product and their own experiences. The evaluators thus note difficulties in putting themselves successfully in the place of other users, especially novices. Complications also seem to arise because each evaluator has to understand the other evaluators’ opinions. The evaluators emphasize that it is difficult for them to coordinate their opinions and perceptions regarding the evaluations carried out, in order to reach consensus with the rest of the evaluation team.
5 Conclusions
Heuristic evaluation is probably the most popular usability inspection method, but forming evaluators is not an easy task. Heuristic evaluation results depend highly on both the heuristics’ quality and the evaluators’ experience. Evaluators use specific artifacts: the set of usability/UX heuristics and the evaluation protocol. The protocol seems to be less challenging, but properly understanding and correctly applying heuristics in practice is much more demanding, especially for novice evaluators. The heuristics’ “usability” may be assessed, based on a heuristics quality scale. The evaluators’ experience may also be assessed.
We systematically conduct studies on (novice) evaluators’ perception of generic and specific usability heuristics, based on a questionnaire that we developed. The questionnaire allows evaluating each heuristic individually (Utility, Clarity, Ease of use, Necessity of additional checklist), but also the set of heuristics as a whole (Easiness, Intention, Completeness). It also allows evaluators to express their perception through comments.
In a comparative study that we performed previously, we noticed significant differences between the perceptions of Chilean and Spanish Computer Science students when evaluating the same online travel agency (Atrapalo) based on Nielsen’s heuristics. The perceptions of Chilean students with different backgrounds were similar. The perceptions of two generations of Spanish students were also similar.
As we did not have evidence to suspect cultural or background-related issues as a possible cause, we think the reason could be the methodology of introducing Nielsen’s heuristics when teaching the heuristic evaluation method. We checked our assumption on two new case studies (TripAdvisor and Expedia), with three new groups of Chilean students. The new results are consistent with our previous findings. The Chilean students’ perception was systematically better than the Spanish students’ perception.
As future work, we would like to check (if possible) whether the methodology that we are using with Chilean students would lead to similar results when applied to Spanish students.
References
ISO 9241-11:2018: Ergonomics of human-system interaction—Part 11: Usability: Definitions and concepts. International Organization for Standardization, Geneva (2018)
Nielsen, J., Mack, R.L.: Usability Inspection Methods. Wiley, New York (1994)
Nielsen, J.: 10 Usability Heuristics for User Interface Design, January 1995. http://www.nngroup.com/articles/ten-usability-heuristics. Accessed 24 Jan 2019
Hermawati, S., Lawson, G.: Establishing usability heuristics for heuristics evaluation in a specific domain: is there a consensus? Appl. Ergon. 56, 34–51 (2016)
Quiñones, D., Rusu, C.: How to develop usability heuristics: A systematic literature review. Comput. Stand. Interfaces 53, 89–122 (2017)
Quiñones, D., Rusu, C., Rusu, V.: A methodology to develop usability/user experience heuristics. Comput. Stand. Interfaces 59, 109–129 (2018)
Botella, F., Alarcon, E., Peñalver, A.: How to classify to experts in usability evaluation. In: Proceedings of the XV International Conference on Human Computer Interaction Interacción 2014. ACM (2014)
Rusu, C., Rusu, V., Roncagliolo, S.: Usability practice: the appealing way to HCI. In: The First International Conference on Advances in Computer-Human Interactions (ACHI 2008) Proceedings, pp. 265–270. IEEE Computer Society Press (2008)
Rusu, C., Botella, F., Rusu, V., Roncagliolo, S., Quiñones, D.: An online travel agency comparative study: heuristic evaluators perception. In: Meiselwitz, G. (ed.) SCSM 2018. LNCS, vol. 10913, pp. 112–120. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91521-0_9
Botella, F., Rusu, C., Rusu, V., Quiñones, D.: How novel evaluators perceive their first heuristic Evaluation. In: Proceedings of the XIX International Conference on Human Computer Interaction Interacción 2018. ACM (2018)
Atrapalo online travel agency website. http://www.atrapalo.com. Accessed 24 Jan 2019
TripAdvisor online travel agency website. http://www.tripadvisor.com. Accessed 24 Jan 2019
Expedia online travel agency website. http://www.expedia.com. Accessed 24 Jan 2019
Rusu, C., Rusu, V., Roncagliolo, S., Apablaza, J., Rusu, V.Z.: User experience evaluations: challenges for newcomers. In: Marcus, A. (ed.) DUXU 2015. LNCS, vol. 9186, pp. 237–246. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-20886-2_23
Rusu, C., et al.: Usability heuristics: reinventing the wheel? In: Meiselwitz, G. (ed.) SCSM 2016. LNCS, vol. 9742, pp. 59–70. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-39910-2_6
Rusu, V., Rusu, C., Quiñones, D., Roncagliolo, S., Collazos, C.A.: What happens when evaluating social media’s usability? In: Meiselwitz, G. (ed.) SCSM 2017. LNCS, vol. 10282, pp. 117–126. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58559-8_11
Rusu, C., Rusu, V., Quiñones, D., Roncagliolo, S., Rusu, V.Z.: Evaluating online travel agencies’ usability: what heuristics should we use? In: Meiselwitz, G. (ed.) SCSM 2018. LNCS, vol. 10913, pp. 121–130. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91521-0_10
Anganes, A., Pfaff, M.S., Drury, J.L., O’Toole, C.M.: The heuristic quality scale. Interact. Comput. 28(5), 584–597 (2016)
Acknowledgments
We thank all the students involved in the experiment. They provided helpful opinions that allowed us to prepare this and (hopefully) further documents.
Cite this paper
Rusu, V., Rusu, C., Quiñones, D., Botella, F., Roncagliolo, S., Rusu, V.Z. (2019). On-Line Travel Agencies’ Usability: Evaluator eXperience. In: Meiselwitz, G. (eds) Social Computing and Social Media. Communication and Social Communities. HCII 2019. Lecture Notes in Computer Science(), vol 11579. Springer, Cham. https://doi.org/10.1007/978-3-030-21905-5_35