1 Introduction

The continuous growth of Electronic Commerce has led to the development of increasingly dynamic websites [1]. Nowadays, Web applications are built from complex components and sophisticated designs, and they offer extensive functionality and real-time processing. Although software products have changed in nature, specialists continue to use traditional methods to evaluate quality aspects. However, evidence in the literature indicates that these conventional techniques are no longer suitable for the new categories of software applications that are emerging. They fail to cover aspects of the application domain and offer inaccurate results, especially when they are used to assess the usability of new kinds of software products.

Usability is considered a critical success factor in the development of Web applications, since this quality attribute contributes to the user’s goals through a friendly and easy-to-use graphical interface. Software developers are now concerned not only with functionality, but also with the user experience. The outcome of an interaction is so important that it can determine whether the website will be used again in the future. For this reason, companies have been forced to change their business strategy by focusing on usable Web applications. This is only possible if usability evaluation methods [4] or usability techniques [15] are applied during the software development process.

Usability evaluation methods can be classified into two main categories [5]: inspection methods (which involve the participation of usability specialists), and test methods (which involve the participation of end users). Heuristic evaluation is one of the most recognized inspection methods in this field because of the significant advantages it provides. In contrast to usability testing, which requires a considerable number of users, heuristic evaluation only demands the collaboration of three to five specialists. Additionally, it involves a simple process that can be performed faster than other methods and at any stage of the software lifecycle.

Heuristic evaluation is an inspection method which involves the evaluation of a graphical user interface in order to determine whether each dialogue element follows established usability principles [9]. These guidelines are known in Human-Computer Interaction as usability heuristics. Although the ten usability heuristics proposed by Nielsen remain the most widely used principles, they leave gaps when they are used to evaluate interactive systems such as video games, mobile applications, transactional Web applications, augmented reality applications and virtual worlds. This new generation of systems has particular features that were not considered when the conventional principles were elaborated. The emergence of new software environments gives rise to new usability aspects that come from the application domain. Therefore, are Nielsen’s usability heuristics still an appropriate instrument to evaluate usability in these new categories of software applications?

In previous work [11], we conducted an experimental evaluation in order to determine whether Nielsen’s usability heuristics are still valid for the emerging kinds of software applications in the Web domain, especially transactional Web sites. The heuristics were evaluated in terms of perceived ease of use (PEU), perceived usefulness (PU), intention to use (IU), and perceived completeness (PC). Although Nielsen’s heuristics were perceived as useful, they were classified as difficult to use because of the lack of clarity in the interpretation of each of them. Additionally, most of the participants perceived that Nielsen’s heuristics do not cover all usability aspects of a transactional Web application, since important issues are not considered during an evaluation. Despite this, the participants in our experiment stated that they would use the traditional heuristics if they became involved in future evaluations. The advantages of the evaluation method and the lack of a more specialized assessment tool encourage the use of Nielsen’s heuristics. However, we have proposed a new set of heuristics for transactional Web applications [10], whose validation is addressed in this study. The intention of this new proposal is to provide specialists with a tool that is able to examine the usability of this kind of software application in its entirety. This paper presents an evaluation of the new usability heuristics based on the same analysis that was applied to Nielsen’s principles. The results of both evaluations are compared and discussed.

2 Background

2.1 Usability Heuristics for Transactional Web Sites

In a previous work [10, 12], we developed a new set of usability heuristics for transactional Web sites. These principles were established following a systematic and structured methodology proposed by Rusu et al. [14]. During a previous phase, the heuristics were used in a real scenario, proving to be an appropriate assessment instrument. In this study, we validate the proposal through an analysis of perceptions. The usability heuristics that were proposed are:

  1. Visibility and Clarity of the System Elements (F1): The most important elements of the system must be clearly visible. Some components of the graphical user interface will be more relevant than others according to the purpose of the system. These elements should be established with a high level of visibility and clarity, in order to allow the achievement of the user goals.

  2. Visibility of the System Status (F2): The user must be aware of the processes that the system performs. The software application must notify users when any kind of response or confirmation is required. The system must keep users informed about the current state of the software application within reasonable wait times.

  3. Match between System and User’s Cultural Aspects (F3): The application design has to be consistent with the cultural aspects of the user. The graphical interface must be oriented to the cultural profile of the users who will use the application. Users should not feel forced to use the system through unfamiliar mechanisms.

  4. Feedback of Transaction (F4): The system must keep users informed about the final status of transactions. The system should notify users about the success or failure of all transactions that are performed through the application. Users must know the partial and final results of their operations until they achieve their goals.

  5. Alignment to Web Standards Design (F5): The system must follow established design conventions in the Web domain. The graphical user interface should be aligned to standardized guidelines, commonly used structures and widely known layout elements. The system should be implemented using design patterns that have become a standard due to their extended use over time.

  6. Consistency of Design (F6): All sections of the system should maintain the same design style and a well-organized structure. The graphical interface must be consistent and preserve the logical order of the elements.

  7. Standard Iconography (F7): The interface design must be implemented using standardized icons that are already part of the user’s conceptual model because of their frequent use in several software applications. The icons should represent standardized concepts that, besides being known by most users, succeed in communicating their intended purpose.

  8. Aesthetic and Minimalism Design (F8): The user interface must not only be attractive, but must also contain the units of information that are relevant to users. The information should be properly distributed, without overloading the interface with extra units of information that compete in importance with the units that are indeed essential.

  9. Prevention, Recognition and Error Recovery (F9): The system must prevent the occurrence of errors. In addition, it must prevent users from taking actions that lead to errors in the application. However, once an error has occurred, the system must help users to recognize and quickly overcome the situation by displaying clear messages with appropriate instructions to solve the problem.

  10. Appropriate Flexibility and Efficiency of Use (F10): The system must provide accelerators that allow expert users to effectively accomplish their tasks, without affecting the normal work flow of novice users. The interface design must allow both inexperienced and experienced users to use the software application with flexibility and effectiveness.

  11. Help and Documentation (F11): The system must provide support options that help users perform specific tasks. These procedures should be clear and specify concrete actions for the achievement of the user goals. Support options must avoid ambiguity and confusion by providing concise instructions focused on the work flow and the user’s tasks.

  12. Reliability and Quickness of Transactions (F12): Transactions must be highly reliable. The system must guarantee that all transactions will be successfully completed within the expected time under specific operating conditions. However, in case of errors, the system must be able to correct the issue and undo all changes that are required. The software application must maintain data stability and avoid certain scenarios that negatively affect the user.

  13. Correct and Expected Functionality (F13): System elements must be correctly implemented, and they should provide the functionality that users expect of them. The interface components should run processes related to the purpose that is established in the design.

  14. Recognition Rather than Recall (F14): The user should not be forced to remember information from a previous state of the current transaction. Instructions for use must be easy to recall, and the Web form design must not be complex. The system should minimize the user’s memory load by developing highly intuitive graphical interfaces.

  15. User Control and Freedom (F15): Users can choose certain system functions by mistake. Therefore, the system must provide mechanisms that allow users to exit from unwanted states and undo their actions. Users should not be affected because of a mistake.

2.2 An Evaluation Model

In order to evaluate the new set of usability heuristics for transactional Web sites, we have adopted part of the Method Evaluation Model (MEM) that is presented in Fig. 1. This validated theoretical model was originally proposed for evaluating IS design methods. However, it incorporates general aspects of evaluation that can be applied to any kind of software development tool. Although heuristics are recognized as a usability evaluation instrument, they can also be considered a design method, in the sense that they are used during all phases of the software lifecycle in the context of user-centered design for the development of highly usable graphical interfaces [6].

The Method Evaluation Model, proposed by Moody [7], was designed considering two important concepts: Methodological Pragmatism, a theory for validating methodological knowledge [13], and the Technology Acceptance Model (TAM), a theoretical model for explaining and predicting user acceptance of information technology [3]. From these two approaches, the core of the MEM, known as the Method Adoption Model (MAM), was designed [2].

Fig. 1. Method evaluation model

In this study, we have focused on the perception/intention-based variables of the MEM that are defined in the Method Adoption Model. This model establishes the existence of three psychological aspects that are present in any successful method, and whose relationship is defined in Fig. 1. According to this evaluation model, the success of a method is reflected in its adoption in practice. If a design method is currently used in real contexts, it is because of its efficiency and effectiveness. However, the acceptability of a method is the result of a set of perceptions and intentions. Only if a method is perceived as easy to use and useful will specialists be motivated to use it again in future scenarios. This intention to use a particular instrument then turns into actual usage of the method. The model establishes that a successful proposal is one that is widely used by the community. Nevertheless, the level of adoption cannot be high if the method is not perceived favorably.

The variables that were considered for this study are:

  • Perceived Ease of Use (PEU): The degree to which an evaluator believes that the use of a particular usability heuristic would be free of effort.

  • Perceived Usefulness (PU): The degree to which an evaluator believes that a particular usability heuristic will achieve its intended objectives.

  • Intention to Use (IU): The degree to which an evaluator will use the usability heuristics in future evaluations.

  • Perceived Completeness (PC): The degree to which an evaluator believes that established heuristics cover all aspects of usability in a specific domain.

As a complement to this model, we have considered an additional aspect of evaluation. Our new proposal was developed in order to obtain an appropriate assessment tool of usability in this domain. For this reason, all features of transactional Web applications were studied. In contrast, when traditional heuristics are used to evaluate the usability in new categories of software, the results are inaccurate. Nielsen’s heuristics were designed for standard Web interfaces, and they do not cover aspects that may be relevant for some specific systems. This fact has led us to include a variable to determine whether our proposal is considering all aspects that are required in a heuristic evaluation.

3 Research Design

3.1 Participants

The participants of this case study were fifteen undergraduate students from the Information System Engineering program of the National University Pedro Ruiz Gallo. They were randomly selected from a section of Software Quality, a mandatory technical course. All students voluntarily agreed to participate in our study without expecting any kind of compensation. There were no significant differences in their backgrounds, since they had attended the same courses of their curriculum.

As part of the requirements of the sixth semester, students had to assess the quality attributes of a software product. This fact encouraged the teaching of usability evaluations during class. However, in order to conduct this study, it was necessary to train students in heuristic evaluations. Despite their lack of experience, participants identified a relevant set of usability problems using the new heuristics we have proposed.

3.2 Method

Our empirical study focused on the analysis of the students’ perceptions of the new set of usability heuristics for transactional Web sites. The study was conducted in classroom settings during the Spring semester of 2014. The broad research questions addressed are:

  • RQ1: Do the students consider that the new set of usability heuristics for transactional Web sites is easy to use and useful?

  • RQ2: Would the students use the new set of heuristics in future evaluations?

  • RQ3: According to the students’ opinion, are the heuristics covering all aspects of usability in the transactional Web domain?

  • RQ4: What is the degree of perceived ease of use and perceived usefulness of each heuristic?

  • RQ5: Is the perceived usefulness of the new heuristics influenced by their perceived ease of use?

  • RQ6: Is the degree of intention to use, perceived usefulness, perceived ease of use and perceived completeness of the new set of usability heuristics higher than that of the traditional proposal of Nielsen?

The experiment was conducted through a systematic procedure. First, all participants were trained in the main concepts of usability and user experience. When these definitions were fully conceptualized by the students, they were trained in heuristic evaluations. The students had to follow the new set of heuristics and analyze a Web site in order to find usability problems. For this purpose, a transactional Web application for booking accommodation online was selected: Booking.com. The students examined the graphical user interface of the application for about two hours. As a result of the evaluation process, each participant reported a list of usability problems in the interface with references to those usability principles that were violated by the design in each case.
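The problem lists described above lend themselves to a simple tally of how often each heuristic was cited as violated. The following is a minimal sketch in Python, not the procedure used in the study: the report layout (a mapping from a participant label to a list of problems, each with a `description` and a `violates` list) and the function name `violations_per_heuristic` are assumptions made for illustration, and the sample entries are invented.

```python
from collections import Counter


def violations_per_heuristic(reports):
    """Count how many reported problems cite each heuristic.

    `reports` is a hypothetical structure: participant label -> list of
    problems, where each problem is a dict with a free-text "description"
    and a "violates" list of heuristic identifiers such as "F4".
    """
    counts = Counter()
    for problems in reports.values():
        for problem in problems:
            # A problem counts once per heuristic, even if listed twice.
            counts.update(set(problem["violates"]))
    return counts


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample_reports = {
        "P01": [{"description": "No confirmation after payment step",
                 "violates": ["F2", "F4"]}],
        "P02": [{"description": "Final booking status is unclear",
                 "violates": ["F4"]}],
    }
    for heuristic, count in violations_per_heuristic(sample_reports).most_common():
        print(heuristic, count)
```

A `Counter` returns zero for heuristics that were never cited, so heuristics without reported violations need no special handling.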

Finally, a post-task survey was used to measure the following constructs for the new set of usability heuristics: perceived ease of use (PEU), perceived usefulness (PU), intention to use (IU) and perceived completeness (PC). The items of the survey instrument were formulated using a 5-point Likert scale, where 1 referred to an extremely negative perception of the construct, and 5 to an extremely positive one. Although PEU and PU were measured for each heuristic, the set of principles was treated as a single instrument for the evaluation of IU and PC. In the survey, we asked about the ease of use and usefulness of each heuristic. However, the questions on IU and PC were not formulated per heuristic, because it is not meaningful to intend to use one particular heuristic without the others. The heuristics have to be used as a single evaluation instrument because each of them addresses a specific aspect of usability. For this reason, the survey focused on determining whether the students would use the entire proposal in future evaluations, and whether the heuristics, as a single tool, succeed in covering all aspects of usability.
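The structure of the survey can be sketched as a small data type: PEU and PU are rated once per heuristic (F1 to F15), while IU and PC are rated once for the whole set. The sketch below is an illustration in Python under stated assumptions. The class name `SurveyResponse`, its field names, and the choice to average the per-heuristic ratings into one PEU and one PU score per participant are hypothetical and are not taken from the original instrument.

```python
from dataclasses import dataclass
from statistics import mean

# Identifiers F1..F15 follow the numbering used for the heuristics.
HEURISTICS = [f"F{i}" for i in range(1, 16)]


@dataclass
class SurveyResponse:
    """One participant's answers (hypothetical layout)."""
    peu: dict[str, int]  # perceived ease of use, one rating per heuristic
    pu: dict[str, int]   # perceived usefulness, one rating per heuristic
    iu: int              # intention to use, rated for the whole set
    pc: int              # perceived completeness, rated for the whole set

    def __post_init__(self):
        for name, ratings in (("PEU", self.peu), ("PU", self.pu)):
            if set(ratings) != set(HEURISTICS):
                raise ValueError(f"{name} must rate every heuristic exactly once")
        all_values = [*self.peu.values(), *self.pu.values(), self.iu, self.pc]
        if any(not 1 <= value <= 5 for value in all_values):
            raise ValueError("ratings must lie on the 5-point Likert scale (1-5)")

    def construct_scores(self):
        """Collapse the answers into one score per construct.

        Assumption: per-heuristic ratings are averaged into a single value.
        """
        return {
            "PEU": mean(list(self.peu.values())),
            "PU": mean(list(self.pu.values())),
            "IU": self.iu,
            "PC": self.pc,
        }


if __name__ == "__main__":
    # Invented answers, for illustration only.
    response = SurveyResponse(
        peu={h: 3 for h in HEURISTICS},
        pu={h: 4 for h in HEURISTICS},
        iu=4,
        pc=4,
    )
    print(response.construct_scores())
```

Validating the ratings at construction time keeps out-of-range or missing answers from silently skewing the averages.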

4 Data Analysis and Results

We collected fifteen valid surveys, which were submitted for analysis. The analysis was performed using a standard version of SPSS for Windows, Release 22.0. In this section, we present the results obtained for each research question.

4.1 Research Question 1

The purpose of this research question was to determine the perceived ease of use (PEU) and perceived usefulness (PU) of the new heuristics for transactional Web sites. In previous work [11], we found that Nielsen’s heuristics, despite being regarded as the most commonly used principles, are difficult for novice evaluators to use. When specialists use the traditional heuristics for the first time, they fail to interpret each principle correctly. This situation seems to be a consequence of the complexity of the heuristics. However, we consider that the principles should be easy to use for both expert and novice evaluators, without the need for an in-depth study of these guidelines. In addition, the perceived ease of use represents a key success factor of a method according to the MAM. For this reason, this aspect was considered during the development of the new heuristics and as part of this evaluation.

Another important aspect addressed in this question is the perceived usefulness of the new usability principles. The purpose of this construct was to determine whether the new proposal was appropriate and could be used as a useful tool to identify usability problems. During the previous work, we noticed that the specialists questioned the usefulness of Nielsen’s heuristics when they identified usability issues outside the scope of the evaluation instrument. Although certain aspects of the interface were considered usability problems, there was no heuristic that could support these statements. Nielsen’s heuristics, despite being efficient design recommendations, were no longer useful for the transactional Web domain. In this study, we examine whether the new set of heuristics meets these requirements.

The scores of all students were averaged in order to obtain a final result for each construct. The descriptive statistics were:

  • Perceived Ease of Use (PEU) (mean = 3.37, standard deviation = 0.53)

  • Perceived Usefulness (PU) (mean = 3.91, standard deviation =  0.46)

The results show that the mean is greater than 3 (the neutral score on a 5-point Likert scale) for both constructs. We can therefore conclude that the new set of heuristics is perceived as easy to use and useful. However, the mean value for PEU is only moderately above the neutral score. From this result, we can establish that further studies are still necessary in order to determine the cause of the difficulty of use. Two possible causes we propose are the complexity of the evaluation method and a lack of clarity in the definition of the heuristics. Nevertheless, considerable effort was devoted to developing understandable heuristics. For this reason, we conclude that the difficulty of the heuristic evaluation process itself affects, to some extent, the perceived ease of use of the usability heuristics.
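The descriptive statistics reported for each construct (mean, standard deviation, and the comparison against the neutral score of 3) can be reproduced with a few lines of Python. This is a minimal sketch, not the SPSS procedure used in the study: the function name `describe` is hypothetical, the sample scores are invented, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption about how the reported values were computed.

```python
from statistics import mean, stdev

NEUTRAL_SCORE = 3.0  # midpoint of the 5-point Likert scale


def describe(scores):
    """Return the mean, sample standard deviation, and neutral-score check.

    `scores` holds one value per participant for a single construct.
    Assumption: the sample standard deviation (n - 1) is the one reported.
    """
    if len(scores) < 2:
        raise ValueError("at least two scores are needed for a standard deviation")
    average = mean(scores)
    return {
        "mean": round(average, 2),
        "sd": round(stdev(scores), 2),
        "above_neutral": average > NEUTRAL_SCORE,
    }


if __name__ == "__main__":
    # Invented per-participant scores, for illustration only.
    hypothetical_peu_scores = [3.2, 3.6, 2.9, 4.1, 3.4]
    print(describe(hypothetical_peu_scores))
```

Note that a mean above 3 only places the result on the positive side of the scale; it says nothing by itself about how far above neutral the perception is, which is why the standard deviation is reported alongside it.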

4.2 Research Questions 2 and 3

In these research questions, the purpose was to measure the degree of adoption in practice of the new usability heuristics as well as their validity. The students were asked whether they would use this proposal if they had to perform a heuristic evaluation again. The intention to use construct can be considered a critical success factor because it determines whether the heuristics will be used in the future. Similarly, we included in the evaluation model a construct to validate the completeness of the heuristics from the point of view of the participants. With this variable, we evaluated whether our new proposal covers all aspects of usability in the transactional Web domain.

The scores of all students were averaged in order to obtain a final result for each construct. The descriptive statistics were:

  • Intention to Use (IU) (mean = 3.93, standard deviation = 0.93)

  • Perceived Completeness (PC) (mean = 4.07, standard deviation = 0.70)

The results show that the mean is greater than 3 (the neutral score on a 5-point Likert scale) for both constructs. Therefore, we conclude that the participants intend to use our proposal in future evaluations, and that the heuristics cover most of the aspects of usability in this domain. Although the results are favorable, more studies are required to continuously improve the proposal through the feedback of specialists.

4.3 Research Question 4

In this research question, the purpose was to examine each heuristic of the new proposal individually. For this reason, the survey was designed to obtain a score for the ease of use and the usefulness of each heuristic. The scores of all participants were averaged by heuristic and construct. The results are presented in Table 1. From these results, we can conclude that: (1) F8 (Aesthetic and Minimalism Design) is perceived as the easiest to use, (2) F4 (Feedback of Transaction) is perceived as the most difficult to use, (3) F3 (Match between System and User’s Cultural Aspects) is perceived as the most useful, and (4) F14 (Recognition Rather than Recall) is perceived as the least useful.

Table 1. Perceptions of the students about the new set of usability heuristics

From the results, it can be seen that most of the heuristics that cover aspects of usability not considered by the proposal of Nielsen, such as F1, F3, F5, F6, F7 and F13, obtained scores greater than 3 (the neutral score on a 5-point Likert scale). However, F4 and F12 were rated with low scores in PEU. We believe that this is due to the inability to complete an entire work flow during the evaluation of the software product. Both heuristics demand the execution of transactions. Given that the transactional Web application was an E-Commerce Web site in operation, completing a work flow required executing a financial transaction, which was not performed because of the nature of this test.
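The per-heuristic analysis amounts to averaging each heuristic’s ratings across participants and then picking the highest and lowest averages for a construct. The following Python sketch illustrates this under stated assumptions: the function names `mean_by_heuristic` and `extremes` are hypothetical, each response is assumed to be a mapping from a heuristic identifier to a rating for one construct, and the sample ratings are invented rather than taken from Table 1.

```python
from statistics import mean


def mean_by_heuristic(responses):
    """Average the ratings of each heuristic across all participants.

    `responses` is a list with one dict per participant, mapping a
    heuristic identifier (e.g. "F4") to that participant's rating for a
    single construct such as PEU.
    """
    if not responses:
        raise ValueError("at least one response is required")
    heuristic_ids = list(responses[0])
    return {h: mean([r[h] for r in responses]) for h in heuristic_ids}


def extremes(means):
    """Return the identifiers with the highest and the lowest mean rating."""
    highest = max(means, key=means.get)
    lowest = min(means, key=means.get)
    return highest, lowest


if __name__ == "__main__":
    # Invented PEU ratings from three participants, for illustration only.
    hypothetical_peu = [
        {"F4": 2, "F8": 5, "F12": 3},
        {"F4": 3, "F8": 4, "F12": 3},
        {"F4": 2, "F8": 5, "F12": 2},
    ]
    peu_means = mean_by_heuristic(hypothetical_peu)
    easiest, hardest = extremes(peu_means)
    print(peu_means, easiest, hardest)
```

When two heuristics tie, `max` and `min` return the first one encountered, so ties would need to be reported explicitly in a real analysis.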

4.4 Research Question 5

The purpose of this research question was to verify one of the relations established by the MAM in our new proposal of usability heuristics. Due to the design of the survey, we only focused on analyzing if the perceived ease of use had some kind of influence on their perception of usefulness of the heuristics. Therefore, we formulated the following hypothesis:

We calculated the Pearson product-moment correlation coefficient to observe the impact of the perceived ease of use on the perceived usefulness of the new usability heuristics. The assumption of normality was satisfied.

The Pearson correlation coefficient was significant (r = 0.495, p = 0.006). This result shows a strong relationship according to Muijs [8]. Since the proportion of variance explained by a linear relationship is the square of the coefficient (r² ≈ 0.245), about 24.5 % of the variance in the perceived usefulness of the heuristics is explained by a linear relationship with their perceived ease of use. Considering a significance level of 5 %, these values allow us to conclude that the perceived usefulness of the new heuristics is influenced by their perceived ease of use.
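For readers who want to check the relationship between the correlation coefficient and the share of explained variance, the computation can be sketched in plain Python. This is an illustration, not the SPSS procedure used in the study. The function names `pearson_r` and `explained_variance` are hypothetical and the paired scores are invented. The one fact relied upon is standard: in a simple linear relationship, the proportion of variance explained equals r squared, so r = 0.495 corresponds to roughly 0.245.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a sample has no variance")
    return covariation / math.sqrt(spread_x * spread_y)


def explained_variance(r):
    """Coefficient of determination for a simple linear relationship."""
    return r ** 2


if __name__ == "__main__":
    # Invented paired PEU/PU scores, for illustration only.
    hypothetical_peu = [3.1, 3.5, 2.8, 4.0, 3.3, 3.9]
    hypothetical_pu = [3.8, 4.1, 3.5, 4.2, 4.0, 3.9]
    r = pearson_r(hypothetical_peu, hypothetical_pu)
    print(round(r, 3), round(explained_variance(r), 3))
    # Reported coefficient from the study: r = 0.495.
    print(round(explained_variance(0.495), 3))
```

The sketch does not compute the p-value, which depends on the sample size and would normally be obtained from a statistics package.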

4.5 Research Question 6

The purpose of this research question was to compare the perception analysis of our new proposal with the results of another study that was performed on the traditional heuristics. In previous work [11], we conducted a similar case study of Nielsen’s heuristics in the transactional Web domain. The results showed that these principles were not appropriate, since they did not cover all usability aspects of this specific kind of software application. This fact encouraged the authors to elaborate a new set of usability heuristics that improves on all the aspects of the MAM. The results of both studies are presented in Table 2.

Table 2. Comparison between the new set of usability heuristics and the traditional proposal of Nielsen

Although there is an improvement in all aspects with respect to the traditional heuristics of Nielsen, the differences were not large. One of the most relevant results concerns perceived completeness. According to this comparison, the new usability heuristics for transactional Web sites appear to achieve their purpose by covering most of the aspects of usability in this specific domain. However, more studies are still necessary with the aim of further improving the other variables of the model. The results of this comparison must be considered only as a reference, since the studies were conducted in different contexts. Although the methodological design was the same and the participants were undergraduate students in both studies, there are aspects that could affect the validity of this comparison, such as the institutional context, the teaching, the curriculum of the program, the cultural differences between groups, the experiment settings and others.

5 Conclusions and Future Works

In recent years, several categories of software applications have emerged. Systems nowadays are built from complex and sophisticated components. However, we keep using the same assessment tools that were developed to measure the level of usability of generic software. In a previous study, we determined that Nielsen’s heuristics, a list of principles to assess the usability of software products in heuristic evaluations, were not appropriate in the domain of transactional Web applications. They failed to cover all aspects of usability in this kind of software product. Therefore, a new proposal was developed.

A new assessment instrument of fifteen usability heuristics for transactional Web sites was proposed. In this work, we validated this new proposal through a perception analysis of the heuristics. The purpose was to determine how the new set of heuristics is perceived by specialists who use it for the first time. This experimental case study was conducted following the Method Adoption Model (MAM), which establishes the study of three dimensions: perceived ease of use (PEU), perceived usefulness (PU) and intention to use (IU). In addition, a further construct was considered to verify whether the heuristics as a set cover all aspects that are required: perceived completeness (PC). Fifteen undergraduate students were asked to perform a heuristic evaluation using the new proposal. After the evaluation, a survey was administered to measure the dimensions.

The results showed that the new heuristics were perceived as easy to use and useful. Furthermore, participants expressed their intention to use this new proposal in future evaluations and adopt it in practice. According to the students’ opinion, the new heuristics cover many aspects that were not considered by the traditional proposal. Despite these promising results, it is still necessary to refine some heuristics that individually obtained low scores. It would also be convenient to propose a checklist and to conduct more studies in different contexts. In a final comparison, we determined that our proposal is better perceived than Nielsen’s heuristics. However, more experiments should be performed in order to generalize these results.