1 Introduction

Satisfaction is one of the three main components of usability [8], along with effectiveness and efficiency. Practitioners typically test this component through standardized questionnaires administered after people have gained some experience in the use of a website. In particular, experts tend to apply short satisfaction scales to reduce the time and cost of assessing a website. Among the quick satisfaction scales, the most popular assessment tool is the SUS [9]. The SUS is a free and highly reliable instrument [10–14], composed of only 10 items on a five-point scale (1: Strongly disagree; 5: Strongly agree). To compute the overall SUS score, (1) each item is converted to a 0–4 scale on which higher numbers indicate greater perceived usability, (2) the converted scores are summed, and (3) the sum is multiplied by 2.5. This process produces scores that can range from 0 to 100. Although the SUS was designed to be unidimensional, since 2009 several researchers have shown that the tool has a two-factor structure: Learnability (scores of items 4 and 10) and Usability (scores of items 1–3 and 5–9) [2, 3, 13, 15–17]. Moreover, the growing availability of SUS data from a large number of studies [13, 18] has led to the production of norms for interpreting mean SUS scores, e.g., the Curved Grading Scale (CGS) [16]. Using data from 446 studies and over 5,000 individual SUS responses, Sauro and Lewis [16] found the overall mean SUS score to be 68 with a standard deviation of 12.5.
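As a concrete illustration, the following Python sketch (the `sus_score` helper name is ours) implements this three-step procedure, using the standard SUS conversion in which odd (positive-tone) items contribute [score − 1] and even (negative-tone) items contribute [5 − score]:

```python
def sus_score(responses):
    """Compute the overall SUS score (0-100) from the 10 raw item responses (1-5)."""
    assert len(responses) == 10
    # Odd items are positively worded (score - 1); even items are
    # negatively worded (5 - score), so every item ends up on a 0-4 scale.
    adjusted = [r - 1 if i % 2 else 5 - r
                for i, r in enumerate(responses, start=1)]
    return sum(adjusted) * 2.5  # sum ranges 0-40; scaling yields 0-100

# Example: all-neutral answers (ten 3s) yield the midpoint score.
print(sus_score([3] * 10))  # 50.0
```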

The Sauro and Lewis CGS assigns grades as a function of SUS scores, ranging from ‘F’ (absolutely unsatisfactory) to ‘A+’ (absolutely satisfactory), as follows: Grade F (0–51.7); Grade D (51.8–62.6); Grade C- (62.7–64.9); Grade C (65.0–71.0); Grade C+ (71.1–72.5); Grade B- (72.6–74.0); Grade B (74.1–77.1); Grade B+ (77.2–78.8); Grade A- (78.9–80.7); Grade A (80.8–84.0); Grade A+ (84.1–100).
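For readers who want to automate the grading, a minimal lookup that transcribes these ranges could look as follows (the `cgs_grade` helper name is ours; the sketch assumes scores at one-decimal resolution, so each band is closed at its upper bound):

```python
# Upper bound of each CGS band, in ascending order, per Sauro and Lewis [16].
CGS_BANDS = [
    (51.7, "F"), (62.6, "D"), (64.9, "C-"), (71.0, "C"),
    (72.5, "C+"), (74.0, "B-"), (77.1, "B"), (78.8, "B+"),
    (80.7, "A-"), (84.0, "A"), (100.0, "A+"),
]

def cgs_grade(mean_sus):
    """Map a mean SUS score (0-100) to its CGS letter grade."""
    for upper, grade in CGS_BANDS:
        if mean_sus <= upper:
            return grade
    raise ValueError("SUS scores must lie in the 0-100 range")

print(cgs_grade(68.0))  # "C" -- the overall SUS mean reported above
```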

Recently, two new scales have been proposed as shorter proxies of the SUS [17]: the UMUX, a four-item tool [1, 19], and the UMUX-LITE, composed of only the two positive-tone questions from the UMUX [3]. The UMUX items have seven points (1: Strongly disagree; 7: Strongly agree), and both the UMUX and its reduced version, the UMUX-LITE, are usually interpreted as unidimensional measures. The overall scores of the UMUX and UMUX-LITE range from 0 to 100. Their scoring procedures are:

UMUX: The odd items are scored as [score − 1] and even items as [7 − score]. The sum of the item scores is then divided by 24 and multiplied by 100 [1].

UMUX-LITE: The two items are scored as [score − 1], and their sum is divided by 12 and multiplied by 100 [3].

As researchers have shown [1, 3, 19], the SUS, UMUX, and UMUX-LITE are reliable (Cronbach’s α between .80 and .95) and correlate significantly (p < .001). However, for the UMUX-LITE, it is necessary to use the following formula (1) to adjust its scores to achieve correspondence with the SUS [3].

$$ \text{UMUX-LITE} = 0.65\left(\left[\text{Item 1 score}\right] + \left[\text{Item 2 score}\right]\right) + 22.9 $$
(1)
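A sketch of both scoring rules is given below, with hypothetical function names. Note one interpretive assumption: we read Eq. (1) as applying the 0.65 slope and 22.9 intercept to the 0–100 UMUX-LITE value, consistent with the regression adjustment described in [3].

```python
def umux_score(responses):
    """Overall UMUX score (0-100) from the four raw 7-point item scores."""
    assert len(responses) == 4 and all(1 <= r <= 7 for r in responses)
    # Odd (positive-tone) items: score - 1; even (negative-tone): 7 - score.
    adjusted = [r - 1 if i % 2 else 7 - r
                for i, r in enumerate(responses, start=1)]
    return sum(adjusted) / 24 * 100

def umux_lite_score(item1, item2):
    """Adjusted UMUX-LITE score, per Eq. (1) under the reading above."""
    raw = ((item1 - 1) + (item2 - 1)) / 12 * 100  # unadjusted 0-100 score
    return 0.65 * raw + 22.9                      # regression adjustment

print(umux_score([6, 2, 5, 3]))  # 75.0
print(umux_lite_score(6, 5))     # 0.65 * 75 + 22.9 = 71.65
```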

Although short satisfaction scales are quite well known and widely used in HCI studies, their psychometric properties have rarely been analyzed when these scales are applied to test the usability of an interface with disabled users. This is because elderly and disabled people are often excluded from usability evaluation cohorts: they are considered “people with special needs” [20] rather than possible end-users of a product who interact with websites through divergent and alternative modalities. Nevertheless, as Borsci and colleagues [21] suggest, the experience of disabled users has great value for HCI evaluators and for their clients. Indeed, enriching an evaluation cohort with sub-samples of disabled users can help evaluators run a kind of stress test of an interface [21].

Designers’ main complaint about involving disabled people in usability evaluation is the cost of testing with disabled users. Such testing usually requires more time than assessment performed by people without disability. The extra time can arise for the following reasons. First, some disabled users need to interact with a website through a set of assistive technologies, which may require conducting the test in the wild instead of in a lab. Second, evaluators need to set up an adapted assessment protocol for people with cognitive impairment, such as dementia [7]. Nevertheless, these issues can be overcome by adopting specific strategies. For instance, experts could ask a small sample of disabled users who are already customers of a website to perform, at home, a set of short scenario-driven interactions with it. Another approach is to ask disabled users who are novices in the use of a website to perform a set of tasks at home for a week, with their interaction monitored remotely [4]. Whichever strategy is used, instead of fully monitoring the usability errors made by disabled users, experts could simply ask these end-users to complete a short scale after their experience with a system, to gather their overall satisfaction. The satisfaction outcomes of the disabled users’ cohort can then be aggregated and compared with the results of the cohort of people without disability. Therefore, by using short satisfaction scales, practitioners can save on costs and, with minimal effort, report to designers the number of errors identified, the level of satisfaction experienced by users without disability, and a comparative analysis of satisfaction across a mixed cohort of users. Thus, short scales can be powerful tools for including, at minimal cost, the opinions of disabled users in a usability assessment, enhancing the reliability of the assessment report for designers.

Today, the possibility of including a larger sample of users with different kinds of interaction behavior in usability testing is particularly relevant to obtaining a reliable assessment. In the context of ubiquitous computing, people can access and interact with websites through different mobile devices, and a large amount of information on public services (such as taxes, education, and transport) is available online. Therefore, for public service websites to succeed, it is important to have an interface that is accessible to a wide range of possible users and usable in a satisfactory way.

Despite the growing involvement of disabled users in usability analysis, no studies have analyzed the psychometric properties of short satisfaction scales when these tools are used to assess the usability of website interfaces as perceived by a sample of disabled users.

The aim of this paper is to present a preliminary analysis of the use of the SUS, UMUX, and UMUX-LITE with a small sample of users with and without disability. To this end, we involved two different cohorts (blind and sighted users) in a usability assessment, in order to observe the differences between the two samples in terms of the number of errors experienced by end-users during navigation and the overall questionnaire scores. Moreover, we compared the psychometric properties of the SUS, UMUX, and UMUX-LITE, in terms of reliability and scale intercorrelation, when administered to blind and sighted participants.

2 Methodology

Two evaluation cohorts, composed of 10 blind-from-birth users (mean age: 23.51; SD: 3.12) and 10 sighted users (mean age: 27.88; SD: 5.63), were enrolled through advertisements among associations of disabled users and among the students of the University of Perugia, Italy. Each participant was asked to perform the following three tasks, presented as scenarios, on the website of the Italian public train company (http://www.trenitalia.it):

  • Find and buy online a train ticket from “Milan – Central station” to “Rome – Termini station.”

  • Find online and print the location of info-points and ticket offices at the train station of Perugia.

  • Use the online claim form to report a problem about a train service.

Participants were asked to verbalize their problems aloud during navigation. Sighted users were tested with a concurrent thinking-aloud protocol, while blind users were tested with a partial concurrent thinking-aloud protocol [7].

After the navigation, each participant filled in the validated Italian versions [14] of the three scales, presented in random order.

2.1 Data Analysis

For each group of participants, descriptive statistics (mean [M], standard deviation [SD]) were computed. An independent-samples t-test was performed to test the differences between the two evaluation cohorts in terms of the overall scores of the three questionnaires. Moreover, Cronbach’s α and Pearson correlation analyses were performed to analyze the psychometric properties of the scales when administered to the different end-users. All analyses were performed using IBM® SPSS 22.
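Although the analyses were run in SPSS, an equivalent open-source sketch in Python (SciPy/NumPy) conveys the same steps; all data below are illustrative, not the study’s data:

```python
import numpy as np
from scipy import stats

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Hypothetical overall SUS scores for two cohorts of 10 users each.
sus_blind = np.array([35, 42, 28, 50, 38, 45, 30, 40, 33, 47])
sus_sighted = np.array([62, 70, 65, 58, 72, 66, 60, 75, 68, 64])

# Independent-samples t-test between the two evaluation cohorts.
t, p = stats.ttest_ind(sus_blind, sus_sighted)

# Pearson correlation between two scales within one cohort
# (hypothetical UMUX scores for the blind cohort).
umux_blind = np.array([38, 45, 25, 52, 40, 43, 28, 44, 30, 50])
r, p_r = stats.pearsonr(sus_blind, umux_blind)
```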

3 Results

3.1 Usability Problems and User Satisfaction

The two evaluation cohorts together identified a total of 29 problems: blind users experienced 19 usability issues, while sighted users experienced only 10. Of the 29 issues reported by the two cohorts, eight were identified by both blind and sighted users, two only by sighted users, and 11 only by blind users. Therefore, a set of 21 unique usability issues was identified by testing 20 end-users, as the sketch below illustrates. As reported in Table 1, an independent-samples t-test showed that, for each questionnaire, there was a significant difference between the overall satisfaction in use experienced by blind and sighted users.
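The unique-issue tally follows from simple set arithmetic; a minimal sketch with hypothetical issue identifiers reproduces the reported counts:

```python
# Hypothetical issue IDs chosen only to reproduce the reported overlap.
blind_issues = set(range(1, 20))      # 19 issues reported by blind users
sighted_issues = set(range(12, 22))   # 10 issues, 8 shared with blind users

assert len(blind_issues & sighted_issues) == 8    # found by both cohorts
assert len(sighted_issues - blind_issues) == 2    # only by sighted users
assert len(blind_issues - sighted_issues) == 11   # only by blind users
print(len(blind_issues | sighted_issues))         # 21 unique issues
```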

Table 1. Differences among SUS, UMUX, and UMUX-LITE administered to blind and sighted users.

As can be seen in Table 2, while blind users assessed the website as not usable (Grade F), sighted users judged the interface as having an adequate level of usability (Grades from C- to C). Aggregating the two evaluation cohorts, the website would be judged a product with a low level of usability (Grade F).

Table 2. Average score, standard deviation (SD), and average aggregated scores of the SUS, UMUX, and UMUX-LITE for blind and sighted users. For each scale, the Curved Grading Scale (CGS) provided by Sauro and Lewis [16] was also used to determine the grade of website usability.

3.2 Psychometric Properties of Questionnaires

The Cronbach’s α analysis showed that all the questionnaires are reliable when administered to both sighted and blind users (Table 3). Nevertheless, in the specific case of blind users, the reliability of the UMUX is lower than expected (α = .568).

Table 3. Reliability of the SUS, UMUX, and UMUX-LITE for both blind and sighted users.

As Table 4 shows, all the questionnaires are strongly correlated (p < .001), independently of the evaluation cohort.

Table 4. Correlations among SUS, UMUX, and UMUX-LITE for both blind and sighted users.

4 Discussion

Table 2 clearly shows that while sighted users judged the website to be a fairly usable interface (Grades from C- to C), disabled users assessed the product as not usable (Grade F). This gap between the two evaluation cohorts is perhaps due to the fact that blind users experienced 11 problems that the cohort of sighted participants did not encounter. These results indicate that a practitioner who adds a sample of disabled users to an evaluation cohort may drastically change the results of the overall usability assessment, i.e., the average overall scores of the scales (Table 1).

The three scales were highly reliable for both cohorts (Cronbach’s α > 0.8; Table 3); however, the UMUX showed low reliability when administered to blind users (α = .568). This low reliability of the UMUX was unexpected, especially considering that the UMUX-LITE, composed of only the positive items of the UMUX (i.e., items 1 and 3), was highly reliable (Table 3). Perhaps the negative items of the UMUX (i.e., items 2 and 4) were perceived by disabled users as complex or unnecessary questions, or this effect is an artifact of the randomized presentation of the questionnaires to the participants. Finally, for both cohorts, the three scales were strongly correlated (p < .001; see Table 4).

5 Conclusion

Quick, short questionnaires can be reliably used to assess the usability of a website with blind users. All three tools reliably capture the experience of participants with and without disability, offering practitioners a good set of standardized results about the usability of a website.

Although further studies are needed to clarify the reliability of the UMUX when administered to disabled users, our results suggest that the UMUX-LITE and SUS can be applied by practitioners as good scales for satisfaction analysis. The use of these short scales may help practitioners involve blind participants in their evaluation cohorts and compare the website experience of people with and without disability. In fact, practitioners may, at minimal cost, administer the SUS and the UMUX or UMUX-LITE to a mixed sample of users, thus obtaining extra value for their report: the divergent perspectives of disabled users. This extra value is particularly important for the websites of public administrations and of services, such as public transport, that have to be accessible to a wide range of people with different levels of functioning.