A revised open source usability defect classification taxonomy

https://doi.org/10.1016/j.infsof.2020.106396

Abstract

Context: Reporting usability defects is a critical part of improving software, and accurately classifying reported defects is essential for understanding, triaging, prioritizing and ultimately fixing them. However, existing usability defect classification taxonomies have several limitations when used for open source software (OSS) development, including incomplete coverage of usability defect problems, unclear criticality of defects, the lack of formal usability training among most OSS defect reporters and developers, and inconsistent terminology and descriptions.

Objective: To address this gap, as part of our wider usability defect reporting research, we have developed a new usability defect taxonomy specifically designed for use on OSS projects.

Method: We used the Usability Problem Taxonomy (UPT) to classify 377 usability defect reports from Mozilla Thunderbird, Firefox for Android, and the Eclipse Platform. At the same time, we used the card-sorting technique to group defects that could not be classified using the UPT. We then looked for commonalities and similarities to further group the defects within each category as well as across categories.

Results: We constructed a new taxonomy for classifying OSS usability defects, called the Open Source Usability Defect Classification (OSUDC). The OSUDC was developed by incorporating both software engineering and usability engineering needs so that it is feasible to use in open source software development. The use of the taxonomy was validated on five real cases of usability defects; however, evaluation results using the OSUDC were only moderately successful.

Conclusion: The OSUDC serves as a common vocabulary for describing and classifying usability defects with respect to graphical user interface issues. It may help software developers better understand usability defects and prioritize them accordingly. For researchers, the OSUDC will be helpful for investigating trends in usability defect types and understanding the root causes of usability defect problems.

Introduction

Usability is one of the prominent software quality characteristics, measuring the understandability, learnability, operability and attractiveness of software products [1]. In the context of community open source software, where no specific software development processes are followed, usability activities are often ignored. Volunteers focus more on functionality and features than on appearance, design aesthetics, and how people will use the products [2]. As a result, open source projects often have poor interfaces and complex user interaction [2], [3].

Since usability is a key acceptance criterion of a software product, usability-related issues need to be reported. In this work, a usability issue is defined as any unintended behaviour of the product that is noticed by the user and has an effect on the user experience. For example, consider a search job that uses a lot of computer resources. If the effect of high memory usage is noticeable only to software developers, we consider the problem a performance defect. However, if a user experiences slowness in retrieving the search results and is frustrated by the delay, then the problem affects usability in addition to performance.

However, reporting usability defects can be a challenging task, especially in convincing developers that a usability issue is indeed a real defect. The subjective nature of usability issues, which concern a user’s feelings, emotions, and struggles, requires a shared definition so that developers do not misinterpret the key information. In open source development, where most volunteers are not usability-savvy and have limited time, a list of usability keywords and options for classifying usability defects is a helpful way to point directly to causes and solutions [4]. Existing defect repositories, such as Bugzilla, already use keyword functionality to label usability-related defects. For instance, a defect can be labelled as uiwanted, useless-UI, ux-affordance, ux-consistency or ux-efficiency. However, such a high-level classification does not help developers identify the underlying flaws or problems. In fact, the lack of descriptions and examples and the limited set of usability terms make it difficult for non-expert Human Computer Interaction evaluators to assign such labels to certain usability defects [5]. Moreover, a recent literature review exploring usability defect reporting practices found that usability defect reporting processes suffer from a number of limitations, including mixed data, inconsistent terms and values for usability defect data, and insufficient attributes to classify usability defects [6]. These limitations encouraged us to revise existing usability defect classification models to produce a model suitable for the OSS domain. There are several reasons for categorizing usability defects:

  • 1) to more clearly disclose the probable causes of the defect;

  • 2) to highlight the impact of usability defects on the task outcome;

  • 3) to treat usability defect priority the same as that of other defects; and

  • 4) to quantitatively track usability defects over time.

Based on our analysis of open source usability defect reports, we integrated and revised some existing usability defect classification models [8], [9], [10] to better incorporate software engineering and usability engineering needs. We also obtained and evaluated feedback on our newly proposed open source usability defect classification model by asking software development practitioners and novice users to classify a sample of usability defects. From an analysis of these classifications, we identified several strengths and weaknesses in our approach. In this paper, we report on the design and empirical evaluation of our new OSS usability defect taxonomy. Key contributions of this work include:

  • a revised OSS usability defect taxonomy to classify usability defects in OSS environments that have limited usability engineering training and practices; and

  • an evaluation of the taxonomy by practitioners to understand its strengths and weaknesses.

The rest of this paper is organized as follows. In Section 2, we give an overview of usability defect classification schemes from the usability and software engineering disciplines. Section 3 follows with the rationale for revising existing usability defect classification schemes for open source software development. Section 4 explains the research process and methodology used to construct the taxonomy. In Section 5, we elaborate our new usability defect classification model. We present our approach to evaluating the model in Section 6, and the evaluation results in Sections 7 and 8. We outline threats to validity in Section 9 and discuss some important issues in Section 10. The paper concludes with a summary, implications, and future work in Section 11.

Research on usability defect classification has often been conducted in the field of human-computer interaction (HCI). Early efforts to classify usability defects were made by Nielsen [7], who refined the nine heuristics he had identified earlier using factor analysis of 249 usability problems to derive a set of heuristics with maximum explanatory power, resulting in a revised set of 10 heuristics. Since Nielsen’s heuristics offer only a high-level view of the difficulties users encounter with the user interface, there are several limitations in using them to classify usability problems, as reported in [8]: 1) insufficient distinguishability, 2) lack of mutual exclusiveness, 3) incompleteness, and 4) lack of specificity.

To overcome these limitations, Keenan et al. [8] developed the Usability Problem Taxonomy (UPT), which classifies usability defects into artefact and task components. The artefact component consists of the visualness, language, and manipulation categories, while the task component consists of the task mapping and task facilitation categories. Each category is composed of multi-level subcategories. For example, language has two levels of subcategories: the first level consists of naming/labelling and other wording, and at the second level, other wording is further categorized into feedback messages, error messages, other system messages, on-screen text, and user-requested information/results. The depth of classification along the components and categories results in one of three outcomes: full, partial, or null classification.
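To illustrate the hierarchy described above, here is a minimal sketch (not the authors’ artefact) of the UPT components as nested Python dictionaries; the category names follow the summary above, while the exact nesting is our own illustrative reading of Keenan et al. [8]:

    # Illustrative sketch of the UPT hierarchy; empty dicts mark leaf categories.
    UPT = {
        "artefact": {
            "visualness": {},
            "language": {
                "naming/labelling": {},
                "other wording": {
                    "feedback messages": {},
                    "error messages": {},
                    "other system messages": {},
                    "on-screen text": {},
                    "user-requested information/results": {},
                },
            },
            "manipulation": {},
        },
        "task": {
            "task mapping": {},
            "task facilitation": {},
        },
    }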

However, Keenan’s approach to classification relies on high-quality defect descriptions, which, as our earlier work demonstrates [9], are rarely present in open source usability defect reports. Our observation is that many open source usability defect reports lack contextual information, particularly about the user’s task. As a result, when using the UPT to classify usability defects, we had to make many assumptions and judgement calls about the tasks performed by the users that led to the problems. We believe the UPT is useful for usability evaluators assessing usability defects during a usability evaluation in the presence of users, but not for classifying defects by merely reviewing the usability defect description.

Andre et al. [10] expanded the UPT to include other usability engineering support methods and tools. By adapting and extending Norman’s [11] theory of action model, they developed the User Action Framework (UAF), which covers different phases of user interaction. For example, the high-level planning and translation phase contains the cognitive actions through which users understand their work goals, tasks and intentions, and how to carry them out with physical actions. The physical action phase is about executing tasks by manipulating user interface objects, while the assessment phase covers user feedback and the user’s ability to assess the effectiveness of the outcomes of physical actions. Even though the UAF was viewed as a reliable classification scheme that can handle dissimilar defect descriptions for the same underlying design flaw, determining in which phase of the interaction the problem occurred is a real challenge for novice evaluators.

Meanwhile, in the ISO/IEC 25010 standard, the quality of a software product can be measured using eight characteristics (further divided into sub-characteristics): functional suitability, performance efficiency, compatibility, usability, reliability, security, maintainability, and portability. In the context of product usability, ISO/IEC 25010 decomposes usability into appropriateness recognisability, learnability, operability, user error protection, user interface aesthetics, and accessibility. However, the results from [12] indicated that classifying defects using the main characteristics and sub-characteristics was not reliable because of the limited information present in the defect reports. With so little information, functionality and usability issues were difficult to distinguish.

Since 2000, however, many researchers have started to actively use software engineering classification models to classify usability defects. One of the most prominent approaches is the adoption of a cause-effect model. For example, Vilbergsdottir et al. [13] developed the Classification of Usability Problems (CUP) scheme, which consists of two-way feedback: Pre-CUP, which describes how usability defects are found, and Post-CUP, which describes how usability defects are fixed. In Pre-CUP, usability evaluators use nine attributes (defect identifier, frequency, trigger, context, description, defect removal activity, impact, failure qualifier and expected phase) to describe usability defects in detail. Once the usability defects have been fixed, developers record four attributes (actual phase, types of fault removed, cause and error prevention technique) in Post-CUP. Although Post-CUP is useful for defect triaging, in which similar issues can be mapped to specific fixes, we postulate that some of the attributes in Pre-CUP are not ones that novice OSS reporters can use to produce informative usability defect descriptions. For example, technical information about the defect removal activity, failure qualifier, expected phase, and frequency is difficult to obtain, especially for those who have limited technical usability knowledge.
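To make the two attribute sets concrete, the following is a minimal sketch of Pre-CUP and Post-CUP records as Python dataclasses; the field names follow the attribute lists above, while the types and free-text representation are our own illustrative assumptions:

    from dataclasses import dataclass

    @dataclass
    class PreCUP:
        """Recorded by the usability evaluator when a defect is found
        (field names follow the Pre-CUP attributes listed above)."""
        defect_identifier: str
        frequency: str
        trigger: str
        context: str
        description: str
        defect_removal_activity: str
        impact: str
        failure_qualifier: str
        expected_phase: str

    @dataclass
    class PostCUP:
        """Recorded by the developer once the defect has been fixed."""
        actual_phase: str
        types_of_fault_removed: str
        cause: str
        error_prevention_technique: str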

Khajouei et al. [14] argued that the lack of information on the effects of usability defects in the UAF can lead to long discussions to convince developers of the validity of usability defects. They augmented the UAF to include Nielsen’s severity classification and the potential impact of usability defects on the task outcome, in order to provide the information software developers need to understand and prioritize problems.

Although Geng et al. [15] agreed that CUP can capture important usability defect information and provide useful feedback, they noted that CUP could not be used to analyse the effects on users and task performance. Considering the importance of the cause-effect relationship, they customized Orthogonal Defect Classification (ODC) and the UPT, as shown in Fig. 1, to develop a cause-effect usability problem classification model consisting of three causal attributes (artefact, task, and trigger) and four effect attributes (learning, performing, perception, and severity). However, in the absence of formal usability evaluation in OSS projects, the trigger attribute suggested in the model cannot be sufficiently justified. Additionally, the use of predefined values for some of the attributes may introduce selection bias, and users are likely to select incorrect values.

Other usability problem classifications use a combination of models to support the practical use of classification in different software development contexts [16]. This model-based framework consists of three perspectives, each facilitated by a model: an artefact-users-tasks-organization-situation model for Context of Use, an abstraction hierarchy model for Design Knowledge, and a function-behaviour-structure model for Design Activities; a usability problem is analysed through the collective consideration of the three models. The Context of Use perspective is used to understand the cause of the problem, whether related to design factors (violated user interface design guidelines) or non-design factors (user preferences). If a usability problem is judged to stem from design factors, it should be further analysed from the Design Knowledge and Design Activities perspectives. Such a reference framework allows usability evaluators to develop a specific classification scheme for a given context. However, the poor involvement of usability evaluators in OSS projects makes it nearly impossible to adopt such a comprehensive framework. In fact, contributors who participate voluntarily in open source projects prefer to work on the main functionality of an application rather than focusing on user-centred design [17].

Several other related works address usability-related issues by focusing on GUI defects and functionality. Examples include the GUI fault model [18], which categorizes GUI defects into interface and interaction defects, and the work of Harkke et al. [19], which classifies usability defects into 11 categories: missing, misinterpreted, positive, inadequate, unexplored, misplaced, unnecessary, technical deficiencies, problematic change of work practice, preferenced, and misaligned.

From a software engineering perspective, cause-effect classification models provide a deeper understanding of a software problem. To the best of our knowledge, only one usability cause-effect classification currently exists. Geng’s classification [15], in our view, is not appropriate for classifying open source usability defects, which often contain limited information [20], [21], [22]. In particular, the trigger component among the causal attributes is of limited use: in the absence of formal usability evaluations in OSS development, it is impossible to identify the usability evaluation methods that trigger usability defects.

Even if formal usability evaluations were to be conducted, OSS projects would still lack an effective mechanism to conduct the evaluations, mainly for two reasons. First, many of the volunteers who contribute to OSS development are developers, who generally have limited knowledge and skills required for usability evaluation. Second, in order to formally conduct usability evaluations, extra commitment from contributors is necessary, but volunteers may not be able to spend the time on this.

Considering all of these limitations, we revised Geng’s classification [15] to better suit an OSS environment and adapted some elements of the ODC framework to address cause and effect attributes. In the following paragraphs we summarize the rationale for our revisions.

Defect category - In software development, quantitative measurements such as the amount of memory used, the time to load an application, or the response time are crucial and often get immediate attention from software developers, as opposed to subjective usability issues that cannot be scientifically quantified and measured. To address this issue, common open source defect repositories such as Bugzilla have implemented keyword functionality that captures usability heuristic terms, such as consistency, jargon, and feedback, so that the concept of the user interface and the underlying implementation can be described effectively. Each usability issue is tagged with the specific usability heuristic being violated. In this way, software developers with limited usability and interface design knowledge can learn about the heuristics and understand how the same types of defects could be resolved. However, the usability principles currently used by Bugzilla’s keyword functionality are too broad [23]. Some keywords are hard to distinguish and may lead to incorrect interpretation. Consider the following Bugzilla keywords and definitions:

Ux-affordance – controls should visually express how the user should interact with them.

Ux-discovery – users should be able to discover functionality and information by visually exploring the interface; they should not be forced to recall information from memory. (This is often the nemesis of ux-minimalism, since additional visible items diminish the relative visibility of other items being displayed.)

Based on these definitions, both keywords refer to the ability of users to recognize and understand possible actions based on visual cues in the user interface. The unclear separation between the keywords can lead to misclassification of defects, which will eventually affect the identification of root causes and of similar resolution strategies. In fact, the single-perspective classification used by Bugzilla is not well suited to classifying usability issues, which often comprise both graphical user interface and action issues. In this regard, a taxonomic classification such as the UPT is a recommended approach for classifying usability defects from the artefact and task aspects, respectively.

Effect – Previous studies have reported that usability defects are treated with a lower priority than functional defects [24]. In the existing ODC classification, severity is used to measure the degree of a defect’s impact. However, due to unclear usability category definitions, many usability defects end up with low severity ratings [24]. From our analysis of open source defect reports [9], we think that unclear or missing descriptions of the user difficulty caused by a usability defect are one of the reasons why software developers do not prioritize fixing many reported usability defects. The fact that only a small fraction of usability defect reports contain impact information reveals a lack of contextual information conveying to software developers the user difficulty and how it affects user emotion, from the perspective of usability engineering. However, relying only on textual descriptions to capture user difficulty could be a disadvantage, as users are likely to provide lengthy explanations that may be unhelpful to many software developers. One way to reduce this limitation is to create a set of predefined impact attributes so that the impact can be measured objectively. For example, we could use a rating scale to measure emotion, while task difficulty could be selected from a predefined set of attributes.
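As one hypothetical illustration of such predefined impact attributes (the attribute names and the 1-5 scale below are our own examples, not part of the OSUDC itself):

    from dataclasses import dataclass
    from enum import Enum

    class TaskDifficulty(Enum):
        # Hypothetical predefined values describing the difficulty the user faced.
        COMPLETED_WITHOUT_DIFFICULTY = "task completed without difficulty"
        COMPLETED_WITH_WORKAROUND = "task completed only with a workaround"
        ABANDONED = "task could not be completed"

    @dataclass
    class UsabilityImpact:
        difficulty: TaskDifficulty
        emotion_rating: int  # e.g. 1 (not frustrated at all) to 5 (extremely frustrated)

        def __post_init__(self):
            if not 1 <= self.emotion_rating <= 5:
                raise ValueError("emotion_rating must be on the 1-5 scale")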

Causal – Since no formal usability evaluation is usually conducted in OSS projects, usability problem triggers cannot be identified. In OSS projects, usability defects are most often reported through online user feedback facilities and the results of developer black-box testing. Considering this limitation, instead of looking at trigger attributes, we study the failure qualifier of the problem. This information could help software developers understand why a user considers the problem to be a valid usability defect.

Section snippets

Research process and methodology

The OSUDC taxonomy was created following a three-phase process and influenced by the design science methodology [31]: 1) problem identification, 2) artefact design, and 3) validation.

In the problem identification phase, we reviewed several usability defect classification models in the literature, in particular UPT [8], ODC [25], GUI fault model [18] and usability-ODC framework [15]. While we wanted our usability defect classification to be in line with software engineering principles, we also

Evaluator demographic information

A total of 41 evaluators from 26 to 55 years of age participated in the evaluation of the OSUDC taxonomy. As shown in Table 5, most of the evaluators were computing students and academic researchers, accounting for 48.8% and 29.3%, respectively. Almost 80% of the evaluators had received training or certification related to usability evaluation/HCI/UX. However, as indicated in Table 6, the majority of evaluators had limited familiarity with handling usability defects.

Fleiss’ kappa analysis

To measure the reliability of
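For reference, the standard definition of Fleiss’ kappa (a textbook formula, not a result of this study): for N classified defects, n evaluators and k categories, with n_ij the number of evaluators who assigned defect i to category j,

    \kappa = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e}, \qquad
    \bar{P} = \frac{1}{N n (n - 1)} \sum_{i=1}^{N} \Big( \sum_{j=1}^{k} n_{ij}^{2} - n \Big), \qquad
    \bar{P}_e = \sum_{j=1}^{k} p_j^{2}, \quad p_j = \frac{1}{N n} \sum_{i=1}^{N} n_{ij}.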

Experiences and feedback

This section presents and discusses the results from the post-evaluation questionnaire filled out by the evaluators at the end of our survey. The post-evaluation questionnaire had one closed question and one open-ended question. The closed question was measured on a 5-point Likert scale using satisfaction-based statements, as follows:

  • Easy to learn – the degree to which an evaluator is satisfied that the OSUDC is easy to learn with no training or demonstration

  • Easy to use – the degree to

Discussion

For the purpose of practical usability defect reporting in conjunction with our proposed OSUDC, we recommend four characteristics for capturing usability defects:

  • 1) State the type of usability problem encountered.

  • 2) Justify the impact of the usability defect on the user and the task, possibly by relating it to human emotion and software quality attributes. Human emotion could perhaps be captured on a rating scale so that it can be objectively quantified.

  • 3) State how the problem was identified.

  • 4) Use predefined attributes

Summary

This study presented the OSUDC taxonomy for classifying and analysing usability defects. Given the absence of formal usability evaluation in most OSS projects and the limited information available in most usability defect descriptions, we revised existing defect classification schemes to accommodate these limitations. We integrated Geng’s classification model and the ODC framework to reflect the important elements of classifying usability defects from the perspective of usability and software

Declaration of Competing Interest

None.

Acknowledgements

Support for the first author from the Fundamental Research Grant Scheme (FRGS) under Contracts FRGS/1/2018/ICT 01/UITM/02/1, Universiti Teknologi MARA (UiTM), ARC Discovery Projects scheme project DP140102185 and Laureate Fellowship FL190100035, and from the Deakin Software and Technology Innovation Lab and Data61 for all authors, is gratefully acknowledged.

References (33)

  • S.L. Keenan et al.

The Usability Problem Taxonomy: A Framework for Classification and Analysis

    Empir. Softw. Eng.

    (1999)
  • N.S.M. Yusop et al.

    Analysis of the Textual Content of Mined Open Source Usability Defect Reports

  • D.A. Norman

    Cognitive engineering

    User centered Syst. Des.

    (1986)
  • A. Vetro et al.

Using the ISO/IEC 9126 product quality model to classify defects: a controlled experiment

  • S.G. Vilbergsdóttir et al.

Classification of Usability Problems (CUP) Scheme: Augmentation and Exploitation

  • R. Geng et al.

    In-process Usability Problem Classification, Analysis and Improvement
