1 Introduction

Heuristic evaluations are used both within the health care domain [1,2,3] and in broader design research contexts [4,5,6] to perform “usability inspections” of technologies and systems under development. A heuristic “refers to a global usability issue which must be evaluated or taken into account when designing” a program or application [7]. As the computing world moves to a wider diversity of hardware that includes mobile devices and applications designed for these devices, mobile heuristic evaluation checklists have been developed to address this diversification [7,8,9]. A natural progression of mobile computing is the development of mobile clinical decision support applications [10, 11]. Project design teams should engage clinical end users early in the design process to ensure mobile clinical decision support apps appropriately fit the workflow of clinicians [12,13,14]. However, a major challenge of this approach is the limited availability of clinicians and other health care workers for design engagement due to professional time constraints [15] and the time-intensive process required to perform traditional heuristic evaluations using lengthy evaluation checklists. Further, we are unaware of heuristic evaluation checklists specific to mobile clinical decision support apps. Therefore, there is a need for a short form heuristic evaluation checklist that facilitates engagement of clinical end users early in the process to design mobile clinical decision support applications. Thus, the aims of this study are twofold: (1) Describe the process and results of a heuristic evaluation of a mobile clinical decision support application to support diagnosis of urinary tract infection in long-term care settings using a long form heuristic evaluation checklist, and (2) Develop a short form heuristic evaluation checklist for early stage development of future mobile clinical decision support applications.

1.1 Urinary Tract Infection (UTI)

There are more than 14,000 nursing homes in the US caring for over 1 million residents. Nursing homes are a unique healthcare environment characterized by a complex and debilitated population, a residential healthcare setting, and staffing and workflow patterns reflecting a focus on nursing care [16, 17]. In particular, providers, nurse practitioners, and physician assistants are not physically present in nursing homes much of the time, resulting in asynchronous communication and an enhanced responsibility for nurses to detect changes in condition and to identify acute diseases such as infections [18]. Urinary tract infections (UTI) are the most common infection diagnosed in nursing homes each year [19]. There is no gold standard test for UTI, as the findings on urine testing for UTI can be identical to those for asymptomatic colonization of the urinary tract – a benign condition which does not require treatment [20, 21]. Thus, to avoid unnecessary and even dangerous overuse of antibiotics, accurate assessment of symptoms relies on a well-trained nursing workforce. Unfortunately, nursing homes are plagued by high rates of nursing turnover and a staffing model in which the use of highly trained registered nurses is minimized. Mobile clinical decision support, therefore, holds great promise in the nursing home environment to address the challenges of nursing home staffing, nurse training as it relates to the identification of UTI, and the differentiation of UTI from bladder colonization [22]. However, any app that is to be utilized regularly and successfully in the nursing home environment must take into account technology limitations in nursing homes, and must be evaluated from the perspective of the end users in order to fit within their workflows.

1.2 UTIDecide

UTIDecide is a mobile clinical decision support tool built as a native mobile application for iOS and Android platforms. The current version of the app is designed for use in nursing homes by nurses and has been tested with nurses in that setting. UTIDecide incorporates an established and validated algorithm for the evaluation and management of UTI in a series of guided application screens [23]. The user is taken through the algorithm step-by-step and app output is tailored to the inputs provided. Features include a password-secured login, user-solicited data capture about usability, and analytics related to user interactions. The application provides evidence-based conclusions about the likelihood of UTI and recommendations for further assessment and treatment. The nurse user is given the option to view a customized SBAR (situation, background, assessment, recommendation) communication tool, which is intended to standardize and facilitate communication of clinical findings to physicians, nurse practitioners and physician assistants [24], who are often geographically distant from the nursing home site. The text from this screen can be printed, sent via e-mail, or potentially shared via secure transmission to clinical information systems. At any point in the process, the user may return to the previous step or start over from the top of the algorithm. Additional features of the app include generating a list of other conditions the user should assess, based on entered clinical signs and symptoms, as well as educational features, including pop-ups describing common points of confusion in the UTI diagnostic and management workflow. Contextual help is provided on each screen, including answers to common questions and icons that provide additional information.

2 Methods

Our evaluation approach follows the recommendations of Billi et al. [8] for mobile application evaluations such that each evaluator has: (a) good knowledge of the proposed set of mobile heuristics and standard heuristic evaluation methodology; and (b) experience in using mobile applications and/or with mobile computing [8].

2.1 Evaluation Team and Setting

Evaluation team members were selected for background, training, and expertise to provide broad coverage in clinical and technical areas. Heuristic evaluation took place at Anschutz Medical Campus in Aurora, Colorado. Seven evaluators used iPhones and two used Android devices. These were: iPhone 6 (n = 4), iPhone SE (n = 2), iPhone 5 (n = 1). All iPhone models were running iOS 10.2.1. Of the two evaluators using Android phones, one used a Samsung S5 and the other used an HTC6525LVW. Both Android phones were running the Android 6.0.1 operating system. All evaluators are co-authors of this manuscript and not considered test subjects; thus, institutional review board approval was neither required nor sought.

Each evaluator received information about the nature of heuristic evaluations and the long form heuristic evaluation checklist prior to evaluation of the mobile app. Evaluators were divided into two groups: the familiar group (n = 5) and the unfamiliar group (n = 4). Evaluator group sample sizes were informed by research guidelines that recommend 3 to 5 evaluators for heuristic evaluations [25]. The familiar group was composed of current members of the project design team. These included two PhD-trained informatics researchers, two physician researchers, and a health services research PhD student. The unfamiliar group was composed of evaluators with no prior exposure to the UTIDecide mobile clinical decision support app (n = 2) and former project design team members who had been separated from the project for over a year and had no exposure to the version being evaluated (n = 2). Unfamiliar group members were two 4th year medical students, one of whom holds a master’s degree in computer science, a master’s-prepared registered nurse (RN), and an undergraduate honors nursing student with a BS degree in biology. The two medical students were part of the project design team for a previous web-based version of the app [10] prior to development of the native mobile app under evaluation.

2.2 Usability Inspection and Long Form Heuristic Evaluation Checklist Items

Long form heuristic evaluation checklist items were drawn from the published heuristic evaluation literature, and then synthesized and adapted for this project in similar fashion to that used by Alexander et al. [1]. Selected sources were “traditional” heuristic evaluation checklists [4, 5] and newer checklists for evaluating mobile applications [7,8,9]. Table 1 shows a ranked list of categories for the UTIDecide heuristic evaluation synthesized from selected sources. Individual evaluation items under each category were compiled from the selected sources and harmonized under the categories in Table 1 by the first author. This long form checklist was then circulated to members of the project design team for review and revised based on their feedback. The final checklist totaled 11 categories and 200 heuristics.

Table 1. Rank ordered categories in the long form heuristic evaluation checklist*

Aspects of the UTIDecide app were evaluated with the long form heuristic checklist using Nielsen’s Severity Rating Scale (SRS) [26]. SRS ratings are shown below:

  • 0 = I don’t agree that this is a usability problem at all

  • 1 = Cosmetic problem only: need not be fixed unless extra time is available on project

  • 2 = Minor usability problem: fixing this should be given low priority

  • 3 = Major usability problem: important to fix, so should be given high priority

  • 4 = Usability catastrophe: imperative to fix this before product can be released

SRS ratings were applied to questions under each evaluation category (see Table 1). The UTIDecide app was evaluated for two key path scenarios described in our previous development work [10]: (1) a patient with a catheter and concordant UTI symptoms, and (2) a patient with a catheter and discordant UTI symptoms.

Each evaluator installed the UTIDecide app on his or her smart phone, recording the phone model and OS. During the evaluation, each evaluator rated the UTIDecide app for each heuristic item in a spreadsheet, making comments where appropriate. Each evaluator also recorded the start and stop time of his or her evaluation session. If a heuristic did not apply to the current version of the app (e.g. a missing feature), evaluators were still asked to provide a rating, especially if the missing feature affected perceived use of the app. Ratings of “0” were given if the missing feature did not affect usability. Comments about missing features were solicited to generate features for future versions of the app. After all usability inspection tests were completed, the first and second author tabulated usability scores in a spreadsheet and median usability scores for familiar and unfamiliar evaluator groups were calculated.
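The per-group tabulation of median usability scores can be sketched as follows. The heuristic item text and the scores shown are hypothetical placeholders, not the study's actual data; only the group structure (five familiar, four unfamiliar evaluators) follows the study design.

```python
from statistics import median

# Hypothetical ratings: heuristic item -> evaluator group -> list of
# SRS scores (0-4), one score per evaluator in that group.
ratings = {
    "Is navigation consistent across screens?": {
        "familiar": [2, 1, 1, 0, 2],
        "unfamiliar": [0, 1, 0, 0],
    },
}

def group_medians(ratings):
    """Median SRS score for each heuristic item, per evaluator group."""
    return {
        item: {group: median(scores) for group, scores in groups.items()}
        for item, groups in ratings.items()
    }
```

In a real analysis the `ratings` dictionary would be loaded from the evaluators' spreadsheets before aggregation.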

2.3 Analysis and Summary of Evaluation Comments for UTIDecide Design Changes

The last author analyzed and summarized all evaluator comments to inform design changes to the UTIDecide app. These design changes are described in Sect. 3.2 and were implemented by the development team.

2.4 Analysis of Evaluation Results to Establish the Short Form Heuristic Checklist

Short form heuristic evaluation item candidates were selected from long form usability inspection results that had both: (a) a median evaluation score of 1.00 or higher from any evaluator group, indicating a usability problem of cosmetic severity or higher, and (b) a severity rating of 3 or higher from any individual evaluator, indicating a perceived major usability issue by at least one evaluator. This process yielded 27 unique results from the familiar evaluation group, and an additional 5 unique results from the unfamiliar group, for a total of 32 unique items. The first and second authors reviewed these 32 items, removing two items that were not relevant to early stage app development and were unintentionally included as an oversight during synthesis of the usability inspection literature. These unintentional items related to supporting a range of user expertise (e.g. novice to expert users) and foolproof synchronization with other information systems, both of which are features to be implemented in later stage development.
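The two-part selection rule described above can be expressed as a simple filter. The item names and score lists below are invented for illustration; only the criteria (any group median ≥ 1 and any individual rating ≥ 3) come from the study.

```python
from statistics import median

# Hypothetical per-item scores keyed by evaluator group; each value is a
# list of individual evaluator SRS ratings (0-4).
scores = {
    "Navigation: can the user return to the top level?": {
        "familiar": [3, 1, 1, 0, 1],
        "unfamiliar": [0, 0, 1, 0],
    },
    "Aesthetics: is white space used consistently?": {
        "familiar": [0, 0, 1, 0, 0],
        "unfamiliar": [0, 0, 0, 0],
    },
}

def short_form_candidates(scores):
    """Items meeting both criteria: (a) a group median of 1 or higher in
    any evaluator group, and (b) an individual rating of 3 or higher
    from any evaluator."""
    selected = []
    for item, groups in scores.items():
        has_median = any(median(v) >= 1 for v in groups.values())
        has_major = any(s >= 3 for v in groups.values() for s in v)
        if has_median and has_major:
            selected.append(item)
    return selected
```

With these made-up scores, only the navigation item satisfies both criteria (familiar-group median of 1, with one rating of 3).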

3 Results

3.1 Long Form Heuristic Checklist Yielded Redundant Results and Evaluator Fatigue

Both the familiar group and unfamiliar group conducted a usability inspection of the UTIDecide mobile clinical decision support app using the long form heuristic evaluation checklist. Evaluators were sent the same materials simultaneously and performed the usability inspection over a three-week period. Each evaluator scored each item on a scale of 0–4 using Nielsen’s Severity Rating Scale (SRS) [26], as described above. Time for completion ranged from 1 h to 2 h and 1 min. Among familiar evaluators, the long form heuristic evaluation took on average 1 h 45 min to complete. For unfamiliar evaluators, it took 1 h 34 min on average.

Median scores across all evaluators were calculated by the second author. No item had a median response of 2 or higher, though individual evaluators frequently rated UTIDecide with scores above a 2.

Two things became apparent after analysis of the long form heuristic evaluation checklist results:

  1. Degree of familiarity increased the severity of evaluation scores: evaluators in the familiar group rated individual heuristics with a higher degree of severity than did those in the unfamiliar group. In a two-sample z-test for comparing two proportions, familiar evaluators gave significantly more severity scores of 2 and above.

  2. Survey fatigue with the long form evaluation checklist was evident for both the familiar and unfamiliar evaluators (e.g. evaluators repeated comments using copy/paste).
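The two-sample z-test for proportions mentioned above can be sketched with the standard pooled-proportion formula. The counts in the usage line are hypothetical, not the study's data:

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 == p2, where x1 of n1 ratings in group 1
    and x2 of n2 ratings in group 2 are severity 2 or above."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                    # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))   # pooled standard error
    return (p1 - p2) / se

# Made-up counts: 120 of 1000 familiar-group ratings vs. 60 of 800
# unfamiliar-group ratings were severity 2+.
z = two_proportion_z(120, 1000, 60, 800)  # z is approximately 3.16 here
```

A |z| above 1.96 corresponds to significance at the conventional 0.05 level for a two-tailed test.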

Scores revealed that evaluators more closely involved in current app development efforts delivered higher severity scores in assessing the UTIDecide app. Thus, degree of familiarity was associated with differences in the severity of evaluation scores. Members of the familiar evaluator group rated individual heuristic items with a higher degree of severity (> 1 rating difference in median score from unfamiliar group) in the following categories: navigation, clarity of aesthetic design, and usability in workflow. Table 2 shows differences in median scores between evaluator groups for specific questions in affected categories. Navigation concerns included a user’s ability to know where s/he is in the app, where to exit, and how to return to a top level at any stage of use. Aesthetic design concerns were that white space and other section markers were used inconsistently or could be improved. Questions regarding usability and ease of use in user workflow prompted evaluators to speculate about how the app may be used in a real world setting, and again, familiar evaluators rated UTIDecide with greater severity.

Table 2. Differences in median scores for familiar and unfamiliar evaluator groups

Evaluators made comments less frequently on later questions in the heuristic evaluation, demonstrating evidence of review fatigue with the long form. We note that one evaluator in the familiar group reported application of the heuristic evaluation items in a non-linear fashion, thus review fatigue may not have manifested in the same way for this evaluator. On average, there were no more than 2 meaningful comments per heuristic after the third section, with later sections having as few as 0.7 average responses per heuristic. Comparatively, the first section had nearly 3 meaningful comments per heuristic. Comments were excluded from these counts if they consisted only of “yes,” “no,” or “not applicable.”

Further, “as above” comments increased in frequency as the evaluators worked through the long form heuristic evaluation, suggesting a degree of repetition. In group meetings, evaluators reported their review fatigue and reported that they found some heuristics repetitive or so similar as to yield identical results (e.g. “For question and answer interfaces, are visual cues and white space used to distinguish questions, prompts, instructions, and user input?” and “Have prompts been formatted using white space, justification, and visual cues for easy scanning?”).
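The comment-frequency analysis above can be sketched as follows. The section names, heuristic identifiers, and comment texts are invented for illustration; only the exclusion rule (counting out bare "yes," "no," and "not applicable" responses) comes from the study.

```python
# Hypothetical comment log: section -> list of (heuristic_id, comment).
comments = {
    "Section 1": [("1.1", "Back button hard to find"), ("1.1", "yes"),
                  ("1.2", "Labels unclear")],
    "Section 4": [("4.1", "not applicable"), ("4.2", "as above")],
}

TRIVIAL = {"yes", "no", "not applicable"}

def meaningful_per_heuristic(comments):
    """Average count of substantive comments per heuristic, by section."""
    out = {}
    for section, entries in comments.items():
        heuristics = {hid for hid, _ in entries}
        n_meaningful = sum(
            1 for _, text in entries if text.strip().lower() not in TRIVIAL
        )
        out[section] = n_meaningful / len(heuristics)
    return out
```

Note that this simple filter still counts "as above" repeats as meaningful; detecting those would require comparing comment text across items, as was done qualitatively in this study.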

3.2 Summary of Long Form Heuristic Results for UTIDecide Prototype Changes

Usability inspection findings resulted in design revisions to the UTIDecide app interface and features. A summary of implemented design changes is listed below.

  • Interface redesigns throughout included:

    • Improved layout so control elements are visible without scrolling

    • Removal of excess white space

    • Removal of unnecessary bullet points to improve text display

  • Addition of informational pop-ups with definitions for clinical terms

  • Addition of missing “dismiss” buttons for informational pop-ups

  • Spelling and grammatical corrections throughout the application

  • Home screen redesign to separate it into multiple screens

  • “Don’t show again” disclaimer on home screen

  • Key path scenario for scenario 1 (patient with a catheter in place and UTI-concordant symptoms [10])

    • Added screen for diagnoses to consider

    • Added functionality for input of additional discordant symptoms

  • Key path scenario for scenario 2 (patient with a catheter in place and UTI-discordant symptoms [10])

    • Improvements in language used in clinical term definitions

  • Recommendations screen

    • Added summarized view of symptoms selected by user

    • Added links to evidence-based guidelines

    • Standardized font size with rest of the app

    • Improvement in language for recommendations

    • Changed improper placement of yes/no controls that appeared before question about discordant symptoms

  • Communication script screen

    • Corrected wrong script pop up for discordant symptoms key path scenario

    • Added informational pop-up with definition of script

  • Survey screen

    • Improved layout to show that scrolling is required to view questions off screen

    • Added descriptive text to explain survey

    • Added numbering for questions

    • Added opt out capability for survey

    • Added “clear survey” capability

    • Revised “suggestions for improvement” question from an erroneous yes/no format to allow free-text comments

    • Added “return to home screen” capability after survey completion

3.3 Short Form Heuristic Evaluation Checklist

Substantial time spent reviewing, evidence of evaluator fatigue, and evaluators’ reports of the repetitive nature of the items on the long form heuristic evaluation checklist indicated the need for a short form heuristic evaluation checklist. Below we describe the short form heuristic evaluation checklist. Table 3 shows the short form heuristic evaluation checklist items.

Table 3. Short form heuristic evaluation checklist for mobile clinical decision support

Below are the 30 heuristics that comprise our proposed short form heuristic evaluation checklist for mobile clinical decision support apps. These heuristics fall into 8 of the 11 original categories from the long form heuristic evaluation checklist. Categories not represented from the long form checklist are: 5. Error management, 9. Help and documentation, and 11. Privacy. These categories were cut because they were deemed less appropriate for early stage development. Evaluator responses within these sections were purely speculative at this stage of app development, and the heuristic items are highly dependent on the use of the specific app and knowledge of future use, rather than the app itself. For example, among the unfamiliar group of evaluators, no heuristic item in the 5. Error management category yielded a severity rating other than 0. The most severely rated heuristic by familiar evaluators was “Does the system provide foolproof synchronization with a PC and/or secure wireless data transfer to other information systems?” and responses were based solely on insider knowledge of plans to integrate UTIDecide into an electronic health record, not on the capabilities of the prototype app at the time of evaluation. For early stage development, use of these categories risks app evaluation based on discussions and plans for dissemination rather than implemented features and form factor of the app itself. Further, privacy questions are not applicable to early stage mobile clinical decision support apps, especially given mandatory HIPAA compliance for live deployments of mobile clinical decision support apps in a clinical setting.

4 Discussion

The aim of the broader study was to develop an improved UTIDecide application. Our heuristic evaluation revealed several limitations of the long-form checklist method and the need for a short form heuristic evaluation checklist. However, despite redundancies of similar long form heuristic evaluation items and fatigue resulting from repetitive application of these items, our heuristic evaluation identified important design flaws that were corrected in the UTIDecide mobile clinical decision support app prototype. Thus, heuristic evaluation efforts successfully improved the app as intended. Beyond this aim, our results revealed that a short form heuristic evaluation, or a “clinical heuristic evaluation,” could potentially improve efficiency in gaining input from clinical experts and other potential end users with limited time and familiarity with user-centered design processes. The potential utility of the clinical heuristic evaluation is improved ease of use, reduction in evaluation time, standardization of usability inspections for mobile clinical decision support apps, and simplification of evaluation results synthesis.

We found no significant differences between clinical and non-clinical evaluators in our interdisciplinary project design team. Differences in results were instead dependent on familiarity of the evaluator with the application. This suggests that project managers should carefully consider the make-up of usability inspection teams to anticipate the effect of groups of familiar evaluators and unfamiliar evaluators. In addition, evaluations by a mixed group of familiar and unfamiliar evaluators may require a breakdown of results based on their backgrounds. Beyond the development of a mobile clinical decision support application such as the UTIDecide app, a short form heuristic evaluation checklist may have utility in the creation of evidence-based practice guidance for practitioners in other fields, such as public health practice and disease surveillance.

4.1 Limitations

This study has several limitations. First, the short form heuristic evaluation checklist was developed based on a single mobile clinical decision support app project, which may limit generalizability. Second, the short form heuristic evaluation checklist may not transfer to mobile CDS apps used in different care contexts due to patient needs, practice norms, and/or technology familiarity. Further testing of the short form heuristic evaluation checklist as applied to other mobile clinical decision support apps implemented in different care contexts is needed to assess and validate its generalizability. Despite these limitations, we believe that the proposed short form heuristic evaluation checklist can be valuable to researchers who do not have time to synthesize and adapt the usability inspection literature yet want to quickly engage clinical end users during early stage design efforts to develop mobile clinical decision support apps.

5 Conclusion

In order to successfully design mobile clinical decision support apps that can augment cognition in clinical workflows, project design teams must engage clinical end users as part of the early design process. Evaluators using long form heuristic evaluations are certainly able to rate the visibility/aesthetic, controls and consistency of a mobile app; app memorability and learnability; and app fit within workflows in real-world settings. However, post-evaluation analysis and evaluator reports of the effect of the long form heuristic evaluation checklist used in this project revealed evidence of evaluator fatigue that could potentially decrease the likelihood of future evaluator involvement. For early stage development projects, project design teams need evaluation tools with better ease of use and lower time commitment to engage time-limited clinicians and solicit feedback. A short form heuristic evaluation checklist for mobile clinical decision support apps shows promise as a tool to more quickly and efficiently conduct usability inspections with interdisciplinary groups of evaluators who have differential experience with the app under development. Future work will include applying the short form heuristic evaluation tool to new versions of the UTIDecide app, as well as other mobile clinical decision support apps and health-related apps. These efforts should include formal assessment of reviewer fatigue and perceived ease of use and usefulness of the evaluation instrument. In the long term, future versions of UTIDecide and other mobile clinical decision support apps may use onboard device sensors to monitor user state and practice patterns to deliver tailored decision support guidance. Likewise, heuristic evaluation efforts may be enhanced by using onboard sensors during the app design process to capture evaluator state and identify potential improvements that result in more efficient heuristic evaluation checklists.

5.1 Author Contributions

The first author conceptualized the study and participated in all phases of the project. The first and second authors developed the short form heuristic evaluation checklist based on quantitative and qualitative analysis of evaluation results. All authors contributed to usability inspection of the mobile clinical decision support app and writing of the manuscript.