We performed a comparative analysis with two other programming interfaces that support the creation of multiverse analyses. We implemented the same multiverse analysis of the hurricane dataset in each tool to highlight their different design choices. We provide the specifications in our supplement, and below we describe relevant differences in the syntax and semantics of each package. Specifying the same multiverse analysis in each tool helps to ground our evaluation of all three packages, including multiverse.
4.2 Summary of Evaluation Using Cognitive Dimensions of Notation
We draw on existing evaluative frameworks in HCI to compare the design of the multiverse, Boba, and mverse programming interfaces. The cognitive dimensions of notations [6, 24, 32] break down representational systems (e.g., programming libraries) into orthogonal considerations about how well they support user reasoning; the gulfs of execution and evaluation [28, 44] address mismatches between a user's mental model and the system's affordances for expressing a user's intended analysis. Since our focus is on designing notations to support reasoning about and constructing multiverses, we concentrate our analysis on the API and environments for these tools. Three of the authors each independently evaluated the usability of the tools they were most familiar with. This qualitative coding involved systematically going through the cognitive dimensions of notations and noting anything relevant about a particular tool. The authors then met to discuss, compare, and synthesise notes. We report on our main findings below.
Progressive evaluation (how the user checks work in progress) and Provisionality (level of premature commitment to actions):
Checking work in progress in a multiverse analysis involves executing analysis code, and the way that users do this is constrained by their computing environment. In multiverse and mverse, users can build up an analysis by adding decision points in a computational notebook and execute provisional parts of a data analysis, which updates variables and data structures in the RStudio session. This supports iterative workflows [31, 32] (D1). In contrast, Boba requires users to author their analysis in a template file, then compile and execute it from the command line. Evaluating work in progress requires the user to open a new session and run scripts representing individual universes, making it cumbersome to check provisional parts of code during the authoring process.
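For instance, with multiverse a user can declare a decision point in a notebook cell and immediately run the default analysis path in the current session before committing to further decisions. The following is a minimal sketch using the package's inside(), branch(), and execute_multiverse() functions; the file name, column names, and option labels are illustrative:

```r
library(multiverse)
library(dplyr)

M <- multiverse()

# Add a provisional decision point; in an interactive session this also
# executes the default universe, updating df in the current session.
inside(M, {
  df <- read.csv("hurricane.csv")  # illustrative file name
  df <- filter(df, branch(outlier_exclusion,
    "none"         ~ TRUE,
    "drop_extreme" ~ death < quantile(death, 0.95)
  ))
})

# The user can inspect df, iterate on the specification, and only later
# execute every universe in the multiverse:
execute_multiverse(M)
```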
Consistency (similarity of syntactic representations for semantically similar operations) and Closeness of mapping (how well notations represent the application domain):
Both multiverse and Boba conceptualise multiverse construction as consisting of two steps: a multiplexing step to declare alternatives, and a pruning step to declare incompatible combinations of analyses. multiverse provides a single core operator for each (branch and %when%), and its syntax stays close to existing syntax and conventions in base R. In Boba, the multiplexing step can be represented using two syntactic forms, JSON and code blocks (see §4.1.2); terminating a code block requires the user to declare an additional code block; and there is no support for reusing parameters across conceptually similar decisions. These raise potential issues of syntax consistency. Boba's syntax aims to represent decision spaces more broadly, regardless of programming language or execution environment, and thus preserves closeness of mapping to the decision tree itself, which is expressed using a JSON structure. mverse conceptualises multiverse construction through analogs of familiar tidyverse and R functions, thereby preserving consistency with those existing functions.
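To ground the claim about Boba's two syntactic forms, the sketch below declares one decision as a JSON placeholder variable and another as code blocks. This is our approximation of the template notation described in §4.1.2; the block markers, JSON fields, and variable names follow our reading of Boba's documentation and may differ in detail:

```
# --- (BOBA_CONFIG)
{
  "decisions": [
    {"var": "cutoff", "options": [2, 2.5, 3]}
  ]
}
# --- (END)

# Placeholder form: {{cutoff}} is textually substituted into the code.
df <- df[df$death < {{cutoff}} * sd(df$death), ]

# Code-block form: one block per option of the decision `model`...
# --- (model) linear
fit <- lm(death ~ masfem, data = df)
# --- (model) poisson
fit <- glm(death ~ masfem, family = poisson, data = df)
# ...which is terminated only by declaring a further block.
# --- report
summary(fit)
```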
Error proneness (invitations for users to make mistakes or lack of protection against mistakes) and Hard mental operations (level of cognitive load):
Error proneness in each tool reflects unintended consequences of design choices that are otherwise well-motivated. The %when% syntax in multiverse affects every instance of a given analysis option in a multiverse specification, not just code in the apparent scope of the condition; this may not be intuitive to all users. mverse users interact with wrapper functions (§4.1.1); while this makes the syntax less flexible and expressive, it should reduce error proneness because the operation of each function is specialised. Boba requires the user to write their template code in one editor but evaluates universe scripts in separate R environments, which requires the user to switch between different editing and execution environments: a text editor, the command line, and the R or RStudio IDE. This can increase cognitive load and create opportunities for errors.
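To make the %when% behaviour concrete, consider a condition attached to a single option. As we understand the semantics, the condition prunes that option everywhere it occurs, not only at the site where it is written. Continuing the earlier sketch (decision and variable names remain illustrative):

```r
inside(M, {
  fit <- branch(model,
    "linear"  ~ lm(death ~ masfem, data = df),
    # The condition is attached to the *option* "poisson"...
    "poisson" %when% (outlier_exclusion == "none") ~
      glm(death ~ masfem, family = poisson, data = df)
  )
})

inside(M, {
  # ...so universes pairing model == "poisson" with any other exclusion
  # strategy are pruned here too, although no condition is visible.
  pred <- branch(model,
    "linear"  ~ predict(fit),
    "poisson" ~ predict(fit, type = "response")
  )
})
```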
Gulf of execution (how difficult it is to express intended operations with a tool):
In multiverse, the branch operator allows users to replace any sub-expression in R to declare alternative analyses, but users have to determine how to multiplex over every type of operation they wish to employ. Similar challenges may be encountered in Boba, which uses text substitution. mverse is limited to tasks for which an analogous wrapper function exists.
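For example, because branch() substitutes sub-expressions, varying a data transformation and varying a model family require the user to work out two different placements of the operator. A separate sketch in the same style (femininity, family_choice, and the column names are illustrative):

```r
inside(M, {
  # Branching over a column transformation: branch() replaces an expression.
  df$fem <- branch(femininity,
    "binary"     ~ df$female,
    "continuous" ~ df$masfem
  )

  # Branching over a model family: branch() must instead replace a
  # function argument.
  fit <- glm(death ~ fem, data = df,
             family = branch(family_choice,
                             "poisson"  ~ poisson(),
                             "gaussian" ~ gaussian()))
})
```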
Gulf of evaluation (difficulty interpreting whether a tool is behaving as a user intends):
Gulfs of evaluation arise when debugging or validating that a multiverse worked as intended. We expect a larger gulf of evaluation when this process differs significantly from the standard workflow of debugging individual analysis paths. multiverse generates a tree structure of nested R environments which share their scope insofar as different universes share analysis code, a design choice meant to reduce runtime by eliminating redundant computations. This differs from usual program execution in R, and thus from the mental model of running code that an R user might have; because of this execution process, debugging can be difficult. In contrast, Boba creates a separate execution environment for each individual universe script, executes the scripts, and collates the console logs and output data from each universe. This involves processes that are likely more familiar to a typical R user. As a consequence, errors are easier to reproduce by running universe scripts in an R session, making it easier to assess whether the implementation matches one's intention.
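For example, after a Boba multiverse is compiled and executed from the command line, each universe exists as an ordinary standalone script, so a failing universe can be re-run directly in a plain R session. The generated path below is illustrative; it depends on Boba's output directory layout:

```r
# Reproduce the error reported for universe 12 in an ordinary R session.
source("multiverse/code/universe_12.R", echo = TRUE)
```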