1 Introduction

In the recent decade, intuitive interaction has been developing from a buzzword used by designers and marketers when talking about interactive technology to a specific field of HCI [1,2,3]. Due to the increasing digitalization of life and work, people with a wide variety of backgrounds and prior knowledge of technology need to interact with a wide range of electronic products on a daily basis. New electronic products sport advanced technology, and many inbuilt functions and services [4]. Designing for intuitive interaction of products, services and systems is one way of dealing with an aging society and could also be a step towards a more inclusive society. Intuitive interaction occurs when technology is familiar [5], when users do not have to think about their interactions, when they can use a system based on their gut feelings and when they do not have to explicitly learn how to use the system [6]. Formal definitions state that intuitive interaction allows users to effectively use products, services or systems based on the effortless and subconscious use of prior knowledge [7,8,9,10]. As intuitive use covers effectiveness, mental efficiency and satisfaction, it clearly describes pragmatic aspects of user experience, but it also includes hedonic aspects, for example, when the interaction is described as “magical” by the users (cf. [11]).

Even though intuitive use itself is difficult to measure because of its subconscious nature, its preconditions and consequences can be measured. Several methods for the evaluation of intuitive interaction have been proposed that examine in what way the interaction with a system is based on the subconscious application of prior knowledge. These methods often involve user observations [12] or the use of questionnaires [12, 13]. Other methods that assess the subconscious prior knowledge include the analyses of product familiarity [4] or image-schematic metaphors [14]. While these approaches are often very detailed, they are cumbersome to apply in practice and do not scale well to more complex software and larger numbers of participants. To address these problems, this paper introduces CHAI, a simplified coding scheme for assessing intuitive use through user observation. The method is based on the observation method of [12]. To assess the usefulness and validity of CHAI, this paper also reports an empirical study that compares it with other measures of intuitive interaction.

2 Intuitive Interaction

Researchers across the world agree that prior experience is the leading contributor to intuitive use and that it is linked with technology familiarity [6, 8, 15,16,17,18,19]. Originating from an industrial design background, Blackler and colleagues from the Queensland University of Technology (QUT) in Australia stated the following definition based on an extensive review of the literature on creative thinking, intuitive decision making, memory and expertise, consciousness studies and education:

Intuitive use of products involves utilizing knowledge gained through other experience(s). Therefore, products that people use intuitively are those with features they have encountered before. Intuitive interaction is fast and generally non-conscious, so people may be unable to explain how they made decisions during intuitive interaction [1, 8, 12].

The IUUI (Intuitive Use of User Interfaces) research group in Germany based their definition of intuitive use on a literature review of usability design criteria [20], a series of interviews and workshops with users and usability experts, and an analysis of how software producers describe their products:

Intuitive use is the extent to which a product can be used by subconsciously applying prior knowledge, resulting in an effective and satisfying interaction using a minimum of cognitive resources [3].

According to Blackler and Hurtienne [21] both definitions have in common that the subconscious use of prior knowledge (although it is phrased differently in each strand of the literature) serves as the prerequisite for intuitive use to happen. The definition of IUUI has additional requirements that relate intuitive use to the international standard ISO 9241-11, which provides criteria for the evaluation of usability: effectiveness, efficiency and satisfaction. Thereby, intuitive use fulfils the requirements of effectiveness, satisfaction and especially focuses on the cognitive (not motoric or temporal) efficiency of interaction. This is in line with recent psychological research in dual-process theories suggesting that the minimal demand on mental effort may be the primary feature to distinguish between system-1 (subconscious, automatic) and system-2 (conscious, controlled) thinking that underlies intuitive and non-intuitive use [22, 23]. The fastness of intuitive interaction as in the definition of [12] is considered as a typical correlate of the minimal demand on cognitive resources and the related enhancement of information processing speed [21, 22]. As a consequence, intuitive use can be assessed by examining the possibility of the subconscious application of prior knowledge when using technology. As this approach focusses on the precondition for intuitive use, another approach would be to focus on the consequences of intuitive use, especially measuring cognitive efficiency as the decisive factor (see Fig. 1).

Fig. 1. Preconditions and consequences of intuitive use.

One instrument that measures the subjective consequences of intuitive use is the QUESI (QUEStionnaire for the subjective consequences of Intuitive use). It consists of five subscales, which are derived from the above definition of intuitive use by the IUUI group [13]:

  • Perceived mental effort (QUESI-M): Items in this subscale were derived from the precondition of the subconscious application of prior knowledge and its consequence of cognitively efficient interaction.

  • Perceived achievement of goals (QUESI-G): Items in this subscale were derived from the consequence of effective interaction.

  • Perceived effort of learning (QUESI-L): Items in this subscale were derived from the precondition of prior knowledge. When a user interface is based on the prior knowledge of its users, their effort of learning should be low when using the system for the first time.

  • Perceived familiarity (QUESI-F): Items in this subscale were derived from the precondition of prior knowledge. When user interfaces are based on prior knowledge, they should lead to a higher familiarity in using the system.

  • Perceived error rate (QUESI-E): Items in this subscale were derived from the consequence of effective interaction.

As cognitive efficiency can be seen as the distinctive consequence of intuitive use, mental effort questionnaires like the SMEQ (Subjective Mental Effort Questionnaire) [24] or its German equivalent, the SEA [25], can be used to assess the consequences of intuitive interaction. The SEA questionnaire, for example, consists of a vertical scale on which users mark their experienced amount of mental workload between 0 and 220. In contrast to the QUESI, which is focused on the system level, the SEA allows intuitive use to be estimated on both the system and task levels. Although QUESI and SEA have been successfully applied in different contexts like GUI, gestural interaction and gaming [26,27,28], these are still subjective measures with all the associated implications, such as social desirability response bias. In contrast, the observational measure provided by the QUT researchers offers a more objective measure of intuitive use that can be applied on the system and task level alike and focuses on the precondition side of intuitive interaction. Besides video recording of the interaction with a product, the framework requires users to think aloud and experts to meticulously analyze the video recordings to estimate the probability of the subconscious application of prior knowledge [12]. The video coding uses a set of heuristics that we introduce in the following.

3 Development of Intuitive Use Heuristics

This section provides an introduction to the original heuristics of Blackler [12] and demonstrates how these heuristics informed the development of CHAI.

3.1 Original Coding Heuristics for Intuitive Use [12]

Through multiple experiments, the QUT research group developed a coding scheme based on heuristics inspired by the literature. The scheme is designed to assess whether prior knowledge is applied subconsciously and thus whether the precondition for intuitive use is met. For this purpose, users performing a set of tasks with a product and their concurrent thinking aloud are video-recorded [1, 12, 29, 30].

According to Blackler [12], the coding of the observed interactions can be done either feature- or event-based. A feature is a part of a product, which is distinct from others, has its own function, location and appearance and can be designed separately from other features (e.g., a print icon on software, a shutter button on a camera). A task is comprised of a certain number of different events and each event requires one or more interactions to complete. For instance, “entering age” is an event in a greater registration task and it includes multiple button click interactions. It is associated with a certain text box, which represents a feature. Then, all participants’ interactions have to be coded either feature- or event-based. The coded variables are the correctness and type of use (see Fig. 2).

Fig. 2. The original coding heuristics used by Blackler [12].

Correctness of Use

A “correct use” is considered to be one that is correct for the feature (e.g., shutter button) or event (e.g., button click) and also correct for the associated task or subtask (e.g., to disable an alarm clock via a click on the snooze button). A “correct for feature but inappropriate for task use” is one that is correct for the feature or event but not for the task or subtask. Put another way, the user knew what he wanted to do and used the right feature in doing so, but it was the wrong thing to do at the moment (e.g., the task required the user to set an alarm to wake up early, but he pressed the power button instead and thus disabled the alarm clock). “Incorrect uses” are wrong for feature or event, task and subtask. “Attempted uses” are coded when the product did not detect the use due to product failure (e.g., a user’s touch that is not registered). When users get help from inbuilt help functions, product labels or the researcher, the use is counted as non-correct as well. However, this large number of gradations seems unnecessary with regard to the definition of intuitive use, which only requires assessing whether the interaction was correct or not (see [3]).

Type of Use

As mentioned before, the subconscious application of prior knowledge serves as the precondition for intuitive interaction. In order to determine which uses are associated with the subconscious application of prior knowledge and thereby are likely to be intuitive, a series of main indicators is employed to assess the type of use. These indicators (e.g., fastness) are strongly related to the typical correlates of subconscious processing according to dual-process theories (see [22]).

Fastness

If users are able to locate and use a user interface element moderately fast, the associated feature or event could be coded as an intuitive use. Intuition is correlated with fastness of action initiation and actual interaction [12, 22, 31], and the time to make a move can be used as a measure of thinking time [32]. According to Blackler [12], when a user spends more than five seconds exploring other features, that specific step is likely to be non-intuitive.

Relation to Past Experience

During the concurrent thinking aloud, users would sometimes mention that a feature reminds them of something they used or have seen before which shows the evidence of existing prior knowledge. When there is a link to prior experience, it is likely that their action is intuitive.

Expectation

Due to the fact that prior knowledge is the prerequisite for intuitive interaction, it is also used to form expectations [12, 33]. When a user has explicitly worded a clear expectation that a feature would perform a particular function during the concurrent thinking aloud, his or her action is therefore likely to be intuitive.

Certainty of Correctness

Research suggests that intuitive use is accompanied by a certainty of correctness or confidence in a decision [12, 31]. Degree of confidence has been used in experiments as an index of intuition [12, 34, 35]. When participants seem certain about the function of a feature or event (even though they were not always correct) and were not just trying it out, it is likely that their action is intuitive [12].

Conscious Reasoning

Since the application of prior knowledge has to happen without conscious reasoning, the less reasoning was evident for each use, the more likely it can be counted as intuitive. Therefore, participants processing intuitively would not verbalize their reasoning during their actions while thinking aloud [12].

Blackler [12] suggests coding a use as intuitive only when the use was marked as a “correct use” and at least two of the five type-of-use indicators were present. This coding scheme was applied (sometimes with minor adaptions) by experts to code interactions with products such as remote controls, alarm clocks, microwaves, cameras, tangible interfaces and cars [12, 29, 36]. However, even in these systems with a relatively low level of complexity and a small number of possible interactions per task, the effort for coding is quite high. Another concern is that the need to provide a verbal protocol introduces a secondary task, which induces additional cognitive load and may thus affect intuitive interaction (cf. [37]). In addition, the original coding scheme is rather vague and might lead to false positives for “intuitive use” when only two indicators out of five are needed to code an event or a feature as “intuitive”. For instance, imagine a user who does not speak very much (indicator of conscious reasoning is fulfilled) and is generally fast (<5 s) in clicking during the use of a feature or event (indicator of fastness is fulfilled). Then this feature or event would be mistakenly counted as intuitive, even when other heuristics like certainty of correctness are not fulfilled. For this reason, we suggest an adaptation of the coding scheme to suit more complex systems (e.g., websites, 3D modelling software) and remove uncertainties about the epistemic status of an intuitive use.
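The two-out-of-five rule and the false-positive case described above can be made concrete with a minimal sketch. The class and field names below are hypothetical and chosen for illustration only; they are not part of Blackler's scheme [12], and the sketch assumes that each indicator has already been judged as present or absent by a coder.

```python
from dataclasses import dataclass


@dataclass
class OriginalCoding:
    """One coded feature or event (hypothetical structure, illustrative names)."""
    correct_use: bool
    fast: bool                        # located and used within about 5 s
    related_to_past_experience: bool  # user mentions a link to prior experience
    expectation_voiced: bool          # user words a clear expectation
    certain: bool                     # user seems certain, not just trying out
    no_conscious_reasoning: bool      # no reasoning verbalized


def is_intuitive_original(coding: OriginalCoding, min_indicators: int = 2) -> bool:
    """Correct use plus at least `min_indicators` of the five type-of-use indicators."""
    indicators = [
        coding.fast,
        coding.related_to_past_experience,
        coding.expectation_voiced,
        coding.certain,
        coding.no_conscious_reasoning,
    ]
    return coding.correct_use and sum(indicators) >= min_indicators


# The quiet, generally fast user from the example: only two indicators are met,
# and certainty is not among them, yet the use is still coded as intuitive.
quiet_fast_user = OriginalCoding(
    correct_use=True,
    fast=True,
    related_to_past_experience=False,
    expectation_voiced=False,
    certain=False,
    no_conscious_reasoning=True,
)
print(is_intuitive_original(quiet_fast_user))  # True
```

Raising the required number of indicators, or making specific indicators mandatory, is one way to make such a rule more conservative.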

Naumann et al. [38] assume that the intuitive use of more complex technical systems is best measurable at the level of single interactions. In accordance with Blackler [12], interactions form the basic parts of an event. Depending on the user interface type, a single interaction can be a mouse click, a touch event or the manipulation of a tangible object. Accordingly, the higher the proportion of intuitive interactions, the more intuitive a technical system should be perceived to be. As the thinking aloud required for the original coding scheme may influence the level of consciousness during product use and, as a secondary task, could add to cognitive workload [37], the adapted CHAI heuristics were designed to work without the requirement of thinking aloud.

Inspired by the correctness and type-of-use indicators of Blackler [12], an interaction should be coded as intuitive when it is correct, fast and certain (see Fig. 3). Since the original heuristics of “relation to past experience”, “conscious reasoning” and “expectation” are only accessible through a verbal protocol, they were not considered in the new set of heuristics. Since a thinking-aloud protocol no longer has to be analyzed, the overall coding effort is reduced using CHAI. Because “fastness” and “certainty of correctness” always need to be present to code an interaction as intuitive, the new coding scheme is more conservative than the original one and might lead to fewer false positives. The adaptions are explained in detail in the following section.

Fig. 3. CHAI: The refined coding heuristics to enable assessment of intuitive interaction.

3.2 CHAI: Coding Heuristics for Assessing Intuitive Interaction

Correctness of Use

We slightly simplified the conditions for the correctness of use by reducing the number of categories from five to four. A “correct interaction” is considered to be one that is correct for the user interface element (e.g., menu bar) and also correct for the associated task or subtask. “Incorrect interactions” are wrong for the user interface element, task or subtask. “Attempted interactions” are those that the system does not register due to product failure (e.g., a touch event that is not registered). “Getting help” interactions are those in which users look for help in the system or are provided with clues by the researcher. A “correct interaction” is only considered an intuitive interaction when the type-of-use indicators “fastness” and “certainty of correctness” are given.

Type of Use

Fastness

Here, the time interval of five seconds in the original heuristic was shortened to avoid ambiguities in interpreting an interaction as intuitive. We used Keystroke Level Modeling (KLM) to determine a suitable time threshold. KLM is a simple cognitive modeling technique that is used to estimate the time expert users need to accomplish a routine task using an interactive system. The estimations are done on the basis of generalized estimates of the associated cognitive and low-level motor operations. A cognitive operation like the user’s decision where to click, for example, is estimated at an average of 1.2 s. The actual pointing of the mouse and clicking is estimated at 1.3 s. Homing the hands between keyboard and mouse is estimated at 0.4 s. Taking all these actions together (2.9 s), a routine click under normal circumstances should not exceed 3 s. Thus, if users need more than three seconds for an action, they are likely not acting intuitively based on their prior knowledge [39,40,41,42]. Because older adults use technology less intuitively and in general show slowed perceptual, motor and cognitive processes, the “fastness” heuristic needs to be adapted in this case [4]. When the heuristic is used with another input device (e.g., tangible), the motor part of the heuristic has to be adapted as well (see [41]).
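The arithmetic behind the three-second threshold can be sketched as follows. The operator values are the ones quoted in the text; the dictionary keys, function names and the idea of passing an adapted set of estimates for another user group or input device are illustrative assumptions, not part of KLM or CHAI.

```python
# KLM operator estimates quoted in the text, in seconds.
KLM_ESTIMATES_S = {
    "mental_decision": 1.2,  # deciding where to click
    "point_and_click": 1.3,  # pointing with the mouse and clicking
    "homing": 0.4,           # moving the hand between keyboard and mouse
}

# Threshold used by the fastness heuristic (the 2.9 s estimate rounded up).
FASTNESS_THRESHOLD_S = 3.0


def routine_click_estimate(estimates=KLM_ESTIMATES_S) -> float:
    """Sum the operator estimates for one routine click, in seconds."""
    return round(sum(estimates.values()), 2)


def is_fast(elapsed_s: float, threshold_s: float = FASTNESS_THRESHOLD_S) -> bool:
    """An action counts as fast if it does not take more than the threshold."""
    return elapsed_s <= threshold_s


print(routine_click_estimate())  # 2.9
print(is_fast(2.4), is_fast(3.5))  # True False
```

For a different population or input device, a different set of operator estimates could be summed in the same way to derive an adapted threshold.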

Certainty of Movement

For this heuristic, an expert assesses the determination of the user’s interaction by looking at the user’s hand or mouse movements. When participants seemed to move their mouse cursor or hand in a targeted manner towards the user interface element without interruptions, seeking or circular movements, it is likely that this interaction is intuitive [39].

The above heuristics were initially applied by Horn [39] to two versions of a ticket machine. Interactions on a touch screen were recorded using a hand camera. The complexity and the associated set of required interactions with the ticket machine already represent an increase over the products that had been evaluated before (e.g., remote control, alarm clock). The results showed a significant positive correlation between the proportion of intuitive interactions (intuitive taps) and the total QUESI score, r(26) = .39, p < .05. The correlation shows that the interaction was perceived as more intuitive when more intuitive interactions were identified by this method; however, the size of the effect was rather small.

The study described in the following builds on the results of Horn [39] and aims to investigate whether the refined heuristics can be used to assess intuitive use in a rather complex system (i.e., 3D modelling software). In this way, the study examined the construct validity of the refined heuristics as a means to assess intuitive interaction for graphical user interfaces. To test the convergent validity, the SEA and QUESI questionnaires were administered. For discriminant validity, the total number of mouse clicks was calculated as a measure of motoric (physical) efficiency, which is not associated with the construct of intuitive use [3]. Correlation analyses were then carried out to support the validity of the proportion of intuitive clicks as a measure of intuitive use. Specifically, we expect higher correlations with SEA and QUESI than with the number of clicks.

4 Methodology

4.1 Participants

A sample of 20 undergraduates was recruited from a German university campus. Four participants had to be excluded from data analysis due to incomplete printed questionnaires (two participants) and broken video footage (two participants). The remaining sample includes 16 participants (12 women, 4 men). The age of the participants ranged from 18 to 28 years (M = 21.22, SD = 2.39). All participants were students of human-computer interaction or media communication. None of the participants reported any prior experience with 3D modelling tools in general.

4.2 Apparatus and Measures

Participants used SketchUp Make 2017, a 3D modelling software, in this experiment (see Fig. 4). This software was chosen because, in contrast to previous studies, we wanted to test the heuristics in a more complex environment covering interaction styles (e.g., clicking) that are widely used in industrial software. According to Akers et al. [43], creation-based software like 3D modelling software is very difficult to test and thus could provide a good benchmark for testing the refined heuristics. The experiment was conducted in a laboratory setting and recorded with the screen capture software Morae Recorder, including mouse trajectories and click events. Mouse trajectories needed to be recorded to assess whether the heuristic “certainty of correctness” was given or not. In order to evaluate “fastness”, experts judged whether each click (i.e., action is initiated when the cursor starts to move) was within a three-second timeframe starting from the last click, or from the last pause if no click happened. The very first timeframe was analyzed from the beginning of the task. For the video analysis with the refined heuristics we used Morae Manager. A demographic questionnaire captured gender, age and items on prior experience inspired by the technology familiarity questionnaire of Blackler [12], namely the frequency of using 3D modelling software and the diversity of software features used. The QUESI was completed after each session, and participants rated their mental effort using the SEA scale after each task.

Fig. 4. This screenshot of SketchUp shows the 3D model used in the rotation task, in which participants were instructed to rotate the chair by 180° along its vertical axis.

4.3 Procedure

Upon arrival at the laboratory, participants were given a brief scripted verbal overview of the session. Then the participants were instructed to fill in the demographic questionnaire. Participants were then instructed not to talk and informed that there would be a retrospective interview utilizing the captured video footage.

Then, a 3D model, representing a furnished bedroom, was preloaded in the application. The participants were given a scenario with three tasks. One task included the measurement of the height of a door (measuring task), another dealt with repositioning a bedside lamp (positioning task) and a third task required the rotation of a chair by 180° (rotation task). Thereby, each task required selecting and using basic features for 3D navigation and object manipulation that could be accessed via the toolbar in a logical, independent and goal-oriented manner. Each task consisted of three major parts: the adjustment of the current viewport in order to spot the corresponding object and target, the selection of the correct tool from the menu and, finally, the correct application of the tool.

All tasks were presented one at a time in printed form. The order of tasks was randomized across participants. Each task was designed to be solved within one minute by an expert user. The participants were not informed exactly how long it takes to solve the tasks and were not given any practice. This was necessary to meet our experimental aim to evaluate intuitive interaction using the refined heuristics and thereby allowing us to assess how participants apply their prior knowledge rather than how quickly they could be trained to use the application [12]. Once the participant read a task description and confirmed their understanding, they were asked to start the allocated task. If participants could not solve a task within five minutes, the experimenter asked participants to stop. Participants were then asked to rate their cognitive effort during the task on a SEA scale [25]. Then, participants read the description of the next task and the above process was repeated for the second and third task. After finishing the third task, participants completed the QUESI questionnaire [13]. Finally, participants were debriefed and thanked for their participation. The experiment took about twenty minutes in total.

4.4 Data Analysis

The video footage was coded by two independent raters using the CHAI coding heuristics for assessing intuitive interaction as outlined above. Because our goal was to examine whether the video-recorded interactions were intuitive or not, we also developed a process to apply the heuristics to the videos as efficiently as possible (i.e. the decision tree in Fig. 5). Since interactions with the investigated 3D modelling software were best characterized by mouse events, we focused on mouse clicks (left or right) and mouse movements for our analysis. Figure 6 gives an example of a possible analysis.

Fig. 5. The decision tree used in the experiment for distinguishing between intuitive and non-intuitive clicks based on the CHAI heuristics.

Fig. 6. Both pictures show click events for the measuring task. (A) shows a rather non-intuitive interaction: (1) the user paused for 6 s - one non-intuitive click; (2) the user clicked on an incorrect item within 3 s - one non-intuitive click; (3 & 4) the user paused for 2 s and then clicked on the correct icon within 3 s - one intuitive click; (5) the user clicked correctly and fast but not in a targeted manner - one non-intuitive click; (6) the user clicked correctly, fast and in a targeted manner - one intuitive click. In total, three non-intuitive and two intuitive clicks result in 40% intuitive interactions of the overall interaction. (B) shows an intuitive interaction with the following coding: (1) the user clicked with certainty within 1 s on the correct button - one intuitive click; (2) the user clicked correctly and in a targeted manner on the measurement start point within 2 s - one intuitive click; (3) the user performed the measuring task by dragging the cursor from top to bottom and then releasing the mouse button - one intuitive click. In total, three intuitive clicks were performed, resulting in 100% intuitive interactions of the overall interaction.

As an intuitive interaction requires quick action initiation, a delay in performance indicates non-intuitive interaction. We therefore looked at the first timeframe of three seconds to determine whether the participant was inactive (no action initiation; mouse cursor was not moved) or a click happened (see Fig. 5, first decision). The three-second timeframe was chosen based on the KLM estimates for a click interaction, as described above in the introduction of the refined heuristics. As a delay in action initiation can be seen as a wasted opportunity to act intuitively and thus violates the criterion of fastness, it has to be counted as a non-intuitive click every time it happens.

When a click took place within this timeframe, the next step was to check whether the click was correct or not (Fig. 5, second decision). Clicks were registered as correct when their outcome moved the participant closer to a valid system status associated with their task goal while not verbally calling on external assistance. For this purpose, the experimenter identified the “happy path” (cf. [44]), the basic course of action, as well as alternative paths. These paths were used to decide whether a click was targeted and adequate towards achieving the task goal. If the click was not correct with respect to these paths, the click was counted as non-intuitive (Fig. 6A, Step 2). Finally, if the click was fast and correct, the certainty of the mouse movement was examined (Fig. 5, third decision). When the mouse cursor’s path was evaluated as unsteady and serpentine by the experts, the click was not counted as an intuitive click (Fig. 6A, Step 5). If, instead, the cursor was moved in a straight path without major deviations, the certainty heuristic was fulfilled and the click was registered as intuitive (Fig. 6A, Step 6). Once a click was classified as intuitive or non-intuitive, the next three-second timeframe was analyzed and the above process repeated until the end of the recording was reached and all clicks of the participants were classified as intuitive or not.
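The three decisions described above can be summarized in a minimal sketch. It assumes that the expert judgments (time to click, correctness with respect to the identified paths, and certainty of the cursor movement) have already been made for each coded timeframe. The class, field and function names are hypothetical, and the click times in the example are assumed values, since the caption of Fig. 6 only states whether a click fell within three seconds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CodedEvent:
    """One coded timeframe (illustrative structure, not from the paper)."""
    seconds_to_click: Optional[float]  # None means no click happened (a pause)
    correct: bool = False              # on the happy path or an alternative path
    targeted: bool = False             # straight cursor path without major deviations


def is_intuitive(event: CodedEvent, threshold_s: float = 3.0) -> bool:
    # First decision: did a click happen within the timeframe (fastness)?
    if event.seconds_to_click is None or event.seconds_to_click > threshold_s:
        return False
    # Second decision: was the click correct?
    if not event.correct:
        return False
    # Third decision: was the movement certain (targeted)?
    return event.targeted


def proportion_intuitive(events: List[CodedEvent]) -> float:
    """Share of coded events that count as intuitive clicks."""
    if not events:
        raise ValueError("at least one coded event is required")
    return sum(is_intuitive(e) for e in events) / len(events)


# The five coded events of Fig. 6A as described in its caption (times assumed).
fig_6a = [
    CodedEvent(None),                               # (1) pause, no click
    CodedEvent(2.0, correct=False),                 # (2) fast but incorrect
    CodedEvent(2.0, correct=True, targeted=True),   # (3 & 4) intuitive
    CodedEvent(2.0, correct=True, targeted=False),  # (5) correct, fast, not targeted
    CodedEvent(2.0, correct=True, targeted=True),   # (6) intuitive
]
print(proportion_intuitive(fig_6a))  # 0.4
```

Because all three conditions must hold, a single failed decision is enough to code the event as non-intuitive, which is what makes the scheme conservative.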

In this way, all video recordings were coded by the two raters, who then discussed for each click whether it was intuitive or non-intuitive. Before the discussion, the inter-rater reliability according to Cohen’s kappa was κ = .61, suggesting substantial agreement beyond chance [45]. The discussion resolved any disagreements between the raters (e.g., judging the certainty of movement was sometimes very difficult). Finally, the ratio of intuitive clicks to all counted clicks was computed as a proportion on the task level. These proportions were then averaged across the three tasks to obtain a measure for each participant on the system level.
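As a reminder of how these two quantities are obtained, the following sketch computes Cohen’s kappa for two raters and averages task-level proportions into a system-level score. The function names are hypothetical, and the ratings and proportions in the usage lines are made-up toy values for illustration, not data from this study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two raters corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of clicks")
    n = len(rater_a)
    # Observed agreement: share of clicks coded identically by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1 - expected)


def system_level_proportion(task_proportions: Sequence[float]) -> float:
    """Average the task-level proportions of intuitive clicks for one participant."""
    if not task_proportions:
        raise ValueError("at least one task-level proportion is required")
    return sum(task_proportions) / len(task_proportions)


# Toy ratings (1 = intuitive, 0 = non-intuitive); not data from the study.
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))   # 0.5
print(system_level_proportion([0.4, 1.0, 0.7]))
```

Applying the same kappa computation separately to the fastness, correctness and certainty judgments would yield a reliability estimate per heuristic.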

5 Results

Intuitive interactions in the form of intuitive mouse clicks were coded as proportions of the total number of mouse clicks within each user observation and computed on the task and system level. In order to evaluate the convergent validity of the refined coding scheme, SEA values were calculated on the system and task level. QUESI values were available at the system level only. To evaluate the discriminant validity, the total number of clicks was calculated on the system and task level.

5.1 Assessment of Intuitive Interaction on the System Level

Averages of QUESI scores, proportions of intuitive clicks and SEA scores were calculated, and Pearson correlation coefficients were computed between the system scores obtained from the three instruments. Descriptive data are shown in Table 1 and correlation coefficients can be found in Table 2. All correlation coefficients were statistically significant (p < .05). The correlations showed that a high proportion of intuitive clicks was associated with a low cognitive workload and high subjective ratings of intuitive interaction.
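For readers who want to reproduce this kind of analysis on their own data, the Pearson coefficient can be computed as in the following sketch. The function name is hypothetical and the two lists are made-up toy values chosen to give a perfectly negative relation; they are not the scores reported in Tables 1 and 2.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cross / math.sqrt(ss_x * ss_y)


# Toy values only: proportion of intuitive clicks and SEA ratings (0 to 220).
proportion_intuitive_clicks = [0.2, 0.4, 0.6, 0.8]
sea_scores = [160, 120, 80, 40]
print(round(pearson_r(proportion_intuitive_clicks, sea_scores), 2))  # -1.0
```

The same function can be applied to the total number of clicks to check the discriminant side of the argument.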

Table 1. Descriptive statistics (system level).
Table 2. Pearson correlation coefficients between the proportion of intuitive clicks, number of clicks overall, QUESI and SEA scores on system level (*p < .05; **p < .01).

5.2 Assessment of Intuitive Interaction on the Task Level

Averages of SEA scores, proportions of intuitive clicks and numbers of clicks were calculated on the task level (see Table 3). Then, Pearson correlation coefficients were computed between the task scores obtained from SEA, the proportion of intuitive clicks and the total number of clicks (see Table 4).

Table 3. Descriptive statistics (task level).
Table 4. Pearson correlation coefficients between the proportion of intuitive clicks, number of clicks and SEA scores on task level (*p < .05; **p < .01).

6 Discussion and Future Work

The main goal of this study was to introduce a refined set of heuristics that can be used to assess the probability of the subconscious application of prior knowledge in relatively complex interactive systems. The findings of this study show that measurements related to intuitive use show similar and acceptable correlations, indicating that the proportion of intuitive interactions as derived from applying the CHAI method shows good convergent validity. In particular, the correlations between the proportion of intuitive interactions and the precondition-related QUESI scales Familiarity (QUESI-F) and Perceived Effort of Learning (QUESI-L) were significant, supporting the assumption that CHAI can be used for assessing the probability of the subconscious application of prior knowledge. Furthermore, correlations showed significant relations between the proportion of intuitive interactions and consequences of intuitive use like mental effort, which was measured using the SEA scale and the scale for subjective mental effort of the QUESI (QUESI-M). Regarding effectiveness, the Perceived Error Rate (QUESI-E) and the Perceived Achievement of Goals (QUESI-G) also show a relation to the proportion of intuitive interactions. The absence of correlations between the proportion of intuitive clicks and the total number of clicks suggests that CHAI specifically assesses cognitive efficiency rather than motoric (physical) efficiency. This supports the discriminant validity of CHAI for assessing intuitive use.

Regarding the convergent validity of CHAI on the task level, the correlations with the SEA scores for the positioning and rotation task were positive and acceptable, while the correlations with the SEA scores for the measurement task did not reach significance. However, a general trend can be noted when looking at the effectiveness and the proportion of intuitive clicks (see Table 3). Further studies need to examine whether this trend persists and to identify the reason for the non-significant correlation by using a multidimensional workload measure like the NASA-TLX [46] instead of the SEA scale. Regarding discriminant validity, the correlation analyses between the proportion of intuitive clicks and the total number of clicks showed no significant correlations, indicating the discriminant validity of CHAI on the task level as well.

However, as with all experimental studies, this study has limitations, which can prompt future research in this area. First, the experiment was carried out using one particular software package from one application domain with one input device (mouse). The results should be validated across multiple software packages in different domains using other input devices (e.g., touch screens, gesture input). It can be expected that the KLM-derived threshold of the fastness criterion would need to be adapted to the specific type of technology used. Second, manual coding using CHAI is still time-consuming and relies greatly on expert judgment. The inter-rater reliability of κ = .61 is satisfactory, but both raters coded only whether a click was intuitive or not, without stating which heuristics led to their decision. It would be interesting to see the inter-rater reliability of each of these heuristics. Third, the experiment was conducted in a German context with student participants who had no previous experience with the software or similar 3D applications. Thus, the results should not be generalized to other populations before further validation. When designing for older users, we expect that the fastness criterion would need to be adapted as well. Finally, the tasks were simulated in a laboratory setting. Thus, any sense of urgency or other contextual responses that a user may experience in a real setting may not arise here. A replication of the study in a real setting should bring further validation.
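The per-heuristic inter-rater reliability suggested above could be computed along the following lines. This is a minimal sketch of Cohen's kappa for two raters; the heuristic labels and the binary codes are hypothetical placeholders, since the raters in this study recorded only the overall intuitive/non-intuitive decision.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: share of items with identical codes.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal code frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes (1 = heuristic fulfilled, 0 = not fulfilled) for five
# clicks. The labels "heuristic_1" and "heuristic_2" are placeholders.
codes = {
    "heuristic_1": ([1, 1, 1, 0, 0], [1, 1, 0, 0, 0]),
    "heuristic_2": ([1, 0, 1, 0, 1], [1, 0, 1, 0, 1]),
}

for name, (rater_a, rater_b) in codes.items():
    print(f"{name}: kappa = {cohens_kappa(rater_a, rater_b):.2f}")
```

Reporting one kappa per heuristic in this way would show which heuristics depend most on expert judgment and might therefore need a tighter definition.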

In summary, with CHAI we introduced a set of coding heuristics for intuitive interaction that, unlike previous heuristics [12], do not rely on concurrent thinking aloud and thus do not cause cognitive interference with the actual task. We also reduced the total number of heuristics and developed a more conservative coding scheme, which should prevent false positives for intuitive interaction by checking whether all (and not only some) of the heuristics are fulfilled. Besides effectiveness, CHAI promises enhanced efficiency compared with the coding scheme of Blackler [12], because the refined heuristics do not require analyzing the verbal protocol with respect to prior knowledge, conscious reasoning and expectations. Regarding validity, the present results indicate that CHAI can be applied to rather complex systems like 3D modelling software and can be used to assess intuitive use on both the system and the task level. However, our sample size was too small to draw final conclusions, and more studies are needed to confirm the results. Additionally, further work should provide direct comparisons between the original heuristics of Blackler and CHAI regarding their reliability and validity, as well as measures of their practicality in research and applied settings.

Future research should further examine the validity of the approach at the problem level and thereby check the potential of the method for qualitative data analysis as well. Since this work views intuitive interaction mainly from the pragmatic perspective (effectiveness and cognitive efficiency), additional research should check whether CHAI results are also related to hedonic properties of intuitive interaction (gut feeling, magical experience), as suggested by Ulrich and Diefenbach [11]. To conclude, in order to create products that are intuitive to use, designers should be able not only to assess whether a product or a task is intuitive, but also to obtain qualitative information at the problem level and an easy way to prioritize the problems found. The proportion of intuitive interactions could offer such an opportunity by enabling designers to spot likely non-intuitive user interface elements on the basis of the non-intuitive interactions they attract across participants. In this way, we are prepared to meet the challenges of digitalization by enabling designers to evaluate the intuitive use of products, which is one step further towards designing interactive systems for users with different backgrounds and prior technology knowledge.
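As a rough illustration of how the proportion of intuitive interactions might be used for prioritization at the problem level, the following sketch aggregates coded clicks per user interface element across participants and ranks the elements from least to most intuitive. The element names, the data layout and the function names are assumptions made for this example and are not part of the CHAI method as evaluated in this study.

```python
from collections import defaultdict


def proportion_intuitive_by_element(coded_clicks):
    """Map each UI element to the share of its clicks coded as intuitive."""
    totals = defaultdict(int)
    intuitive = defaultdict(int)
    for element, is_intuitive in coded_clicks:
        totals[element] += 1
        if is_intuitive:
            intuitive[element] += 1
    return {element: intuitive[element] / totals[element] for element in totals}


def rank_elements(coded_clicks):
    """Return (element, proportion) pairs, least intuitive element first."""
    proportions = proportion_intuitive_by_element(coded_clicks)
    return sorted(proportions.items(), key=lambda item: item[1])


# Hypothetical coded clicks pooled across participants:
# (UI element, coded as intuitive?). Element names are placeholders.
coded_clicks = [
    ("rotate_tool", True), ("rotate_tool", True), ("rotate_tool", False),
    ("measure_tool", False), ("measure_tool", False),
    ("measure_tool", True), ("measure_tool", False),
    ("move_tool", True), ("move_tool", True),
]

for element, proportion in rank_elements(coded_clicks):
    print(f"{element}: {proportion:.2f}")
```

Elements at the top of such a ranking would be the first candidates for a closer qualitative look, although elements with very few coded clicks would need to be interpreted with caution.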