Reconstructing experiences with iScale

https://doi.org/10.1016/j.ijhcs.2012.06.004

Abstract

We present iScale, a survey tool for the retrospective elicitation of longitudinal user experience data. iScale aims to minimize retrospection bias and employs graphing to impose a process during the reconstruction of one's experiences. Two versions, the constructive and the value-account iScale, were motivated by two distinct theories on how people reconstruct emotional experiences from memory. These two versions were tested in two separate studies. Study 1 aimed at providing qualitative insight into the use of iScale and compared its performance to that of free-hand graphing. Study 2 compared the two versions of iScale to free recall, a control condition that does not impose structure on the reconstruction process. Overall, iScale resulted in an increase in the amount, the richness, and the test–retest consistency of recalled information as compared to free recall. These results provide support for the viability of retrospective techniques as a cost-effective alternative to longitudinal studies.

Highlights

• We present a survey tool for the retrospective elicitation of longitudinal UX data.
• iScale is rigorously grounded on competing theories of experience reconstruction.
• iScale employs graphing to impose structure on experience reconstruction.
• We report on two studies, a qualitative inquiry and an experimental study.
• iScale increased the amount, the richness, and the reliability of recalled information.

Introduction

Understanding the use and acceptance of interactive products beyond initial use has always been an interest of the human–computer interaction (HCI) community (Erickson, 1996, Prümper et al., 1992). However, two recent trends make the call for a more longitudinal view more urgent (Karapanos et al., 2009). First, legislation and competition within the consumer electronics industry have led to prolonged product warranties, resulting in an alarming increase in the number of products returned for failing to satisfy their users' “true” needs (Den Ouden et al., 2006). Second, products have become more embedded into services. Often, products are sold at low prices or even given away for free, and revenues stem mainly from the supported service and its prolonged use (Karapanos et al., 2009). Thus, the focus on product quality shifts from the classic phases of pre-purchase and purchase to a more longitudinal perspective that seeks to better understand use and liking over time. This shift is increasingly taken up by the HCI community (Gerken et al., 2007, Barendregt et al., 2006, Fenko et al., 2009, Karapanos et al., 2008, von Wilamowitz-Moellendorff et al., 2006, Courage et al., 2009, Vaughan et al., 2008, Kjeldskov et al., in press).

From a methodological perspective, one may distinguish three approaches to understanding the development of usage and experience over time (von Wilamowitz-Moellendorff et al., 2006): cross-sectional, pre-post/longitudinal, and retrospective reconstruction. Cross-sectional approaches are the most popular in the HCI domain (Prümper et al., 1992, Bednarik et al., 2005). Cross-sectional studies distinguish user groups with different levels of expertise, for instance novice and expert users. Observed variation in experience or behavior is then attributed to expertise in the sense of a quasi-experimental variable. This approach is, however, limited, as it is prone to confounds: external variation may go uncontrolled and, more importantly, variation across the user groups may be falsely attributed to expertise. Prümper et al. (1992) already highlighted this problem by showing that different definitions of novice and expert users lead to different results.

Beyond the cross-sectional, one may further distinguish pre-post and true longitudinal approaches. Pre-post designs study the same participants at two points in time. For instance, Kjeldskov et al. (in press) studied the same seven nurses using a healthcare system right after the system was introduced and 15 months later. Karapanos et al. (2008) explored how 10 individuals formed overall evaluative judgments of a novel pointing device during the first week of use as well as after 4 weeks of using the product. While these approaches study the same participants over an extended period of time, they cannot tell much about the exact form of change, because only two observations are available. True longitudinal designs take more measurements and employ a number of statistical techniques to track change in general and to estimate the impact of particular events on change. Because of their laborious nature, however, they are only rarely used in practice and research.
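To make the contrast concrete, the following sketch is purely illustrative and is not taken from any of the cited studies: the participant labels, the weekly ratings, and the simple linear trend model are all assumptions. It shows the kind of elementary change estimate that repeated measurements afford and that a two-point pre-post design cannot provide.

```python
# Requires Python 3.10+ for statistics.linear_regression.
from statistics import linear_regression

# Hypothetical weekly satisfaction ratings (1-7) for two participants over
# eight weeks of product use; a true longitudinal design collects many such
# repeated measurements per participant.
ratings = {
    "P1": [3, 4, 4, 5, 5, 6, 6, 6],
    "P2": [5, 5, 4, 4, 3, 3, 3, 2],
}

for pid, ys in ratings.items():
    weeks = list(range(len(ys)))
    # Least-squares slope: the average change in rating per week.
    slope, intercept = linear_regression(weeks, ys)
    print(f"{pid}: change of {slope:+.2f} rating points per week")
```

With only a pre and a post measurement, such a trend reduces to a single difference score, and the shape of change between the two points remains unknown.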

Different granularities in longitudinal studies can be distinguished (see von Wilamowitz-Moellendorff et al., 2006): a micro-perspective (e.g. an hour), a meso-perspective (e.g. 5 weeks) and a macro-perspective, with a scope of years of use. Studies with a micro-perspective assess how users' experience changes through increased exposure over the course of a single session of use. For instance, Minge (2008) elicited judgments of perceived usability, innovativeness and the overall attractiveness of computer-based simulations of a digital audio player at three distinct points: (a) after participants had seen but not interacted with the product, (b) after 2 min of interaction and (c) after 15 min of interaction. An example of a study with a meso-perspective is Karapanos et al. (2009). They followed six individuals after the purchase of a product over the course of 5 weeks. One week before the purchase of the product, participants started reporting their expectations. After product purchase, participants were asked to narrate the three most influential experiences of each day. Studies with a macro-perspective are ‘nearly non-existent’ (von Wilamowitz-Moellendorff et al., 2006).

A third approach is the retrospective reconstruction of personally meaningful experiences from memory. Different variants of the Critical Incident Technique, popular in marketing and service management research (Edvardsson and Roos, 2001, Flanagan, 1954), ask participants to report critical incidents over periods of weeks, months or the complete time-span of the use of a product or service. In a survey study, Fenko et al. (2009) asked participants to recall their single most pleasant and unpleasant experience with different types of products and to assess the most important sensory modality (i.e. vision, audition, touch, smell and taste) at different points in time (i.e. when choosing the product in the shop, during the first week, after the first month, and after the first year of usage). von Wilamowitz-Moellendorff et al. (2006, 2007) proposed a structured interview technique called CORPUS (Change Oriented analysis of the Relation between Product and User) for the retrospective assessment of the dynamics in users' perceptions of different facets of perceived product quality. CORPUS starts by asking participants to assess the currently perceived quality of “their” product on a number of defined facets (usability, utility, beauty, stimulation, identification, and global evaluation). Subsequently, they are asked to “go back in time” and to compare their current perception and evaluation of the product to the moment right after purchasing the product. If change has occurred, participants are further prompted to indicate the direction and shape of change (e.g., accelerated improvement, steady deterioration). Finally, participants are asked to elaborate on the reasons that induced the changes in the form of short narratives, so-called “change incidents”. The obtained data can be used quantitatively, by constructing graphs of change (see Fig. 1 for an example), and qualitatively, by exploring the reasons people give for changes.
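The paper does not specify how such graphs of change are assembled computationally. Purely as an illustration, the sketch below converts a sequence of reported change incidents, each characterized by a direction and shape, into a piecewise quality-over-time curve; the shape vocabulary, numeric scale, and segment resolution are assumptions, not CORPUS itself.

```python
# Illustrative sketch only: turning CORPUS-style change reports (direction +
# shape per period) into a piecewise quality-over-time curve.
def change_segment(shape, start, magnitude=1.0, steps=10):
    """Return `steps` quality values following a named change shape."""
    shapes = {
        "steady improvement":      lambda t: start + magnitude * t,
        "steady deterioration":    lambda t: start - magnitude * t,
        "accelerated improvement": lambda t: start + magnitude * t ** 2,
        "no change":               lambda t: start,
    }
    return [shapes[shape](i / (steps - 1)) for i in range(steps)]

curve, level = [], 5.0  # hypothetical evaluation right after purchase
for incident in ["accelerated improvement", "no change", "steady deterioration"]:
    segment = change_segment(incident, level)
    curve.extend(segment)
    level = segment[-1]  # each period starts where the previous one ended

print([round(v, 2) for v in curve])
```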

A common critique of methods relying on memory is the degree to which recalled experiences are biased or incomplete. In the context of perceived product quality, we argue that this is of minor importance. While a given reconstruction from memory should be truthful (i.e., reflect what the participant really thinks), it seems less important whether the reconstructed timeline as well as the reasons given are true (i.e., reflect what actually happened), as long as the participant is convinced that what she is reporting actually happened. This is because we are foremost interested in subjective reconstructions: those (and not “objective” data) will be communicated to others and will guide the individual's future activities. In other words, it may not matter how good a product is objectively; it is the “subjective”, the “experienced”, which matters (Hassenzahl et al., 2006). See also Norman (2009). To give a further example: Redelmeier and Kahneman (1996) found retrospective assessments of the pain experienced during colonoscopy to be biased: people put extra weight on the most painful moment and on the end of the examination. This has interesting consequences. One can, for example, deliberately prolong the examination (something the patients would not approve of), but make sure that these last, additional 2 min are not painful. The consequence is that the examination is assessed overall as less painful than by people without the additional 2 min. While this is clearly a bias, people simply have no memory for all the moments they experience; they remember their overall impression of the examination. The retrospective judgment is more real to them than what actually happened.

While the validity of remembered experiences may not be crucial, their consistency across multiple recalls is. It seems at least desirable that participants report their experiences consistently over multiple trials. If recall were purely “random”, the value of the respective reports for design would be questionable. In other words, what we remember might be different from what we experienced; however, as long as these memories are consistent over multiple recalls, they provide valuable information.
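The pattern Redelmeier and Kahneman describe is easy to make concrete. In the sketch below, the minute-by-minute pain ratings are invented for illustration, and approximating the remembered episode by the mean of its most intense and its final moment is a common simplification of the peak-end rule, not the authors' analysis. It shows how appending two low-pain minutes lowers the remembered pain even though total pain can only increase.

```python
def peak_end(ratings):
    """Approximate remembered pain as the mean of the most intense moment
    and the final moment (a common simplification of the peak-end rule)."""
    return (max(ratings) + ratings[-1]) / 2

# Hypothetical minute-by-minute pain ratings (0 = no pain, 10 = worst pain).
standard = [2, 5, 8, 7, 6]           # examination ends on a painful note
prolonged = standard + [3, 2]        # same examination + 2 low-pain minutes

for name, ratings in [("standard", standard), ("prolonged", prolonged)]:
    print(f"{name}: total pain = {sum(ratings)}, "
          f"remembered (peak-end) = {peak_end(ratings):.1f}")
# The prolonged examination has more total pain (33 vs. 28) but is
# remembered as less painful (5.0 vs. 7.0).
```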

In the area of critical incident research, interviewing techniques have been developed with the aim of assisting participants in remembering more details of, and contextual information around, experienced critical incidents (Edvardsson and Roos, 2001). Interviews in general, however, require substantial skills and resources. It thus seems desirable to create a self-report approach. Consequently, this paper presents iScale, a survey tool that was designed to increase participants' effectiveness in reconstructing their experiences with a product over time. iScale uses a graphical representation of change over time as a major support (i.e., time-line graphing). In contrast to previous approaches (von Wilamowitz-Moellendorff et al., 2006, Kujala et al., 2011), the employed procedure is more firmly grounded in theory: variants of the procedure are derived from competing theoretical models of the retrospective reconstruction of episodes and experiences from memory. Graphing is assumed to support the reconstruction process through what Goldschmidt (1991) calls interactive imagery (i.e., “the simultaneous or almost simultaneous production of a display and the generation of an image that it triggers”). The idea of using graphing as an approach to introspecting on past emotional experiences can be traced back to Sonnemans and Frijda (1994).

We begin by laying out two different ways of obtaining retrospective reconstructions of experiences and their theoretical foundation. We then present the results of two studies. Study 1 acquired a qualitative understanding of the use of iScale in comparison to its analog equivalent (i.e. free-hand graphing). Study 2 assessed how iScale compares to an experience reporting tool without graphing support, which can be seen as a control condition for assessing the impact of iScale on participants' effectiveness and test–retest consistency in reconstructing experiences.

Section snippets

Reconstructing experiences from memory

Memory was long understood as a faithful account of past events that could be reproduced when trying to remember details of the past. This idea was first challenged by Bartlett (1932). He described remembering as an act of reconstruction, which never reproduces the exact past event, but instead alters the representation of the event with every attempt to recall it. Bartlett (1932) asked participants to recall an unfamiliar story told 20 h earlier. The recalled stories differed from the original in

Study 1: Understanding graphing as a tool for the reconstruction of experiences

The first study aims at a qualitative understanding of graphing as a support for the reconstruction of experiences. It compares the two iScale tools to free-hand graphing.

Study 2: Benefits and drawbacks of the constructive and the value-account version of iScale

While iScale appeared to be a viable alternative to free-hand graphing, the comparative benefits and drawbacks of the two iScale variants merited a second study. We compared the constructive and the value-account version of iScale to a control condition that entailed reporting one's experiences with a product without any support through graphing. We focused on the number, the richness, and the test–retest consistency of the elicited experience reports.
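The full paper details the measures used for these comparisons. Purely to illustrate what test–retest consistency means in this context, the sketch below compares the quality curve a participant reconstructs in two separate sessions; the sessions, the ratings, and the choice of Pearson correlation as the consistency index are assumptions, not the study's actual analysis.

```python
# Requires Python 3.10+ for statistics.correlation.
from statistics import correlation

# Hypothetical example: the same participant reconstructs perceived product
# quality over the same ten weeks in two sessions held some weeks apart.
session_1 = [3, 4, 4, 5, 6, 5, 5, 6, 7, 7]
session_2 = [3, 3, 4, 5, 6, 6, 5, 6, 6, 7]

# One simple consistency index: the Pearson correlation between the two
# reconstructed curves (1.0 would mean perfectly consistent recall).
r = correlation(session_1, session_2)
print(f"test-retest consistency r = {r:.2f}")
```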

Conclusion and future work

This paper has presented iScale, a graphing tool to elicit change in product perception and evaluation over time. It took the general approach of retrospective reconstruction of users' experiences as an alternative to longitudinal studies. More specifically, the tool was designed with the aim of increasing participants' effectiveness in recalling their experiences with a product.

We created two different, theoretically grounded versions of iScale. The constructive iScale tool imposes a

References (55)

  • D. Blei et al. Latent Dirichlet allocation. The Journal of Machine Learning Research (2003)
  • J. Cohen. A power primer. Psychological Bulletin (1992)
  • M. Conway et al. The construction of autobiographical memories in the self-memory system. Psychological Review (2000)
  • Courage, C., Jain, J., Rosenbaum, S., 2009. Best practices in longitudinal research. In: Proceedings of the 27th...
  • E. Den Ouden et al. Quality and reliability problems from a consumer's perspective: an increasing problem overlooked by businesses? Quality and Reliability Engineering International (2006)
  • B. Edvardsson et al. Critical incident techniques. International Journal of Service Industry Management (2001)
  • Erickson, T., 1996. The design and long-term use of a personal electronic notebook: a reflective analysis. In:...
  • A. Fenko et al. Shifts in sensory dominance between various stages of user-product interactions. Applied Ergonomics (2009)
  • J. Flanagan. The critical incident technique. Psychological Bulletin (1954)
  • J.L. Fleiss et al. Statistical Methods for Rates and Proportions (2003)
  • Forlizzi, J., Battarbee, K., 2004. Understanding experience in interactive systems. In: Proceedings of the 2004...
  • F. Fransella et al. A Manual for Repertory Grid Technique (2003)
  • Gerken, J., Bak, P., Reiterer, H., 2007. Longitudinal evaluation methods in human–computer studies and visual...
  • G. Goldschmidt. The dialectics of sketching. Creativity Research Journal (1991)
  • R. Groves et al. Survey Methodology (2009)
  • J. Gutman. A means-end chain model based on consumer categorization processes. The Journal of Marketing (1982)
  • M. Hassenzahl. The interplay of beauty, goodness, and usability in interactive products. Human–Computer Interaction (2004)