1 Augmented Reality as an Interpretive Technology

1.1 Immediate Apprehendability Supporting Art Engagement

In an ideal art engagement experience, the artwork provokes the visitors’ interest, raises their emotions, and leads to learning and further interest in the artwork. Tour guides, whether virtual or human, are intended to support visitors’ interpretation of art. Art interpretation refers to visitors’ engagement with and understanding of art [29]. Tour guides can mitigate the cognitive overload that results when visitors are exposed to overwhelming stimuli and distracted from the art [1, 15]. Enabling immediate apprehendability during tours can help reduce visitors’ cognitive overload [2, 22]. Immediate apprehendability refers to providing a focused stimulus for users that allows them to quickly understand an object’s properties without excessive effort, even when they experience it for the first time [1]. Immediate apprehendability provides users with quick access points into the artwork, enabling them to engage with the art by focusing on only some of its properties. Immediate apprehendability is mediated through affordances like labels or digital features on tour guides, which support users’ learning and their interaction with art [1, 14, 38].

Ideally, an art tour guide provides users with guidance, supporting a flow-style experience in which they feel a deep sense of engagement with the art. That flow is possible when the activity matches users’ previous skills and knowledge and they feel comfortable and well-oriented [1]. For such a flow to be achieved, the user must be in control of the experience, with the artwork as the primary focal point and the tour guide as the secondary focal point, allowing the user to experience both emotional and intellectual engagement with the art. Emotional engagement refers to feelings that the artwork provokes, while intellectual engagement refers to interest and learning that the artwork provokes [1, 2].

1.2 Augmented Reality as an Interpretive Technology in Museums

Interpretive technologies, including AR, are increasingly central in guiding visitors’ experiences in art museums. AR experiences are mediated by stationary multimedia kiosks [30, 31], wearable displays such as Google Glass [19, 28], reflective mirrors [27], projections on objects [5, 18, 37, 38], and handheld devices [10, 21, 26].

AR tour guides have been shown to enhance users’ learning in museum contexts. A user study of a vision-based mobile AR tour guide at the Taipei Fine Arts Museum showed that, compared to visitors using audio guides or no guides, visitors using an AR guide learned more and had a better flow experience. The AR guide also increased the amount of time visitors spent focusing on the paintings as opposed to other stimuli [8]. Studies also report increased engagement times during AR device use in science learning [17]. Previous work on AR as an interpretive technology indicates that designing a simple and effective user interface for AR tour guide applications is a persistent challenge. An interface must provide enough support for the user to navigate the system and receive the needed information while not distracting the user from the art [8].

AR has a number of definitions, ranging from showing related content based on mobile location [11, 20] to video see-through AR [3, 25]. This study focuses on video see-through AR, in which images are captured by a camera and displayed on a screen with augmented data overlaid on the objects. Video see-through AR applications recognize the object and track its position to correctly position the augmentations. In vision-based mobile AR, users point the camera of their mobile device at an exhibited object, and automatic image recognition provides more information about the object. One advantage of vision-based AR is that it is situated in the same visual field as the artwork; when using an AR tour guide, users do not need to shift their field of vision from the artwork to the tour guide or to a wall label next to the artwork [8, 10]. By pointing a mobile device camera at the artwork, the user can see overlaid information on the device’s screen while examining the artwork.

Despite increasing AR use in museums, there is a lack of knowledge about the impact of mobile video see-through AR tour guides on users’ engagement and interaction with art. Most of the work on video see-through has been focused on reporting technical advances in AR [26]. Controlled, between-subjects studies examining the impact of AR on the user experience are scarce, and though one such experiment shows that AR can enhance learning about art [8], other aspects of engagement with art mediated by AR remain largely unstudied. Art engagement includes not only one’s learning about art but also one’s emotional connection to art, enjoyment of art, and interest in art [1, 14]. There is also a lack of knowledge about the users’ interaction with mobile see-through AR applications for art. More knowledge is needed on how to design mobile AR tour guide systems that support immediate apprehendability in art engagement and offer a high level of usability. The usability of the tour guide can affect art engagement by either supporting or obstructing the user’s interaction with art.

To address these questions, we developed a novel AR tour guide system, Art++, with the Cantor Arts Center at Stanford University. Using the AR guide, we conducted an on-site user testing study in a between-subjects experiment to address the following questions: (1) How does a mobile-based video see-through AR tour guide application affect a user’s engagement with art, namely their learning of, interest in, emotional connection to and liking of art? (2) How does an AR tour guide affect a user’s interaction and experience in the art gallery in relation to the artwork, the tour guide, and the broader gallery context? (3) How usable is the mobile video see-through AR tour guide application in an art museum? (4) How should the user interface be designed to maximize usability and support immediate apprehendability in an art museum? By addressing these questions, this paper focuses on three aspects to examine the value and role of AR in art engagement: the user’s intellectual and emotional engagement with art, the interaction patterns between the user and the AR guide, and usability of the AR guide.

2 System Description, Methods and Data

2.1 Art++, an AR Application for Art Tours

The AR tour guide developed for this study, Art++, is a video see-through AR guide based on image recognition for tablet devices. When a user points a tablet device at a painting, the application provides information about that painting by highlighting elements, augmenting perspective lines, and supplying textual content on the painting’s features. Art++ has three modes: resting, scanning, and reading. The resting mode displays the camera view, as shown in Fig. 1a. Paintings are recognized at runtime using an image retrieval algorithm. After a query frame is captured, visual features (SURF) [4] are extracted and aggregated into a compact low-dimensional signature (REVV) [9] that retains most of the visual information of the image. The geometric consistency of the matches is ensured via a robust RANSAC procedure [13], which is a very common step in image matching [7].
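
The geometric verification step can be illustrated with a minimal sketch. It assumes a translation-only motion model and hand-made point matches, both simplifications introduced here for illustration; the text does not state which geometric model Art++ fits, and image-matching systems more commonly fit an affine transform or a homography. The function name `ransac_translation`, its parameters, and the acceptance threshold are hypothetical.

```python
import random


def ransac_translation(matches, tol=3.0, iters=100, seed=0):
    """RANSAC with a translation-only model (an assumption of this sketch).

    matches: list of ((x, y), (x2, y2)) point pairs, query -> reference.
    Returns the best (dx, dy) shift and the matches consistent with it.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_shift, best_inliers = None, []
    for _ in range(iters):
        # A translation is fully determined by a single sampled match.
        (x, y), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x, y2 - y
        # A match is an inlier if its own shift agrees with the hypothesis.
        inliers = [
            m for m in matches
            if abs((m[1][0] - m[0][0]) - dx) <= tol
            and abs((m[1][1] - m[0][1]) - dy) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_shift, best_inliers = (dx, dy), inliers
    return best_shift, best_inliers


# Hypothetical feature matches: four roughly consistent, two gross outliers.
matches = [
    ((10, 10), (30, 15)), ((50, 20), (70, 25)), ((80, 90), (100, 95)),
    ((40, 60), (61, 64)), ((5, 5), (200, 300)), ((70, 30), (10, 400)),
]
shift, inliers = ransac_translation(matches)
accepted = len(inliers) >= 4  # hypothetical acceptance threshold
```

In this sketch a candidate painting would be accepted only when enough matches agree on a single geometric relation, which is the role the RANSAC step plays after signature-based retrieval.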

Fig. 1. Screenshots of the six views in Art++: (a) camera view, (b) parent layer in mode A, (c) parent layer in mode B, (d) point of interest child layer, (e) overlay child layer, (f) reading mode. (Color figure online)

When the painting is recognized by the system, the scanning mode is activated and the application provides information with video see-through overlays. The two AR tour guide designs tested in this study differ in the parent layer. The design in condition A, shown in Fig. 1b, uses button icons to switch to a child layer. These buttons are static on the screen and give the title of the layer. Condition B, shown in Fig. 1c, uses red dots that appear directly on the view of the painting, requiring the user to touch the dot to enter a child layer. These markers give the location of the element about which information is provided, but they do not give the title of the layer.

There are two types of child layers in the application. The point-of-interest layer, shown in Fig. 1d, draws the user’s attention to a specific part of the painting by desaturating and darkening the background, while the overlay layer, shown in Fig. 1e, superimposes an image onto the painting itself. The visual information is updated in real time to align with the real-world content and is complemented by a text paragraph shown in a layer on the side. The user can switch from scanning mode to reading mode by positioning the tablet as if reading a book (Fig. 1f), and the screen will show a reference image of the recognized artwork instead of the live images from the camera. The other interface elements remain the same; the user can switch between layers as in scanning mode, using buttons (A) or touch markers (B) to go from the parent layer to any child layer and using the “back” or “check” buttons to return from a child layer to a parent layer. The switch between reading and scanning modes is based on the orientation of the device, which the program derives from the Android device’s accelerometer data.
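
The orientation-based switch between scanning and reading modes might be sketched as follows. The sketch assumes accelerometer readings in m/s² in the usual Android device coordinate system, where the z axis points out of the screen, so a tablet lying flat and face up reads roughly +9.81 on z. The two threshold angles and the hysteresis between them are hypothetical; the text does not report the values or logic used in Art++.

```python
import math

# Hypothetical thresholds (degrees between the screen normal and vertical).
READ_BELOW_DEG = 50.0   # tablet lowered, held like a book
SCAN_ABOVE_DEG = 65.0   # tablet raised toward the painting


def tilt_degrees(ax, ay, az):
    """Angle between the screen normal (device z axis) and the up direction."""
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0:
        raise ValueError("accelerometer reading has zero magnitude")
    cos_tilt = max(-1.0, min(1.0, az / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_tilt))


def next_mode(mode, ax, ay, az):
    """Return 'scanning' or 'reading'; the gap between thresholds avoids flicker."""
    tilt = tilt_degrees(ax, ay, az)
    if mode == "scanning" and tilt < READ_BELOW_DEG:
        return "reading"
    if mode == "reading" and tilt > SCAN_ABOVE_DEG:
        return "scanning"
    return mode


mode = "scanning"
mode = next_mode(mode, 0.0, 9.81, 0.0)  # upright, tilt 90 degrees: stays scanning
mode = next_mode(mode, 0.0, 5.0, 8.4)   # lowered, tilt about 31 degrees: reading
```

Using two thresholds rather than one is a design choice of this sketch: small hand movements near a single cut-off would otherwise toggle the mode repeatedly.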

The application was developed for and with Cantor Arts Center at Stanford University with the goal of finding a meaningful way to provide visitors with information about the museum’s art. The development and testing of an AR application was part of the museum’s strategy to identify ways of meaningfully integrating technologies into its visitor experience. Audio features were not developed at this stage, as the goal of this research was to develop a simple-to-use AR tour guide and examine the impact of AR on art engagement.

2.2 Experiment Design

The between-subjects experiment was conducted at the Cantor Arts Center in 2015. Forty-six participants were randomly assigned to use (A) one version of the AR application (Art++) in the art tour (Fig. 1b), (B) another version of Art++ (Fig. 1c), or (C) a book tour guide with the same information as the AR application. Condition A had 16 participants, and B and C each had 15. The participants were recruited on-site at the art museum, through museum member email lists, and through social media, and they each received a $20 gift card for their participation. The experiment was carried out on second-generation Android Nexus 7 tablets with 7-inch displays.

To examine the impact of the AR guide on user behavior, a book guide was used as a control to represent a traditional, analog tour guide, the booklet, which is widely used in museums. The book guide had the same information as the AR guide. Two versions of the AR application were tested to compare two designs for providing information to the user. The goal was to find the best design to support immediate apprehendability in art engagement to inform future designs of AR-mediated art tour guides. The study was approved by the Institutional Review Board (IRB) at the museum’s parent university.

The experiment had three parts: pre-evaluation, the activity phase, and post-evaluation. The experiment began on-site at the museum as the subjects filled out a consent form and a pre-evaluation survey that inquired about their relationship to art and technology. The survey showed pictures of paintings that subjects would later see on the tour of the Dutch collection in the museum. Offering a 7-point Likert scale for responses, the following questions were posed in pre- and post-surveys to examine the users’ interest in, knowledge of, enjoyment of, and emotional connection to the paintings: “How interested are you in this painting?” “How much do you know about this painting?” “How much do you like this painting?” “How emotionally connected you are with this painting?”

In the activity phase, the users received an AR tour guide for the Dutch collection at the museum’s visitor service desk. The researcher gave them a prompt to find the collection and interact with the paintings in the collection as naturally as possible. The tour guide offered a full set of information on six of the 13 paintings in the gallery, while it listed only the name of the painting and the artist for the rest. The tour guide provided instructions on navigating to find the collection in the gallery, and the application included a tutorial that every user watched before starting the application. The activity phase was recorded on video, and users’ interactions with the application were screen-captured and recorded in activity logs built into the application. Each interaction (e.g., button pressed, painting scanned) was logged with a timestamp, thus recording analytics such as the duration of the tour and the paintings and layers viewed. On average, a tour lasted 15 min and 20 s in condition A, 14 min and 2 s in B, and 14 min and 10 s in C.
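
As an illustration of how such timestamped logs yield tour analytics, the following sketch derives the tour duration, the paintings scanned, and the layers opened from a small log. The log format, event names, and identifiers are hypothetical; the text does not describe the actual Art++ log schema.

```python
from collections import Counter

# Hypothetical log entries: (seconds since app start, event type, detail).
log = [
    (0.0, "tour_started", ""),
    (12.5, "painting_scanned", "painting_1"),
    (20.0, "layer_opened", "painting_1/perspective"),
    (41.0, "layer_closed", "painting_1/perspective"),
    (95.0, "painting_scanned", "painting_2"),
    (101.0, "layer_opened", "painting_2/symbolism"),
    (130.0, "tour_ended", ""),
]


def summarize(events):
    """Return (duration in seconds, paintings scanned, layer-open counts)."""
    events = sorted(events)  # order by timestamp
    duration = events[-1][0] - events[0][0]
    paintings = sorted({d for _, kind, d in events if kind == "painting_scanned"})
    layers = Counter(d for _, kind, d in events if kind == "layer_opened")
    return duration, paintings, layers


duration, paintings, layers = summarize(log)
minutes, seconds = divmod(int(duration), 60)
print(f"Tour lasted {minutes} min {seconds} s; paintings: {paintings}")
```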

Users were asked to narrate their thoughts, feelings, and motivations for behavioral decisions during the tour, following the think-aloud method used in design evaluation [35]. Two researchers observed the users’ behavior during the tour; one took notes as the other recorded the tours on video. On two occasions, three researchers were on-site for researcher training purposes. The users were instructed to interact with the paintings as they normally would. The users were interviewed after their tours, and they filled out post-evaluation surveys at the research site that inquired about their interest in, knowledge of, enjoyment of, and emotional connection to the paintings they had seen in the gallery, and their experience with the tour guide. The post-survey also inquired about the usability of the tour guides. The usability measures were adapted from Brooke’s set of measures [6], following the example of Wein’s [33] study about the usability of visual recognition-based mobile museum applications.

2.3 Sample Profile

Of the 46 participants, 36 (78.3%) were women and 10 (21.7%) were men. The largest group of users, nearly one third (30.4%), were 55–64 years old. The second-largest group, more than a quarter (28.3%), were 18–25 years old. About one fifth (19.6%) were 35–54 years old, and slightly fewer (17.4%) were 65 or older. The largest share of the users were working full-time (44%), about a quarter (26%) were students, and about one sixth (17%) were retired. The users’ professions varied, including information technology-, science-, and education-related positions. All users had a good proficiency in English, and 90% spoke English as their first language.

The majority of the participants were occasional museumgoers: over half (54%) reported visiting art museums and galleries only once or twice a year. A sizable share of the users were very active museumgoers: 37% visited once a month or more often. Most users (85%) had previously visited the museum where the experiment was conducted at least once. The majority of users could be described as art lovers, as 89% stated that they “loved art” and 80% “enjoyed seeing art very much.” However, the users did not consider themselves very knowledgeable about art, even if they visited art museums often: almost half (48%) said their knowledge about art was moderate at best. They considered themselves to have even weaker knowledge about European art than art in general, as 69% said their knowledge in that field was moderate at best. The users were relatively technologically savvy. Slightly more than half (55%) used tablet devices frequently, whereas about a quarter (24%) used them rarely or never. Almost all (93%) used mobile touchscreen phones frequently. Most users liked using new technologies (75%) and found technologies easy to learn (65%).

3 Findings

3.1 Interest, Liking, Learning and Emotional Connection to Art

To examine changes in interest in, enjoyment of, knowledge of, and emotional connection to art, we analyzed the survey results by one-way ANOVA with Tukey’s HSD test as the post hoc method. When referring to “change,” we mean the change in the dependent variables between pre- and post-surveys. The results show a statistically significant difference between the three conditions in the change in liking of the paintings, as Table 1 shows; it reports the ANOVA results of the impact of the three conditions on the key dependent variables, considering all the paintings together. The differences in liking of the paintings were significant at the 0.01 level (p = 0.008). This indicates that the AR users’ liking of the paintings increased more during the tour than did the book users’. We did not find a statistically significant association between the dependent variables and independent variables such as age, relationship to technology, or relationship to art.
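
The one-way ANOVA underlying this comparison can be sketched as below. The three groups of pre-to-post change scores are invented for illustration and do not reproduce the study’s data or Table 1; the function name is hypothetical. The sketch computes only the F statistic and its degrees of freedom. Obtaining the p value and running the Tukey HSD post hoc comparison would normally be left to a statistics package (for example, `scipy.stats.f_oneway` returns both F and p).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented change scores (post minus pre) for three conditions.
change_a = [1, 2, 3]
change_b = [2, 3, 4]
change_c = [5, 6, 7]
f_stat, df_b, df_w = one_way_anova_f([change_a, change_b, change_c])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```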

Table 1. The change between the pre- and post-surveys in the users’ liking of, interest in, and emotional connection to the paintings and the perceived and objective knowledge gain.

3.2 Objective and Subjective Knowledge Gain

We examined users’ learning with two sets of measures: one set for objective knowledge gain (actual learning) and another for subjective knowledge gain (the users’ self-perceived learning). Objective knowledge gain was measured by analyzing answers to the following question about each painting in both pre- and post-surveys: “What do you know about this painting? List everything you know.” This question measured the baseline of the users’ actual knowledge about the paintings before the tour, and the same question was asked after the tour. We counted each new piece of information the users learned during the tour. For example, when a user said in the post-survey, “The crab is a symbol of vanity in life,” but did not note that in the pre-survey, the user would receive a point. The users learned the most information with the book, with an average of 2.5 points per painting, whereas users of AR version A earned 2.1 points and users of version B earned 1.9 points. The differences are significant at the 0.05 level (p = 0.017), as reported in Table 1.
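
The scoring rule can be expressed compactly once each free-text answer has been coded into a set of distinct facts. That coding step was done by the researchers; the sketch below assumes it has already happened, and the fact labels, painting identifiers, and function name are hypothetical.

```python
def knowledge_gain(pre_facts, post_facts):
    """One point per fact stated in the post-survey but not in the pre-survey."""
    return len(set(post_facts) - set(pre_facts))


# Hypothetical coded answers for one user, keyed by painting.
pre = {
    "painting_1": {"is_a_still_life"},
    "painting_2": set(),
}
post = {
    "painting_1": {"is_a_still_life", "crab_symbol_of_vanity", "hypothetical_fact_x"},
    "painting_2": {"hypothetical_fact_y"},
}

gains = {p: knowledge_gain(pre[p], post[p]) for p in post}
average_gain = sum(gains.values()) / len(gains)
print(gains, average_gain)
```

Facts repeated from the pre-survey earn nothing under this rule, so the score reflects only what was newly stated after the tour.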

To examine the reasons for the difference between learning with the AR guide and with the book, we also analyzed objective learning with each painting in each condition, as shown in Fig. 2. The users learned the least about the two paintings for which the guides did not supply further information. The difference in objective learning between the AR and book users was significant with those paintings, which is explained by the fact that the book users relied more on wall labels than the AR users did; the book users read the wall labels when they were curious to learn more about a painting, whereas the AR users largely ignored the wall labels, which we will elaborate on later in the Findings section.

Fig. 2. Objective knowledge gain per painting in each condition.

Self-perceived learning was measured on a 7-point Likert scale with the question: “How much do you know about this painting?” There were no statistically significant differences in self-perceived learning between the conditions. When we examined self-perceived learning in greater depth, we found a trend: the less the users knew, the more they thought they learned in all conditions. The users’ previous knowledge affected the amount of self-perceived learning in all conditions, as shown in Fig. 3. For each user and each painting, we recorded the integer-valued knowledge rating in the pre-evaluation survey and the change in knowledge, the latter computed as the difference between the pre-survey and post-survey ratings. The plots in Fig. 3 show the values as blue tokens; the shade of blue is darker for values that occur more often. The red line is the result of a linear regression between the two quantities.
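
The regression line relating prior self-rated knowledge to the change in rating can be reproduced with an ordinary least-squares fit, sketched here in plain Python. The five data points are invented to show a negative slope and are not taken from the study; the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented (pre-survey rating, post minus pre change) pairs on a Likert scale.
pre_rating = [1, 2, 3, 4, 5]
change = [4, 3, 3, 1, 1]
slope, intercept = fit_line(pre_rating, change)
print(f"change = {slope:.2f} * pre_rating + {intercept:.2f}")
```

A negative slope, as in this invented example, corresponds to the reported trend that users with lower prior ratings reported larger gains.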

Fig. 3. Impact of users’ previous self-perceived knowledge on learning for (a) conditions A and B (Art++) and (b) condition C (book). The red line is the result of a linear regression between the two quantities. (Color figure online)

However, the users’ previous level of objective knowledge, as measured in the pre-survey, did not affect their objective knowledge gain, as Fig. 4 shows. The AR conditions did not show any association, while the book condition showed only a very weak association. One potential explanation for the difference between the impact of previous level of knowledge on self-perceived and objective learning lies in user perception. For those who did not have much knowledge at the outset of the experiment, learning even one thing about the paintings may have felt like learning a lot, and these users thus viewed their experience as a significant gain over their previous knowledge, increasing their self-perceived learning scores. However, when those who already knew a bit about art learned one thing, they perceived it as a lesser knowledge gain. In reality, those with lesser knowledge and more knowledge learned fairly equal amounts, as the objective knowledge values show.

Fig. 4. The impact of users’ previous objective knowledge on learning for (a) conditions A and B (Art++) and (b) condition C. The red line is the result of a linear regression between the two quantities. (Color figure online)

3.3 User Experience with the AR Tour Guide

In this section we compare user interactions with the AR application and the book guide using data from the on-site observations, videos, interviews, activity logs, and surveys. The findings are organized around three elements: interaction with art, the tour guide, and wall labels. The users are referred to by alphanumeric combinations, such as 28C, with the number referring to the anonymized number of the user and the letter to the condition to which they were assigned.

Interaction with Art.

Discovering Hidden Aspects.

The AR users felt they interacted directly with the art and that the tour guide application heightened their interest in the paintings: “It got me to look closer at the actual work, not just at the screen. It made me want to learn more” (1A). The augmentations, such as the highlighting of elements in paintings and perspective lines, helped the AR users understand the artwork, and directed their attention to aspects that might otherwise have remained hidden, leading to excitement and even joy: “It helped me to focus on subtleties in the art that I may have missed. Some surprising and fun details! It added to my understanding of each painting” (6A). The users perceived that the visual overlays helped them focus on and learn about certain aspects of the paintings: “I liked how I could learn how the art works in a visual way, rather than being limited to an auditory lecture” (26B). The augmentations helped the users better comprehend the paintings, sparking discovery and focus in particular. Some users also liked the feeling of achievement and gameplay in their art exploration: “[I enjoyed] the immediacy of learning facts and the game-like quality of hide-and-seek within the images” (30A). These feelings of excitement, joy, and achievement were not found among the book users.

Too Much Screen Time.

There was a clear difference in the AR users’ and book users’ interaction with the art: The AR users typically started with only a quick glance at the painting before focusing on the tablet. The book users, by contrast, typically first went closer to the painting, looked at it for some time, and then flipped open the relevant page in the book. A number of book users also began to interpret the painting vocally before even looking at the tour guide; this never occurred with the AR guide. Thus, for the book users the painting was the primary object of engagement, and the book guide was secondary. For the AR users, by contrast, the tour guide application was the primary object of engagement, and the artwork was secondary.

The users’ interaction patterns are also reflected in the time distributions between the conditions. The AR users spent more time than the book users engaging with each painting when we combine the time spent looking at the painting and the application, only at the painting, and only at the wall label. The AR users spent less time looking at the art with the naked eye (without mediation of the tablet screen) than those who used the book as a guide. The AR users spent almost 70% of their time looking at the art in the application and at the painting through the tablet’s screen. Looking at the painting with the naked eye occupied a small part of the time for AR users (about 25%), whereas for book users it occupied more than half of the time (54%). Figures 5 and 6 show with 90% confidence intervals that this quantity was consistently higher for condition C (disjoint confidence intervals), while conditions A and B are indistinguishable using this metric (overlapping confidence intervals). We found that for the naked eye mode, the difference between the AR conditions (condition A and condition B in Fig. 6) and the book was significant at the 0.01 level (p = 9.506 × 10⁻⁵ comparing condition A and condition C; p = 0.0008 comparing condition B and condition C). For other modes, there were no statistically significant differences between the AR conditions and the book.
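
The naked-eye ratio discussed here is the time spent looking directly at the painting divided by the total time spent engaging with it. A minimal sketch of that computation follows; the per-user durations are invented, the three-way split of engagement time is an assumption based on the modes named in the text, and the confidence intervals are omitted because the text does not say how they were computed.

```python
def naked_eye_ratio(naked_eye_s, guide_s, wall_label_s):
    """Share of engagement time spent looking directly at the painting."""
    total = naked_eye_s + guide_s + wall_label_s
    if total <= 0:
        raise ValueError("total engagement time must be positive")
    return naked_eye_s / total


# Invented durations in seconds: (naked eye, guide or screen, wall label).
observations = {
    "A": [(30, 85, 5), (24, 90, 6)],
    "C": [(66, 44, 10), (60, 48, 12)],
}

mean_ratio = {
    condition: sum(naked_eye_ratio(*obs) for obs in rows) / len(rows)
    for condition, rows in observations.items()
}
print(mean_ratio)
```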

Fig. 5. Average ratio of the time spent looking directly at the painting over the total time spent interacting with it.

Fig. 6. The time distributions between modes of engagement and conditions.

The AR users did not like having so much screen time during their art experience. They felt that the tour guide application interfered with their engagement with the art: “I felt a bit disconnected from the artwork during the time I was looking at the screen. I didn’t like spending too much time looking at the screen” (41A). The users felt the experience became “too technological” (40A), and were worried they did not spend enough time simply looking at the art. Even though the book guide also required the users’ attention, the book users were not as concerned about the time they spent reading the book. This difference suggests that users experienced the properties of the technologically mediated tour guide as more distracting than a book. The AR application thus distracted the users from their art experience; it became the object of their attention even more than the art. At the same time, it supplied the users with information that they found useful. This tension between the utility provided by the application and the concern about excess screen time resulted in a continual balancing act: the users tried to negotiate between focusing on the application and the customary behavior of paying more attention to the paintings.

Somewhat surprisingly, considering the AR users’ complaints about being distracted from the artwork, when combining the time looking at the painting through the screen and with the naked eye, the AR users looked at the paintings longer than book users (Fig. 6). However, looking at the paintings through the screen and with the naked eye are two different experiences; some of the paintings’ features become much more apparent when seen with the naked eye than through the screen. The screen limits the user’s visual field, the mediating tablet screen always moves in the user’s hands, and the user’s eyes get tired of watching the painting through a moving screen. But to see the live augmentations, the user had to point the tablet at the painting and look at the painting through the screen.

Text played a crucial role both in the AR application and the book guide by directing the users’ attention from the guide to the painting. When the text encouraged the user to have a closer look at the painting, for instance by saying, “Look closely at the reflection painted into this silver jug, and you can dimly make out an image of the artist at his canvas,” most users with all tour guides went closer to the painting to find the reflection.

Interaction with the Tour Guide.

The users were generally comfortable using all tour guides. The level of comfort using the guides was rated on a scale of 1 to 3, with 3 being very comfortable, 2 comfortable, and 1 not comfortable. The ratings were tabulated by two researchers based on the videos filmed on-site during the experiment. Most users, 43 out of 46, were rated as comfortable or very comfortable using the guides. All book users were at least comfortable. A minority, three out of 31 AR users, had trouble interacting with the application or the tablet.

The main usability issue with both versions of the AR guide was learning to switch from scanning mode to reading mode. About one quarter of users (23%) complained during the tour that their arms grew tired from holding the tablet up. The physical fatigue began early in the tour, after only one or two paintings. When the users noticed it was possible to lower the tablet and continue reading the content, most of them (24 of 31, or 77%) started doing so once the application had recognized the painting. The users then scanned the painting and quickly switched to reading mode, finding that to be a more comfortable way to explore the information.

Interaction with Wall Labels.

Beyond the tour guides, the only other available information about the paintings could be found in the gallery wall labels. The book users paid more attention to the wall labels than the AR users did. The former looked at about five wall labels on average and spent about 80 s per label (about 9% of their total time) reading them. The AR users looked at three wall labels on average and spent 52 s per label (about 6% of their time) reading them. The role of wall labels in art engagement thus varied: for the AR users, the labels were clearly a secondary source of information. For the book users, the labels had more significance as information providers. But several sources of information created confusion: “I am sometimes a little unsure of where to direct my attention between the piece, the screen, and the info card [wall label]” (21B). The users had trouble deciding which field of vision to use and which source of information to focus on. This confusion about the information sources diminishes the potential advantage of AR in helping the user achieve immediate apprehendability in art engagement by providing ease of focus.

Usability and Suitability of the Tour Guides.

Based on usability measures in the survey, the participants liked the AR guide and perceived it as suitable for use as a tour guide in art museums. The differences between AR and the book guide were modest. The AR users enjoyed their guide slightly more than the book users enjoyed theirs, as Fig. 7 shows. The difference in enjoyment of use between the AR version A users (menu version) and the book users (condition C) was significant at the 0.05 level (p = 0.04994). Regarding comfort of use, users felt nearly equally comfortable using the AR and book guides. The users, however, found only the AR condition B, the dot version, easier to use than the book. The users felt they had the most control over the AR dot version (condition B) and the book, and the least control over the AR menu version (condition A). The lower usability scores for the menu version are explained by its navigational difficulties, which were not present in the dot version (condition B). The difficulty in the menu version was caused by a lack of visual cues when navigating from parent to child layers, resulting in users feeling a loss of control. The dot version of AR provided clearer support to the users with the red blinking dots. As Fig. 7 shows, visitors were more likely to use both versions of AR in their future visits to art museums than the book guide. The difference between the AR menu version (condition A) and the book (condition C) was significant at the 0.01 level (p = 0.007036). We did not find a statistically significant association between the usability measures and independent variables such as age, relationship to technology, or relationship to art.

Fig. 7. Enjoyment, usability, and suitability of the tour guides in each condition. (Color figure online)

4 Discussion and Limitations

4.1 Augmented Reality: Compromised Impact on Art Engagement

The findings show that the mobile video see-through AR tour guide application supports users’ engagement with art. The augmented information helped users understand and interpret the paintings and engage with them. During their tour, the AR users began to like the art more than the book users did. The less knowledgeable the users perceived themselves to be, the more they felt they learned. This suggests that AR provides immediate apprehendability to users and may help decrease visitors’ cognitive overload in museums. However, most of the differences between the AR guide and the book guide were not statistically significant. On only one of the examined aspects of art engagement, liking the paintings, did the AR guide perform better than the book with a statistically significant difference.

According to the measures of objective knowledge gain, the book users learned more about the paintings than the AR users. One potential explanation for the difference is the book users’ greater reliance on wall labels. The AR users, in contrast, tended to keep clicking on the application in search of more information and largely ignored the wall labels. Another explanation is that the book users focused more on the actual content of the tour guide, and thus learned more, whereas the AR users were more occupied with clicking around in the application than with reading the content. However, based on the mean values of perceived learning, the AR users thought they learned more than the book users, even though they didn’t. The AR guide thus created more of an impression of learning than actual learning.

The differences between actual and perceived learning among users of AR and the book may be explained by the interactivity of the application: the AR user can click around in the application and skim through the information. This action itself can create a sense of accomplishment, as some of the users noted: “It made me feel like I was accomplishing something by going to each painting in the app” (15B). The experience of using the application also created joy and excitement. Some users also liked the feeling of achievement and game play in their art exploration, as a user explained in response to the survey question, “What did you like about the AR guide?”: “(I liked) The immediacy of learning facts and the game-like quality of hide and seek within the images” (30A). The active role of the user can thus create a false or at least somewhat exaggerated sense of learning and accomplishment simply through the act of clicking through the information in the app. The book guide users focused more on reading the actual information without any distraction from digital affordances and, as the objective knowledge gain measures show, actually learned more than the AR users. The AR guide thus provided an experience of active discovery and entertainment, in which the users are active information seekers. The book guide, in contrast, provided a more passive experience to the users, who still enjoyed the experience and learned more.

4.2 The Value of Augmented Reality

The users’ behavioral patterns show they didn’t take advantage of viewing art in the same visual field as the augmentations, which is considered a benefit of AR tour guides. Instead, they preferred to review the live augmentations quickly by pointing the tablet at the paintings to activate the parent information layer and then switching to reading mode. The users quickly became tired of holding the tablet up. Furthermore, the users were uncomfortable with switching between several fields of vision: the screen, the painting, and the wall labels. One reason for the lack of interest in live augmentations could be that their content was not compelling enough to merit the effort of holding the device up. Future research should experiment with content that leverages the possibilities of AR more richly and should test a tour guide with different types of augmentations, which may yield different results. However, the physical fatigue of holding the tablet up remains a serious concern, regardless of the type of augmentation: if experiencing an augmentation requires holding a tablet up for long periods of time, the users will undoubtedly become fatigued. They might not hold the tablet up for more than a few paintings, as the users’ behavior in this study demonstrated.

Apart from liking the paintings, in which the AR guide performed better than the book, and objective learning, in which the book performed better than the AR guide, the AR application and the book served equally well as tour guides. Therefore, it is worth investigating the actual value of AR in art engagement. One benefit of AR shown in this study was that users learned about the art through the application, and AR was part of that experience. For instance, seeing the augmented perspective lines on a large painting was a powerful learning experience for several users, as user 26B made clear: “I’ve never understood what a perspective really is before seeing this.” The scaffolding of information with visual overlays, mixing the real (the painting) and the virtual (augmented perspective lines), helped the user understand a key concept in visual art that had not previously been accessible to him. The augmentations also helped users notice details that otherwise would likely have been missed. Learning can even lead to a sense of excitement and joy in exploring hidden aspects in the paintings, as happened with some users in the study. This excitement has also been noted in other AR studies in museum contexts [16, 19]. It can lead to longer engagement times with the artworks, as has been detected by other scholars [17, 32, 36, 38], but that was not the case in this study. However, excitement and joy may translate into liking the paintings, and this could be one explanation for why users liked the paintings more in the AR condition than in the book condition.

4.3 Design Implications: Increasing Naked Eye Time

The AR guide’s distracting nature was its downside, drawing users’ attention away from the artwork toward the screen. The users felt disconnected from the art and were concerned about the excess screen time. This concern is borne out by the data: the AR users looked at the art for less time with the naked eye than did the book users. The distracting property of the AR guide is similar to that of other technologically mediated tour guides [16, 19, 34]. In contrast to our findings, previous user studies on AR in art engagement have not shown AR systems to be distracting [8]. The findings of the present study show that when using AR, the users slipped further away from the art. The user was pulled into the influence sphere of the tour guide and became subsumed by the technological experience. The artwork became the secondary focus of attention. When the content of the application instructed the user to look more closely at the painting, most users did. The application’s content thus played an important role in negotiating the balance between engaging with the tablet screen and the art object.

The AR users felt conflicted; they wanted to read the information on the AR app, but at the same time felt they should look at the paintings with the naked eye instead. Such anxiety was not found among the book users. Measures to resolve this conflict must be developed. Decreasing the screen time is the most obviously promising avenue. Another avenue is adding an audio feature to the application that allows users to obtain the same information by listening, likely increasing the naked eye time with art. However, not all users like using audio guides, as the users’ testimonies show, so not all users would benefit from the audio feature. A third possible avenue is content. The content already plays a crucial role in pulling the user back to the actual painting, as when the text prompts the user to find details in the painting. Adding more of these elements would remind the user to focus on the art itself.

Navigation difficulties in the AR menu version (condition A) led to a feeling of lack of control of the application, and disturbed the users’ interactions with the art. The blinking red dots in condition B clearly provided more support for focus and discovery, thus supporting immediate apprehendability of art and creating a sense of control. This finding offers a design implication for preferring instant, clear access points in a mobile see-through AR app for art engagement.

4.4 Limitations

The main limitation of this study was its observational nature. The users’ interactions with the paintings were observed by two researchers, which could have affected the users’ behavior. The users were encouraged to interact with the paintings as naturally as possible, but they most likely paid more attention to the paintings than they would have otherwise, in accordance with the Hawthorne effect [23], which occurs when users try to please researchers [12]. However, on-site, in-person observation is a validated field-research method commonly used in design ethnography in human–computer interaction. This method was crucial for our study, because the observers on the ground provided the users’ perspective, which is important in user studies [24]. On-site observations could not be replaced by other methods without compromising the data quality. We did not notice the users having issues with our presence, nor did we notice the users trying to please us; they listed both positive and negative aspects of their experiences, and didn’t hesitate to express their concerns. The users completed anonymous pre- and post-surveys on a laptop-based digital survey interface at the research site without the researchers’ interference, so it is unlikely they would have altered their survey scores and answers to please the researchers. Finally, the circumstances were the same for every user in each condition, so the Hawthorne effect would have affected them equally and should not have skewed the results.

5 Conclusion

This study examined the impact of a mobile video see-through AR tour guide on users’ engagement with art. The findings show that AR enhances users’ liking of the paintings more than an analog guide, which functioned as a control condition. When measuring actual learning (objective knowledge gain), the book users learned more than the AR guide users did, contradicting previous findings of AR as a medium for increasing learning. The AR guide enhanced the users’ art engagement by strengthening both their emotional and intellectual connections to the art, and it supported immediate apprehendability by providing clear focus points and sequenced information about the artwork, revealing hidden aspects of the paintings. The users enjoyed using the AR application and found it easy to operate, and the system brought joy to their experience in the art museum. The AR guide, however, was not superior to the book guide; the AR application supported art engagement as well as the book guide did, and it was equally easy to use, even for the less technologically savvy users.

AR also presented serious downsides, which diminish the value of AR in art engagement. The users felt distracted by the AR guide and were worried about excessive screen time. They felt physical fatigue holding the tablet up when reviewing the live augmentations. The users had difficulty switching between the three fields of vision: the art, the screen, and the wall labels. The difficulty is paradoxical; mobile video see-through AR is argued to ease art engagement because it is located in the same field of vision as the art object.

These complications partly compromised the AR users’ art experience and could be a reason for their weaker learning performance. Ideally, there should be no compromises when using new technologies, only benefits. That might not, however, be possible to achieve in this case. Video see-through, AR-mediated experiences on mobile devices require a certain amount of screen time; to benefit from using these systems, users may have to compromise on their instinct to avoid screen time. Moreover, it may be the case that users employ AR applications in museums only occasionally, so screen time might not be a major issue in the long run. AR technologies situated in the same field of vision as the user, such as HoloLens, could be a more fitting platform for art engagement, yet they pose new challenges in user interactions. The findings demonstrate that to add value by deploying AR in art engagement, we must explore further ways of implementing AR applications in art museums with designs that keep the art objects as the main focus of art engagement.