1 Introduction

Recently, museums have shown strong interest in introducing digital technologies such as virtual reality (VR) and mixed reality (MR) into their exhibitions, aiming to effectively present background information about their exhibits. However, most of these digital technologies are not suited to the kind of exhibit design that museums require, and it is difficult to integrate them effectively into current exhibition designs. To address this problem, Kajinami et al. [7] focused on display cases, which have long been used for exhibits in museums. They constructed a prototype of the Digital Display Case, which presented an exhibit using VR technologies. This system enabled users to interact with the virtual exhibit and provided them with information about the exhibits.

However, content creation remains a challenge when building such interactive exhibitions. Interactive VR content has conventionally been created using physical simulation. To authentically reproduce an exhibit's behavior this way, its model must be created precisely in a computer, which risks deterioration of the exhibit while its features are being measured. Therefore, in this paper, we propose a high-definition Digital Display Case that can convey the dynamic behavior of exhibits, such as their structures (Fig. 1). To create interactive content for dynamic virtual exhibits, we propose an image-based interaction method that builds on image-based rendering. This method allows us to construct high-quality, realistic content without the risk of deteriorating an exhibit, for example by disassembling its structure.

Fig. 1. High-Definition Digital Display Case

2 Related Work

2.1 Digital Technologies for Museums

Although some digital devices, such as information kiosks and videos about exhibits, have already been introduced into museums, most of them are placed outside the exhibition rooms. This is because the curators who design exhibitions do not know how to use such devices effectively, whereas they know a great deal about conventional exhibition devices. To introduce digital technologies into museums, we must take into account the know-how that museums have accumulated for their exhibits.

Digital technologies used for exhibitions in museums can be categorized into two types. One is the consequential type, which displays information about the exhibits incrementally; the other is the straightforward type, which conveys the exhibits' information through the exhibition device itself.

As an example of the former, theater systems have already been introduced, and several studies have been conducted on gallery talks in such theaters [20]. These systems can present highly realistic images or models related to the theme of the exhibition. However, it is difficult to introduce them into exhibition rooms, and this type of system has to hold visitors for a long time. There is also research that uses digital technology for gallery talks in exhibition rooms. A gallery talk is a conventional way for museums to convey an exhibit's background information to visitors; however, it is difficult to hold gallery talks frequently or individually because of insufficient staff. The gallery talk robot [10] is one solution to this problem, enabling a remote person to give a gallery talk. Mobile devices are also used to convey information about exhibits [6].

As for the latter, technologies for straightforward exhibits, some research has constructed systems that superimpose information on actual exhibits using MR display systems. A head-mounted display (HMD) has usually been used to superimpose an exhibit's information [9]; however, wearable systems such as HMDs are difficult to manage when introduced into a permanent exhibition. On the other hand, there are works that use half mirrors to superimpose information on actual exhibits [3], as well as a method that uses a projector to display information depending on the user's measured point of view [8, 14].

2.2 Interactive Exhibit Using Digital Technology

On the other hand, some exhibit systems that enable users to touch virtual exhibits have been studied. The Nara University Museum held an exhibit where visitors could experience reading traditional Japanese scrolls by rolling a bar device, using digital data of the scrolls. Wakita et al. constructed a system that enables users to touch digitally archived fabric by providing haptic feedback with a SPIDAR device, based on fabric data captured with a laser range scanner [21].

These systems realize experiences in which users touch virtual exhibits, and they convey information such as weight and texture. However, they display physically static exhibits, so they do not enable users to operate dynamic virtual exhibits. Therefore, in this paper, we aim to construct an exhibit system that can convey information about an exhibit's dynamic characteristics, such as its mechanism. To convey such information, we consider it most effective to realize an experience in which users manipulate the exhibits. We therefore constructed a Digital Display Case in which users can touch and manipulate a virtual exhibit.

2.3 Technology for Creating VR Content

There are several methods that could be used to construct the dynamic content in our proposed high-definition Digital Display Case. One of them is the use of 3D models with bone structures. However, if the content has a complex dynamic structure, it is difficult to reproduce the model with a precise mechanism. Therefore, in this paper, we focus on image-based rendering (IBR), which synthesizes the photo that would be taken by a camera at a virtual position, using photos taken from various viewpoints. IBR draws on image-based modeling (IBM), which estimates geometry from a number of photos, and on view morphing, which interpolates the differences between photos.

IBM methods rely on a set of two-dimensional images of a scene to generate a three-dimensional model and then render novel views of the scene. They focus on detecting, grouping, and extracting features such as edges and faces in a given photo and then interpreting them as three-dimensional cues. The main IBM methods include the reconstruction of rough three-dimensional geometry [4], stereo matching with cameras whose positions and parameters are known [11], and Structure from Motion (SfM), which estimates three-dimensional structures from two-dimensional image sequences, possibly coupled with local motion signals [18].

In contrast, Seitz et al. [16] proposed the view morphing method, which composes an image taken from a virtual point of view using image morphing [22], without estimating three-dimensional geometry. Image morphing is a technique that composes natural interpolated images based on the displacement and distance of corresponding feature points or segments [2, 17]. This method is most often used to interpolate between images taken from different angles, but some research has tried to interpolate motion within a scene using morphing. The method proposed by Manning et al. can compose movie-like interpolated images from images showing an object in uniform linear motion [13].

Applying these methods, we expect to be able to construct realistic dynamic content by deforming images of an exhibit based on a structure estimated from matched feature points between images, rather than by creating a precise mechanical model.

3 Image-Based Interaction

Based on these findings, we constructed a Digital Display Case that enables users to interact with dynamic virtual exhibits that deform. In a previous paper [7], we constructed a prototype of the Digital Display Case that enabled users to handle virtual exhibits and look around them. In this paper, we realize a display system that enables users to feel the structure of exhibits by manipulating virtual exhibits with their hands, aiming to help users understand the exhibits' dynamic characteristics.

The usual method for realizing interactive VR/MR content is to construct a model of the shape and behavior, and then generate the content in real time by simulating the model's response to input. However, to truly reproduce an exhibit's behavior through simulation, the exhibit's structure must be analyzed precisely and an elaborate model built from the results of that analysis. Analyzing an exhibit's mechanism may require deconstructing it, in which case restoring it to its original state must be guaranteed. Therefore, before deconstructing the exhibit, its inner structure must be carefully confirmed and inspected using pertinent materials and X-ray imaging. Moreover, even once the exhibit's structure is known, it takes enormous effort to calculate the physical characteristics, such as spring and damper coefficients, precisely enough to reproduce it.

To solve these problems, we propose image-based interaction, a method based on image-based rendering for easily constructing high-definition interactive content. This method constructs realistic content, with degrees of freedom for the exhibit's deformation state and the viewpoint position, from given images, according to the manipulation input and the viewpoint. It reduces the risk of exhibit degradation associated with structural analysis and avoids the need to build elaborate CG models (Fig. 2). In particular, the system composes the exhibit's image corresponding to the interaction by interpolating images captured frame by frame over the degrees of freedom of the deformation state and the viewpoint (Fig. 1).

Fig. 2. Interpolation of viewpoint and transformation by image-based interaction

To increase composition speed, the system interpolates images by segmenting them into small meshes and deforming them. The system deforms the image using the rigid MLS method [15], with the corresponding points detected by the process described below as control points. This method determines the deformation amount of a mesh (\(\varDelta m\)) from the positions of the control points \(q = (q_0, q_1, ..., q_{n-1})\) using the MLS method, so the deformation amount of a mesh can be written as \(\varDelta m = D(q)\). First, we interpolate among three images using two-dimensional parameters for the deformation state or the viewpoint position (Fig. 4). The deformation of the interpolated image is the combination of the interpolations for both degrees of freedom: \( {\varDelta }m(t_0,t_1)=D_{0\rightarrow 1}(t_0)+D_{0\rightarrow 2}(t_1)\). For example, if point group \(q_0\) in Image 0 corresponds to point group \(q_1\) in Image 1, the deformation amount is calculated as \(D_{0\rightarrow 1}(t_0)=D((1-t_0)q_0+t_0q_1)\). The image whose deformation state is more similar to the target is used as the base image: when \(t_0<0.5\), Image 0 is used; otherwise, Image 1 is used. For the interpolation, the system uses control points that are related to the content's motion. To detect this motion, SIFT feature values [12] are calculated for both images, and corresponding points are detected by matching these feature values (Fig. 3). The system then applies the RANSAC algorithm [5] to these corresponding points: a collection of corresponding points with similar motion is identified as one part, and outliers that do not belong to this part are removed. An interpolated image (see Fig. 4) is composed by extending this process to two dimensions and applying it to three images with different deformation states.
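
As an illustration of this part estimation, the following is a minimal sketch in Python using OpenCV. The function names, the ratio-test threshold, and the use of a homography as the per-part motion model are our assumptions, not the paper's exact implementation.

```python
import cv2
import numpy as np

def estimate_part_correspondences(img0, img1, ransac_thresh=3.0):
    """Detect corresponding points between two deformation states and keep
    only those that follow one consistent motion (one "part"); points that
    do not follow it are removed as outliers (sketch)."""
    sift = cv2.SIFT_create()
    kp0, des0 = sift.detectAndCompute(img0, None)
    kp1, des1 = sift.detectAndCompute(img1, None)

    # Match SIFT descriptors and filter with Lowe's ratio test.
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des0, des1, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    q0 = np.float32([kp0[m.queryIdx].pt for m in good])
    q1 = np.float32([kp1[m.trainIdx].pt for m in good])

    # RANSAC fits one dominant motion model; its inliers form one part.
    _, mask = cv2.findHomography(q0, q1, cv2.RANSAC, ransac_thresh)
    inliers = mask.ravel().astype(bool)
    return q0[inliers], q1[inliers]

def interpolated_control_points(q0, q1, t0):
    """Control points for the rigid-MLS deformation D_{0->1}(t0),
    i.e. the argument of D((1 - t0) * q0 + t0 * q1)."""
    return (1.0 - t0) * q0 + t0 * q1
```

The point arrays returned by these functions would serve as the control points \(q\) passed to the MLS deformation \(D\).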

Fig. 3. Process of parts estimation

We then adapt this method to change both the viewpoint position and the deformation state (Fig. 2). The interpolated image is controlled using four parameters: the viewpoint position coordinates (\(v=(v_0,v_1)\)) and the deformation states (\(p=(p_0,p_1)\)). The deformation amount of the image is the combination of the interpolation effect caused by the viewpoint (\(D_v\)) and the interpolation effect caused by the deformation state (\(D_p\)): \({\varDelta }m(v,p)=D_{v}(v)+D_{p}(p)\). These two effects can be estimated by the following equations, which account for their mutual influence. Here, \(D_{v(p_0, p_1)}\) denotes the two-dimensional interpolation over the viewpoint \(v\) at deformation state \(p=(p_0,p_1)\), and \(D_{p(v_0, v_1)}\) denotes the two-dimensional interpolation over the deformation state \(p\) at viewpoint \(v=(v_0,v_1)\).

$$\begin{aligned} D_{v}(v)= & {} (1-p_0-p_1)D_{v(0,0)}(v)+p_0D_{v(1,0)}(v)+p_1D_{v(0,1)}(v)\\ D_{p}(p)= & {} (1-v_0-v_1)D_{p(0,0)}(p)+v_0D_{p(1,0)}(p)+v_1D_{p(0,1)}(p) \end{aligned}$$
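
For concreteness, this combination can be sketched as follows, assuming the six corner interpolators have already been precomputed; the dictionary layout and function names are our assumptions.

```python
def mesh_deformation(v, p, D_v, D_p):
    """Total mesh deformation Dm(v, p) = D_v(v) + D_p(p) (sketch).

    v = (v0, v1): viewpoint position parameters
    p = (p0, p1): deformation state parameters
    D_v[(i, j)]: viewpoint interpolator captured at deformation state (i, j)
    D_p[(i, j)]: deformation interpolator captured at viewpoint (i, j)
    """
    v0, v1 = v
    p0, p1 = p
    # Viewpoint effect, blended across the deformation-state corners.
    dv = ((1 - p0 - p1) * D_v[(0, 0)](v)
          + p0 * D_v[(1, 0)](v)
          + p1 * D_v[(0, 1)](v))
    # Deformation effect, blended across the viewpoint corners.
    dp = ((1 - v0 - v1) * D_p[(0, 0)](p)
          + v0 * D_p[(1, 0)](p)
          + v1 * D_p[(0, 1)](p))
    return dv + dp
```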

Using these methods, the system can compose an interpolated image that adapts to changes in both the viewpoint position and the deformation state of an exhibit. Figure 4 shows a deformation state of the exhibit constructed by interpolating images with three different deformation states; comparison with the actual image shows that the interpolation works as designed.

Fig. 4. Interpolation among three images

Because this method composes an interpolated image based on the image whose state is most similar to the desired one, it is difficult to adapt to changes in occlusion. If a large shift in self-occlusion occurs when the viewpoint position or the deformation state changes, an image gap may appear at the moment the base image is switched. Moreover, if the parts of the structure are very small and the difference in motion between parts is very large, the system cannot detect a sufficient number of corresponding feature points on each part during part estimation and fails to construct an acceptable interpolated image.

To solve these problems through efficient data acquisition, we should quantify the conditions under which the interpolations over the viewpoint or the deformation break down, and establish guidelines for the camera positions and the exhibit's movement when the images are captured.

4 Composition of Virtual Jizai-Ryu

We integrated the image-based interaction method into a system that detects the user's manipulation, and constructed a high-definition Digital Display Case that enables a user to manipulate a virtual exhibit (Fig. 1). As the dynamic exhibit, we used a “Jizai-Ryu,” a free-motion ornament of a dragon, a type of cultural asset developed in Japan.

It is a kind of figurine that can change its posture thanks to an accurately movable structure. Since it is a national treasure, almost no one is permitted to touch it, so it is very difficult to understand its structure. The museum's curators would like visitors to manipulate this ornament to feel how it works, but that is not possible because of its status. Nor can they deconstruct it to analyze its structure, because of the risk of deteriorating the exhibit.

We therefore realize an exhibition that can effectively convey the potential of this ornament's movement to visitors, even though the ornament itself is normally displayed in a fixed state.

To generate the exhibition content, we used images composed by image-based interaction from photographs captured frame by frame with a camera array while the free-motion ornament was manipulated. In this paper, we focused on the movement of the dragon's head and tail, and captured frame-by-frame images while manipulating these two parts within a range in which the dragon's body did not move. The system composed the interpolated image from fifteen images for each part's movement, and joined the head side and the tail side to compose dynamic content with which users can manipulate the dragon's head and tail at the same time.
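
The joining of the two sides can be sketched as a simple cross-fade at the trunk. The linear blend band, its width, and the assumption that the head occupies the left half of the frame are ours, not details given in the paper.

```python
import numpy as np

def join_head_and_tail(head_img, tail_img, blend_width=80):
    """Blend the head-side and tail-side interpolated images across a
    vertical band at the center of the trunk (sketch). Both images are
    float arrays of identical shape (H, W, 3)."""
    w = head_img.shape[1]
    x = np.arange(w, dtype=np.float32)
    # Weight 1 on the head (left) side, 0 on the tail (right) side,
    # ramping linearly across a band centered on the trunk.
    alpha = np.clip((w / 2 + blend_width / 2 - x) / blend_width, 0.0, 1.0)
    alpha = alpha[None, :, None]  # broadcast over rows and channels
    return alpha * head_img + (1.0 - alpha) * tail_img
```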

Furthermore, since hundreds of visitors were expected to experience this system each day, a high turnover rate and ease of maintenance were required. We therefore omitted the stereoscopic display function, which would have required visitors to wear 3D glasses, and constructed a device that realizes the virtual manipulation experience without it. Regarding maintenance, we built the interface for virtual manipulation using a simple method in which users grip a bar to feel its weight, instead of using complex haptic devices.

The user grips the green bar shown in Fig. 5 and puts his or her hands into the system's designated space. The system detects the operation input and calculates an output using the image-based interaction method. With this output, the system shows the user's hands as if he or she were manipulating the virtual exhibit. Through this process, users can manipulate the dragon's head and tail by handling the bars inside the designated space.

Figure 5 shows the process of how an exhibit is manipulated using the bars inside the system's designated spaces. Web cameras placed in these spaces detect the positions of the bars and the user's hands using the HSV color space. First, the system detects the movable area of each operating part, such as the head and tail, and calculates the area that contains all operating parts. The system also measures beforehand the area in which the center of each manipulation bar moves, and renders the hand image on the movement trajectory of the operating parts based on the relative correspondence of these two areas (Fig. 6). For the deformation state, which is determined by the positions of the head and tail, the system constructs interpolated images of the head and tail sides of the dragon using the image-based interaction method and blends the two images at the center of the trunk. Finally, the system superimposes the extracted hand image to give users the sense of manipulating the Jizai-Ryu with their actual hands. In previous work, we confirmed that when a captured hand image is shown with distorted movement, users feel that their hands move as the hand image on the monitor does. This illusion has been confirmed to modify the perception of the shapes that users touch [1]; a similar effect can be expected here, enhancing the sensation that users are manipulating the exhibit with their own hands.
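
The bar detection can be sketched as follows. The HSV thresholds for the green bar and the linear mapping from the bar's centroid to a deformation parameter are illustrative assumptions that would be tuned per installation.

```python
import cv2
import numpy as np

# Illustrative HSV range for the green bar; tuned per installation.
BAR_HSV_LO = np.array([40, 80, 80], dtype=np.uint8)
BAR_HSV_HI = np.array([80, 255, 255], dtype=np.uint8)

def bar_deformation_parameter(frame_bgr, area_min, area_max):
    """Detect the green bar in a web-camera frame and map the vertical
    position of its centroid within the pre-measured movable area
    [area_min, area_max] (pixel rows) to a parameter in [0, 1] (sketch)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, BAR_HSV_LO, BAR_HSV_HI)
    m = cv2.moments(mask, binaryImage=True)
    if m["m00"] == 0:
        return None  # bar not visible in this frame
    cy = m["m01"] / m["m00"]  # centroid row of the bar region
    t = (cy - area_min) / (area_max - area_min)
    return float(np.clip(t, 0.0, 1.0))
```

The resulting parameter would drive one axis of the deformation state \(p\), while the extracted hand mask is superimposed on the composed image.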

Fig. 5. Process for creating interaction

5 Validation of Virtual Jizai-Ryu

We made the high-definition Digital Display Case available to the public by exhibiting it at the Tokyo National Museum's 140th Anniversary Special Exhibition “Flying Dragon,” from Jan. 2nd to Jan. 29th, 2012. Over the duration of the exhibition, we presented two types of displays: the high-definition Digital Display Case, which enabled users to experience manipulating the exhibit with their own hands via the manipulation-detection method (referred to as the Experience type) (Fig. 1), and, as a comparison condition, an exhibit that enabled users to manipulate the exhibit with a touch panel (referred to as the Touch panel type) (Fig. 7). We exhibited the Experience type for 12 days and the Touch panel type for 13 days.

With the Touch panel type display, users could control the movements of the head and tail by switching among 30 images captured frame by frame while the exhibit was being manipulated little by little.

Fig. 6. Position for interaction

Fig. 7. Operation with the touch panel

5.1 Questionnaire for Visitors

We administered questionnaires to some of the visitors who experienced our system during this exhibition, and received answers from 301 visitors who experienced the Experience type and 267 visitors who experienced the Touch panel type.

Figure 8 shows the evaluation of the composed image (5 = totally good, 1 = totally bad). The answers concerning the deformation of the exhibit's image indicate that the Experience type was rated as highly as the Touch panel type. Note that the Touch panel type content was constructed from more than twice as many images as the Experience type. This result therefore indicates that our interpolation method can construct dynamic content from fewer images without a reduction in quality. On the other hand, the visibility of the composed image was rated lower for the Experience type than for the Touch panel type. We consider that disturbances in the captured hand image, and the superimposed hand obscuring the exhibit, reduced the visibility ratings; comments to this effect appeared in the open-ended question. We should therefore improve the quality of the hand superimposition and examine ways of superimposing hands that do not obstruct the view of the virtual content. The open-ended answers also showed a strong demand for dynamic virtual content with more degrees of freedom: users wanted more varied dynamic deformation.

Figure 9 shows that the manipulation system used in the Experience type was rated as providing a sense of manipulating with one's own hands equal to that of the Touch panel type. The visitors who experienced our system included many elderly adults who appeared unaccustomed to digital devices. Nevertheless, their evaluation of the Experience type equaled that of the Touch panel type, which is typically considered an intuitive form of operation. It can therefore be said that the operation system used in the Experience type, which detects gripping manipulation, was intuitive enough to present interactive content to a wide range of visitors.

Fig. 8. Feedback for the content of exhibition (Ave. and SE)

Fig. 9. Answers to questions about manipulation and mechanism (Ave. and SE)

5.2 Review from Curators

Twenty curators working at the Tokyo National Museum experienced this system and gave us their reviews. First, most of them said that it was a very interesting experience to see the dragon ornament deform in response to the user's manipulation, and that the quality of the photo-based content was appropriate for a museum exhibit. Some curators said that the superimposed hand image enhanced the feeling that they were manipulating the exhibit with their own hands, and that they felt their hands moving as displayed on the monitor. In addition, how a user manipulated the dynamic virtual exhibit could be seen from around the system, so nearby visitors easily became interested in the exhibit. These comments indicate that the high-definition Digital Display Case using the image-based interaction method is useful for museum exhibits.

On the other hand, some curators were confused about how to move their hands in the holes. We consider that this confusion arose from the difference in degrees of freedom between the content's deformation and the movement of the users' hands. To solve this problem, we should expand the degrees of freedom of the image deformation, or show a visual guide as a cue for hand movement.

Other curators mentioned that they wanted to convey the haptic sensation of exhibits to visitors. In consideration of experience turnover and maintenance, this system conveyed weight simply by having users grip a bar; however, using gripping parts of similar weight or material, or presenting them with a cross-modal effect such as visuo-haptic interaction [19], could help realize a more realistic haptic experience.

6 Conclusion and Future Work

In this paper, we proposed a high-definition Digital Display Case that can convey to visitors the dynamic characteristics of an exhibit, such as its structure. As a method to construct interactive content for dynamic virtual exhibits, we proposed the image-based interaction method based on image-based rendering. With this method, we can avoid the risk of deteriorating the exhibit while measuring its mechanism. The method interpolates images of an exhibit captured under a number of deformation states and viewpoints.

Using this method, we constructed a high-definition Digital Display Case that enables users to interact with a dynamic virtual exhibit. The system detects the position where the user handles the object with web cameras to obtain the operation input. We exhibited the “Free-Motion Ornament of Dragon” with this system at the Tokyo National Museum, where we collected feedback on the Digital Display Case from visitors and curators.

To derive more detailed specifications for the system, we should expand the degrees of freedom of the content's deformation. Currently, the system displays two-dimensional deformation; we aim to extend the method to three-dimensional deformation by interpolating over a tetrahedron instead of a triangle as in this paper. Furthermore, we will try to convey the exhibits' shape, weight, and material easily using cross-modal effects such as visuo-haptic interaction [19]. With these improvements and careful consideration, we aim to realize the goal of the high-definition Digital Display Case.