
1 Introduction

In recent years, museums have focused not only on collecting and storing cultural property but also on exhibition and education [1]. In this movement, beyond simply displaying the stored cultural property, it is important to pass on background knowledge such as the property's history and how it was used. Because such background knowledge is difficult to convey in conventional static exhibitions, VR exhibitions using digital archive technology have been advanced [2]. VR technology makes it possible to offer highly immersive exhibitions even for cultural properties that, because of their high cultural value, cannot be touched or entered in a traditional exhibition, and thereby deepens visitors' understanding of them [3, 4].

In addition, digital archive technology has developed rapidly in recent years, and the \(360^\circ \) camera, which can photograph its entire surroundings at once, has appeared. The \(360^\circ \) camera is well suited for archiving cultural properties that occupy a large space, and it has been introduced into VR exhibitions in museums [5]. However, offering such a VR exhibition requires archiving the cultural property, and archiving with a \(360^\circ \) camera involves difficulties: many cultural properties cannot be archived as they were in their original period because of aging and physical restrictions. A VR exhibit built from material archived under such circumstances risks conveying the background information incorrectly. In this situation, it is necessary to reproduce the original appearance in order to promote a correct understanding of the cultural property.

In this research, we treat the Imperial Train, reserved for the Japanese Emperor and Imperial family, as the subject to be exhibited using VR technology. The train is important because of its very high historical value, and its interior, finished in silk fabric, also gives it very high value as a craft.

However, due to its age, the car cannot be driven outside. In addition, the interior of the train is very dark because the internal electric circuit is dead and the indoor lights cannot be turned on. Moreover, there are various restrictions on archiving the Imperial Train for a VR exhibition. First, because there is no light, a light source is required, yet from the viewpoint of protecting the exhibits, a strong light source cannot be used. Second, the indoor environment is very narrow: the aisle is about shoulder width, and the observation room has sofas on the left and right that must not be damaged. We need to be able to archive under such constraints. In addition, images taken this way may not be sufficient for a VR exhibition. First, uniform brightness must be ensured. Second, the captured images were taken inside the museum, not while the train was actually running. Under such conditions, the background information may not be transmitted correctly.

Therefore, in this research, we propose a VR content creation method that promotes understanding of cultural properties based on spherical images photographed under adverse conditions, separated from their original context.

2 Related Works

2.1 Creation of Virtual Reality Space

There are several ways to create a virtual reality space for exhibitions. The first is generation from point clouds, as in Structure from Motion [6]. Although this method yields only sparse point clouds, a method for obtaining dense point clouds, Multi View Stereo, has also been proposed [7]. However, estimation sometimes fails in these methods, which can leave holes in the VR space or reconstruct shapes different from the original. Such methods are inappropriate here because of the possibility of transmitting wrong background information. A method using 3D models, such as image-based modeling, can also be considered [8]. Although these methods are useful for reproducing a single structure, it is difficult to reproduce vast urban spaces and natural landscapes. A VR space built from \(360^\circ \) camera images restricts the viewpoint, but because the images are photorealistic, it can preserve the scene as it is. Tanaka et al. showed that an exhibition using spherical images deepened understanding of the exhibited target [5]. Spherical images are therefore considered suitable for museum exhibition, and in this research we build the archive from them.

2.2 Brightness Adjustment and Image Synthesis

In order to correct the captured image, we describe the previous research on the method of synthesizing the image and correcting the brightness.

Lee proposed a method that performs contrast-enhancement processing together with filtering to remove noise [9]. Singh and Kapoor proposed a method to emphasize the contrast of a grayscale image based on its histogram [10]. There is also a method of brightening a dark image by estimating the light source and changing the brightness of the image [11]. In this research, following these studies, we correct the darkened portions by adjusting the contrast and luminance of the image.

There are also many studies on image synthesis, where geometric consistency and optical consistency are known to be important. Glassner described human visual characteristics and signal-processing methods such as the Fourier transform and wavelet transform, and proposed an image synthesis method using them [12]. Yasuda et al. proposed a method that cuts out regions using temperature information and synthesizes them [13]. These are compositing methods for obtaining geometric consistency and are often used in today's image synthesis. As a technique for achieving optical consistency, a method of estimating the light source environment from the shadows of real objects in an input image has been proposed [14]. In this research, we examine a synthesis method that maintains both geometric and optical consistency.

3 Design of Proposed System

3.1 Overview

We propose a VR content creation method that promotes understanding of cultural properties based on spherical images photographed under adverse conditions, separated from their original context. In Sect. 3.2, we describe the method of correcting the brightness of the spherical image. In Sect. 3.3, we describe the method of synthesizing spherical images. Because the location at the time of photographing was not the place where the cultural property was originally used, we correct the appearance of the spherical image by combining it with other spherical images taken at the original place.

3.2 Brightness Adjustment

First, the brightness of the spherical image is adjusted. When viewing a spherical image, brightness that is too low or too high adversely affects the viewing of the cultural property. The brightness is adjusted by \(\gamma \) correction: we apply \(\gamma \) correction to the original spherical image with several values and calculate the histogram of each result. The image whose cumulative frequency distribution is closest to a straight line is adopted as the image with the largest contrast, where closeness to a straight line is measured by the least squares method.

3.3 Spherical Image Synthesis

In this section, we describe a method of reproducing the original appearance by synthesizing spherical images taken at different places. If this synthesis is performed naively, mismatched viewpoints produce deviations in perspective and therefore a different appearance. Furthermore, in the synthesis of spherical images, it is difficult to correct all such deviations over the entire sphere. However, since the user views only a limited part of the spherical image at any moment, we correct only that part. Therefore, in this research, we dynamically select the material to be synthesized from consecutive spherical images according to the user's eye direction, and deform the selected image using the Mobius transformation [15] before synthesizing it.

The workflow of this method is as follows. First, a spherical image to serve as the base material for synthesis is selected according to the user's eye direction. Next, we apply the Mobius transformation to the selected image according to the positional relationship between the spherical images, and perform the composition (Fig. 1).

Fig. 1.

1. Select the spherical image to synthesize according to the eye direction. 2. Deform the spherical image by the Mobius transformation according to the geometry

First, we describe how a spherical image is selected according to the eye direction. Two groups of spherical images are prepared as synthesis materials. Camera parameters are then obtained using Structure from Motion, which determines the positional relationship between all the images within each group. The positional relationship between the two groups is determined from measurements in the real world. Then, when improving the appearance seen from one spherical image, the image closest to the direction of the line of sight is selected as the composite material.
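The selection step can be sketched as follows. The paper only states that the image "closest to the direction of the line of sight" is selected, so the concrete rule below (choose the camera whose direction from the viewer best aligns with the gaze vector) and all function names are our assumptions.

```python
import numpy as np

def select_nearest_image(gaze_dir, camera_positions, viewer_pos):
    """Hypothetical selection rule: return the index of the camera whose
    direction, as seen from the viewer, is most aligned with the gaze."""
    gaze = np.asarray(gaze_dir, dtype=np.float64)
    gaze = gaze / np.linalg.norm(gaze)
    viewer = np.asarray(viewer_pos, dtype=np.float64)
    best, best_cos = None, -2.0
    for i, cam in enumerate(np.asarray(camera_positions, dtype=np.float64)):
        to_cam = cam - viewer
        norm = np.linalg.norm(to_cam)
        if norm == 0.0:
            continue  # camera coincides with the viewer; skip it
        cos = float(to_cam @ gaze) / norm  # cosine of the angle to the gaze
        if cos > best_cos:
            best, best_cos = i, cos
    return best
```

With the camera positions from Structure from Motion expressed in the same real-world frame as the viewer, this returns the index of the spherical image used as composite material for the current gaze.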

Subsequently, the selected image is transformed so that it can be synthesized with geometric consistency. In this research, we deform the spherical image using the Mobius transformation before synthesis. The coefficients of the Mobius transformation are determined from the positions of all the spherical images and from the geometric relationship of the landscape shown in the deformed spherical images. The spherical image is then deformed according to these values and synthesized.
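A Mobius transformation acts on the sphere of viewing directions through stereographic projection to the complex plane. The following sketch is our illustration of that mechanism (the paper does not give formulas, and the function names are ours): a direction \((\theta, \phi)\) is projected to a complex number \(z\), the map \(z \mapsto (az+b)/(cz+d)\) is applied, and the result is projected back.

```python
import numpy as np

def sphere_to_complex(theta, phi):
    """Stereographic projection of a unit-sphere direction to the complex
    plane; theta is the polar angle from the pole, phi the azimuth."""
    return np.tan(theta / 2.0) * np.exp(1j * phi)

def complex_to_sphere(z):
    """Inverse stereographic projection back to (theta, phi)."""
    theta = 2.0 * np.arctan(np.abs(z))
    phi = np.angle(z)
    return theta, phi

def mobius_transform(theta, phi, a, b, c, d):
    """Apply the Mobius transformation z -> (a*z + b) / (c*z + d)
    to a viewing direction on the sphere."""
    z = sphere_to_complex(theta, phi)
    w = (a * z + b) / (c * z + d)
    return complex_to_sphere(w)
```

The identity coefficients \((a,b,c,d)=(1,0,0,1)\) leave every direction unchanged, while for example a pure scaling \(a>1\) pushes directions away from the projection pole, which is the kind of local stretch used to bring the composited landscape into geometric agreement.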

4 Experiments

In this chapter, we examine the effect of each of the correction methods proposed in Sect. 3. First, for the method of Sect. 3.2, we check the histogram difference and the feature point matching result between an image captured with a single-lens reflex camera and the spherical image before and after correction. The validity of the image composition method of Sect. 3.3 is verified by applying it to actual images: we photographed a train running in Tokyo and the platform of Ueno Station, synthesized the images with the proposed method, and evaluated the result.

4.1 Brightness Adjustment

Detailed Procedures. Using the method proposed in Sect. 3.2, we confirm whether the brightness is consistent. First, we prepared groups of spherical images archived inside the Imperial Train. As comparison images, photographs of the train interior taken with a single-lens reflex camera were prepared. After executing gamma correction on the spherical images, we calculate the histograms to obtain the image with the greatest contrast. From the spherical image obtained in this way, the region visible to the user when viewing the spherical image is cut out so as to match the composition of the single-lens reflex photograph. We compare the histograms of the image taken with the single-lens reflex camera and the cut-out image, and also measure the similarity of feature points between them. For the histogram comparison, the images were resized to \(300\times 200\) px so as to preserve the aspect ratio, each histogram was calculated, and the degree of coincidence was examined. The similarity between feature points was calculated by converting the images to grayscale, resizing them to \(300\times 200\) px in the same way, extracting feature points using the A-KAZE feature descriptor, and calculating the distances between the feature points. The images used for comparison are the following two groups (Figs. 2 and 3).
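The paper does not specify which "degree of coincidence" was used for the histograms. One plausible choice, sketched below with NumPy, is the Pearson correlation of the normalized histograms (the measure behind OpenCV's `HISTCMP_CORREL`); treat it as an assumption, not the authors' exact metric. The A-KAZE step would in practice use an existing implementation such as OpenCV's `cv2.AKAZE_create()`.

```python
import numpy as np

def gray_histogram(img, bins=256):
    """Normalized grayscale histogram of an 8-bit image."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    return hist / hist.sum()

def histogram_correlation(img_a, img_b):
    """Degree of coincidence between two images' histograms as Pearson
    correlation: 1.0 means identically shaped histograms."""
    h1, h2 = gray_histogram(img_a), gray_histogram(img_b)
    return float(np.corrcoef(h1, h2)[0, 1])
```

An image compared with itself scores 1.0, and a globally darkened copy (whose histogram is compressed toward the low intensities) scores noticeably less, which matches the way Table 1 distinguishes the corrected and uncorrected cut-outs.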

Fig. 2.

Group A: the left image was photographed with a single-lens reflex camera, the middle image is the original spherical image, and the right image is the corrected spherical image.

Fig. 3.

Group B: the left image was photographed with a single-lens reflex camera, the middle image is the original spherical image, and the right image is the corrected spherical image.

Result and Discussion. The degree of coincidence of the histograms and the distances between feature points for groups A and B are as follows (Tables 1 and 2). In the tables, "Proposed" denotes the proposed method, which adjusts the brightness, and "Comparative" denotes the comparative method, which does not.

Table 1. Histogram
Table 2. Feature matching

First, the degree of coincidence of the histogram decreased under the proposed method for group A, whereas it increased for group B. The result for group A is thought to arise because the brightness of an already bright part was increased further. In group B, on the other hand, the range the user was looking at was a dark part of the spherical image, and brightening that part is considered to have improved the coincidence. If the brightness varies across the spherical image, processing adapted to that local darkness appears to be necessary. In both groups A and B, the feature point matching result improved with the proposed method. This is thought to be because emphasizing the contrast made the feature points easier to detect, even in an originally bright image. From the above, emphasizing the contrast can potentially improve the appearance of the spherical image, while the brightness adjustment appears to require separate processing for the bright and dark parts of the content.

4.2 Spherical Image Synthesis

Detailed Procedures. In order to evaluate the validity of the method, we synthesize spherical images taken at Ueno Station with spherical images taken inside a train stopped at the station. Regarding the appearance at actual viewing time, we evaluate the difference between the synthesized image and a spherical image actually taken at the location (the ground truth image). As a comparison method, we use simple composition with the spherical image selected by eye direction, without transformation. For each composite image and the ground truth image, feature points are extracted using A-KAZE feature detection within the range visible from the train window. The extracted feature points are then associated with each other visually, because automatic matching is difficult owing to the different photographing times and the influence of the color of the window. To check the degree of coincidence between the composite image and the ground truth image, we compute, from the matched feature points, the amount of parallel movement and the amount of enlargement/reduction that make the feature points of one image coincide with those of the other, as well as the average pixel distance between corresponding feature points. Taking the gaze direction perpendicular to the window as \(0^\circ \), these indices are calculated when the gaze direction is rotated by \(0^\circ \), \(10^\circ \), \(20^\circ \), and \(30^\circ \). The actually synthesized spherical image is shown in Fig. 4.
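The translation and enlargement/reduction amounts described above can be obtained by a least-squares fit of a scale-plus-translation model (no rotation) between the matched point sets. The sketch below is our plausible reconstruction of that computation, not the authors' code; the function names are ours.

```python
import numpy as np

def align_scale_translation(src, dst):
    """Least-squares scale s and translation t (no rotation) such that
    s * src + t best matches dst, for matched 2-D point sets."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    # Closed-form minimizer of sum ||s*src_c - dst_c||^2.
    s = (src_c * dst_c).sum() / (src_c ** 2).sum()
    t = dst.mean(axis=0) - s * src.mean(axis=0)
    return s, t

def mean_point_distance(src, dst):
    """Average pixel distance between matched feature points."""
    diff = np.asarray(dst, np.float64) - np.asarray(src, np.float64)
    return float(np.linalg.norm(diff, axis=1).mean())
```

A perfectly aligned composite would give \(s \approx 1\), \(t \approx (0, 0)\), and a small mean distance; the experiment compares how far each method's indices deviate from these values as the gaze rotates.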

Fig. 4.

The left image was photographed from the inside of the train, the middle is the simply synthesized image, and the right is the image synthesized after transformation.

Result and Discussion. The amount of parallel movement, the amount of enlargement/reduction, and the average pixel distance between feature points are shown in the following graphs (Fig. 5). The figure shows that the composition shift is reduced by the image transformation based on the Mobius transformation for all three indices. As the gaze direction rotated, every index grew larger under simple synthesis. This appears to be because feature points are taken only in the part visible from the train window, so the range containing the feature points becomes biased toward one side as the line of sight rotates. We also observed deviations at the feature points on the pillars of the station platform, because the coefficients of the Mobius transformation were specified so that the wall of the station would appear at the correct position. Even at these points, however, the proposed method is superior to image selection alone.

Fig. 5.

The left graph shows the scale, the middle graph the average moving distance of the image center, and the right graph the average distance between feature points.

5 Conclusion

In this paper, we proposed a VR content creation method to promote understanding of cultural properties based on spherical image material taken under adverse conditions, separated from its original context. We proposed a method of correcting the brightness of the captured spherical image and a method of synthesizing spherical images to correct the parts that differ from the original appearance.

First, the brightness was adjusted. The luminance was adjusted by \(\gamma \) correction so that the overall contrast became smoothest, where the contrast is considered smoothest when the cumulative frequency distribution of the histogram is most linear; closeness to a straight line was determined by the least squares method. Subsequently, parts that differed from the original appearance were synthesized with spherical images taken at the original place. First, the positional relationship between all the spherical images is obtained. Then, the image to serve as composite material is dynamically selected according to the eye direction of the user. Based on the geometry of the real world, the selected image is transformed by the Mobius transformation and synthesized.

We conducted experiments on the usefulness of each correction method. To verify the brightness correction method, the appearance of the spherical image as seen by the user was compared with an image photographed with a single-lens reflex camera, for both the spherical image corrected by the proposed method and the uncorrected one. The similarity of the histograms and the similarity of the feature points were examined between the appearance of each spherical image and the single-lens reflex image. As a result, the histogram similarity improved when the user was looking at a dark part of the spherical image, whereas it declined when looking at a bright part. In both cases, however, the degree of coincidence of the feature points increased, showing that smoothing the contrast improves the appearance. Regarding the synthesis method, synthesis was performed on actual spherical images and verified by comparing two errors: the error between the actually captured image (the ground truth image) and the image simply synthesized without deformation, and the error between the ground truth image and the image synthesized by the proposed method. Feature points of each image were matched, and the amounts of translation and enlargement/reduction that made the feature points of one image coincide with those of the other, as well as the distance between feature points, were calculated. Improvements were found for the proposed method in all indices.

A limitation of the proposed method is that, although existing scenery can be synthesized into the real image, it cannot be applied when the present appearance cannot be observed because of collection, damage, or the like. In addition, deformation is difficult when the distance between the spherical images is too large or when there are many objects that act as disturbances. Also, when increasing the brightness, the correction does not work well if the original image is too dark.

As future work, we would like to apply this method to other cultural properties and build various archives. We would also like to exhibit cultural properties actually archived in this way and confirm that the quality of the exhibition can be secured. Regarding the brightness correction, because the correction parameters may differ for each cultural property, we want to verify the parameters.