
1 Introduction

In a Virtual Reality (VR) environment, the ability to interact with virtual objects improves the user experience, so good interaction design is important in a VR system. A VR experience that involves substantial interaction engages users and supports their understanding more effectively than a passive one. For this reason, some VR systems incorporate physical interaction by users. Arakawa et al. proposed a system for reliving a video experience: users move in the same way as the camera operator did while the corresponding scene is shown on the screen of their handheld devices, which conveys how the operator captured the scene [1]. Arakawa et al. showed that their interaction techniques are effective for reliving a video experience. Conversely, if the interaction design of a VR system is inappropriate, users may fail to understand the experience it offers. They may then lose interest in the VR content and stop before viewing everything that the designers prepared and intended to show.

The popularization of omnidirectional cameras, which capture spherical images in a single shot, has made it easy to archive real spaces. In addition, devices for experiencing such content, such as tablets and head-mounted displays, have become widespread. Consequently, many VR systems based on spherical images and videos have been developed. Spherical images are particularly suitable for constructing immersive and realistic virtual spaces with little effort [2, 3]. Because a spherical image contains the view in every direction from a single location, natural viewing requires an operation for changing the viewing direction, such as the mouse dragging used in Google Street View [4]. Several studies have investigated spherical images; viewing them on a handheld device such as a tablet is known to be immersive and effective for understanding the geometry of a space [2, 5–7].

However, viewing spherical videos poses two problems. First, conventional video interfaces with a play button and a seek bar, such as that of YouTube [8], require little interaction, so the experience of a spherical video through such an interface is passive. Second, users cannot obtain sufficient information at once: a mobile device displays only a region cropped from the spherical image, so at any given moment users see only one direction of a video that contains information in all directions. If the video plays back automatically, they may therefore miss the movement of its main object. These problems make it difficult to retain user interest throughout playback, and with conventional interfaces that lack physical interaction, some users gradually lose interest and stop watching partway through.

We therefore propose a system whose interface lets users play back spherical videos by touching and swiping objects in the video, that is, by manipulating those objects directly. Because this interface is intuitive and lets users play or stop the video as they like, we expect users to interact with the virtual objects frequently and thereby obtain information about them. Hence, the proposed interface is expected to retain user interest and lead users to watch the spherical video in its entirety. We also conducted an experiment at a real exhibition to evaluate the effectiveness of the proposed interface for a spherical VR experience.

2 Related Work

In this section, we first describe a VR system for viewing spherical images on mobile devices and then describe a direct-manipulation video interface based on mouse dragging.

2.1 Mobile Application for Viewing Spherical Images

Virtual spaces constructed from spherical images are more immersive and realistic than those constructed from computer graphics [2, 3]. Okada et al. developed a mobile application named “Manseibashi Reminiscent Window”, which lets users view spherical images of a past scene on-site by superimposing an image of the past scene onto the present scene of the actual site. Their system was well accepted by the public and maintained user interest even in an unattended exhibition. In this system, the virtual camera, which crops a region of the spherical image, is linked to the orientation of the mobile device obtained from its built-in gyroscope and accelerometer: when the user points the device in a particular direction, the virtual camera turns in the same direction. Motion-based interaction is known to improve the sense of immersion and presence in a VR experience [7]. Therefore, in our system for viewing spherical videos, we likewise link the virtual camera to the orientation of the mobile device.
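To make the orientation-linked virtual camera concrete, the following Python sketch renders a perspective view from an equirectangular frame given the device yaw and pitch. It is not the implementation used by Okada et al. or in our system: the function name `render_view`, the axis conventions (positive yaw turns right, positive pitch looks up), nearest-neighbour sampling, and the use of plain nested lists for pixels are assumptions made for illustration, and device roll is ignored.

```python
import math


def render_view(equirect, yaw_deg, pitch_deg, out_w, out_h, hfov_deg=60.0):
    """Sample a perspective view from an equirectangular frame (hypothetical helper).

    equirect: list of rows (height H, width W); column 0 is longitude -180 deg,
              row 0 is latitude +90 deg (the top of the sphere).
    yaw_deg:  rotation about the vertical axis (positive = turn right).
    pitch_deg: rotation about the horizontal axis (positive = look up).
    """
    src_h, src_w = len(equirect), len(equirect[0])
    cy, sy = math.cos(math.radians(yaw_deg)), math.sin(math.radians(yaw_deg))
    cp, sp = math.cos(math.radians(pitch_deg)), math.sin(math.radians(pitch_deg))
    # Focal length in output pixels for the requested horizontal field of view.
    f = (out_w / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    view = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Ray through the pixel centre in camera coordinates (x right, y up, z forward).
            x = j + 0.5 - out_w / 2.0
            y = out_h / 2.0 - (i + 0.5)
            z = f
            # Pitch: rotate about the x axis.
            y1 = y * cp + z * sp
            z1 = -y * sp + z * cp
            # Yaw: rotate about the vertical (y) axis.
            x2 = x * cy + z1 * sy
            z2 = -x * sy + z1 * cy
            lon = math.atan2(x2, z2)                   # -pi..pi, 0 = straight ahead
            lat = math.atan2(y1, math.hypot(x2, z2))   # -pi/2..pi/2
            u = int(math.floor((lon / (2 * math.pi) + 0.5) * src_w)) % src_w
            v = int(math.floor((0.5 - lat / math.pi) * src_h))
            v = min(src_h - 1, max(0, v))
            row.append(equirect[v][u])                 # nearest-neighbour sample
        view.append(row)
    return view
```

On a device, `yaw_deg` and `pitch_deg` would be refreshed from the motion sensors every display frame, so turning the tablet re-renders the crop in the same direction.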

2.2 Direct Manipulation Interface for Video

Karrer et al. proposed an intuitive interface for frame-accurate in-scene video navigation [9]. Dragging an object in the scene along its trajectory of motion selects the corresponding video frame. In their experiment, the interface reduced the time required for in-scene navigation tasks by an average of 19–42 % compared with a standard seek bar, and in a survey, users responded that it felt more natural than conventional interfaces. The interface lets users move an object in the video to a desired position quickly and precisely, which is useful in tasks such as video editing, reviewing sports footage, and verifying video in a trial.

Whereas Karrer et al. focused on accurate navigation to object positions in a video, we apply this technique to video playback. Because we target mobile devices, users swipe the touchscreen instead of dragging with a mouse. We expect this intuitive swiping interface to be effective in drawing user interest to the video content and supporting its understanding, and we evaluate this effectiveness experimentally.
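The core idea of in-scene navigation [9], selecting the frame in which the tracked object lies closest to the pointer, can be sketched as below. This is a deliberately simplified, hypothetical version rather than the published method: it assumes the object's screen position in every frame is already known, and it ignores refinements a practical implementation would need, such as favouring frames near the current one when a trajectory crosses itself.

```python
def frame_for_drag(trajectory, point):
    """Return the frame whose object position is nearest to the drag point.

    trajectory[k] is the (x, y) screen position of the tracked object in
    frame k (assumed to be precomputed); point is the current pointer position.
    """
    px, py = point
    return min(
        range(len(trajectory)),
        key=lambda k: (trajectory[k][0] - px) ** 2 + (trajectory[k][1] - py) ** 2,
    )
```

Calling this on every drag event turns pointer motion along the path into frame-accurate scrubbing; the playback interface in this paper replaces the nearest-point lookup with a model that keeps moving after the finger is released.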

3 Design of Proposed System

3.1 Overview

Conventional video controls such as play buttons and seek bars offer little interaction, which makes it difficult to retain user interest; as a result, some users stop their VR experience before the video ends. Furthermore, users cannot view all directions of a spherical video at any given moment. We therefore propose a new interface for playing back spherical videos in which users directly manipulate objects by swiping the screen. We expect users to play back spherical videos intuitively and to interact with the virtual objects frequently, which should increase their interest.

3.2 Spherical Video Used in Our System

For our system, we captured a spherical video with an omnidirectional camera, the Ladybug5. The camera was mounted on top of an electric wheelchair, approximately 170 cm above the ground, and recorded at 10 fps. We captured the departure of the sleeper train “Hokutosei” from Ueno Station, Japan; this train is the main moving object in our system. Because the railway is straight, the train moves along a linear trajectory, which makes this scene well suited to the proposed interface. The resulting video shows the train departing from the station over 90 s and consists of 900 frames (90 s × 10 fps).

3.3 Playing Back Video by Manipulating Virtual Objects

We propose an interface in which users play back the spherical video by swiping virtual objects. In this system, we use a spherical video of a departing sleeper train captured from a fixed point; the train moves linearly away from the camera. The system senses the user's finger touches on the touchscreen. When the user swipes the train, the displayed video frame changes according to the distance the finger moves, so that the train appears to follow the finger. We implemented this mechanism by placing a virtual train model in the background and calculating its coordinates (Fig. 1). When the user swipes along the train's trajectory on the display, the virtual train model moves linearly in the same direction as the finger. Even after the finger leaves the screen, the model keeps moving at its current speed owing to inertia. Swiping in the opposite direction moves the model back, which plays the video in reverse.

To let users manipulate the train as they intend, we linked the coordinates of the virtual model to the appropriate video frames. We first set three representative pairs of model position and video frame and then linearly interpolated between them for all other positions. A touch is valid only when it lies on the trajectory of the train; touching any other region does not affect playback.

In a user study at a previous exhibition at a museum in Saitama, Japan, we observed that many users swiped repeatedly in the same direction to accelerate the object. We reflected this behavior by applying a positive gain that increases the speed when the object is swiped repeatedly in the same direction. Note that this interface is applicable only to videos containing an object that moves along a well-defined trajectory.
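The mechanism described above can be sketched as follows. This is a hypothetical Python sketch, not our actual implementation: the class and method names, the straight on-screen track given by two endpoints, the 40-pixel touch tolerance, the gain of 1.5 applied when a swipe repeats the previous direction, and the optional friction term are illustrative assumptions. The keypoints stand in for the three representative position-frame pairs, and their values here are made up.

```python
import math
from bisect import bisect_right


def frame_for_position(s, keypoints):
    """Map model position s to a frame index by piecewise-linear
    interpolation between (position, frame) pairs sorted by position."""
    positions = [p for p, _ in keypoints]
    if s <= positions[0]:
        return keypoints[0][1]
    if s >= positions[-1]:
        return keypoints[-1][1]
    i = bisect_right(positions, s) - 1
    (p0, f0), (p1, f1) = keypoints[i], keypoints[i + 1]
    t = (s - p0) / (p1 - p0)
    return round(f0 + t * (f1 - f0))


class SwipeController:
    """Hypothetical swipe-to-play controller for an object on a straight track."""

    def __init__(self, start, end, keypoints, tolerance=40.0, gain=1.5, friction=0.0):
        self.ax, self.ay = start                # screen point where the track starts (s = 0)
        self.dx, self.dy = end[0] - start[0], end[1] - start[1]
        self.len2 = self.dx ** 2 + self.dy ** 2
        self.keypoints = keypoints
        self.tolerance = tolerance              # max touch distance from the track, in pixels
        self.gain = gain                        # speed-up for repeated same-direction swipes
        self.friction = friction                # 0.0 = inertia keeps the speed unchanged
        self.s = 0.0                            # model position along the track, 0..1
        self.velocity = 0.0                     # in track lengths per second
        self.last_direction = 0                 # sign of the previous swipe
        self.touching = False

    def _distance_to_track(self, p):
        t = ((p[0] - self.ax) * self.dx + (p[1] - self.ay) * self.dy) / self.len2
        t = min(1.0, max(0.0, t))
        return math.hypot(p[0] - (self.ax + t * self.dx), p[1] - (self.ay + t * self.dy))

    def touch_down(self, p):
        # Only touches on the trajectory are valid; a valid touch grabs the model.
        self.touching = self._distance_to_track(p) <= self.tolerance
        if self.touching:
            self.velocity = 0.0
        return self.touching

    def touch_move(self, mx, my, dt):
        # (mx, my) is the finger displacement since the last event, in pixels.
        if not self.touching:
            return
        ds = (mx * self.dx + my * self.dy) / self.len2   # projection onto the track
        self.s = min(1.0, max(0.0, self.s + ds))
        self.velocity = ds / dt if dt > 0 else 0.0

    def touch_up(self):
        if not self.touching:
            return
        self.touching = False
        direction = (self.velocity > 0) - (self.velocity < 0)
        if direction != 0:
            if direction == self.last_direction:
                self.velocity *= self.gain      # repeated swipe in the same direction
            self.last_direction = direction

    def step(self, dt):
        # After release the model coasts (inertia) until it reaches either end.
        if self.touching:
            return
        self.s += self.velocity * dt
        self.velocity *= max(0.0, 1.0 - self.friction * dt)
        if self.s <= 0.0 or self.s >= 1.0:
            self.s = min(1.0, max(0.0, self.s))
            self.velocity = 0.0

    def frame(self):
        return frame_for_position(self.s, self.keypoints)
```

Keeping the model position `s` separate from the frame index makes the inertia and acceleration logic independent of the video; only the keypoints need to change if the representative pairs are recalibrated.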

Fig. 1. Proposed system

We thus developed an interactive spherical video system based on touching and swiping objects in the video. Users can intuitively play the video forward or in reverse at any speed and easily stop at a desired frame by manipulating objects in the video. Moreover, because users necessarily focus on the moving object, they are expected to be drawn to it; designing an important object to be movable can therefore reduce the chance that users overlook it.

4 Experiment at a Real Exhibition

We evaluated the proposed interface through a large-scale demonstration experiment at a real exhibition in Gunma, Japan. The subjects were visitors to this exhibition. Staff in the exhibition room handed out iPad Air 2 devices to the subjects and explained how to use our system, and the subjects were free to stop viewing whenever they wished. We used the spherical video of the departing sleeper train described in Sect. 3. The exhibition was held for 92 days; we displayed the system with the proposed interface for the first 43 days and the system with the conventional interface for the remaining 49 days.

4.1 Application

Proposed System.

The subjects could play the spherical video forward or in reverse at any speed and easily stop at a desired frame by manipulating objects in the video. Because this interface was unfamiliar to the subjects, an animated hand icon on the display demonstrated the touch interaction and the swipe direction (Fig. 2). In addition, the train was highlighted by a blinking red light until the first touch.

Fig. 2. Screenshot of proposed system

Conventional System.

For comparison, we created a system with a conventional interface consisting of playback buttons and a seek bar (Fig. 3). Buttons were provided for playing, playing in reverse, and changing the playback speed; the subjects could multiply the speed by 2, 4, 8, or 16 during forward or reverse playback. The seek bar allowed them to jump to any frame of the video.
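For reference, the conventional controls can be modelled as a small state machine. The class below is a hypothetical sketch, not the software used in the experiment; it assumes a 900-frame video at 10 fps, a 1× base speed alongside the 2× to 16× multipliers, and that playback stops at either end of the video.

```python
class ConventionalPlayer:
    """Hypothetical model of a play / reverse / speed / seek-bar video player."""

    SPEEDS = (1, 2, 4, 8, 16)  # 1x plus the 2x-16x multipliers described in the text

    def __init__(self, n_frames=900, fps=10.0):
        self.n_frames = n_frames
        self.fps = fps
        self.position = 0.0    # fractional frame index
        self.direction = 0     # +1 play, -1 reverse play, 0 paused
        self.speed = 1

    def play(self):
        self.direction = 1

    def reverse(self):
        self.direction = -1

    def pause(self):
        self.direction = 0

    def set_speed(self, multiplier):
        if multiplier not in self.SPEEDS:
            raise ValueError(f"unsupported speed: {multiplier}")
        self.speed = multiplier

    def seek(self, fraction):
        # The seek bar maps a fraction of its length to a frame.
        fraction = min(1.0, max(0.0, fraction))
        self.position = fraction * (self.n_frames - 1)

    def step(self, dt):
        self.position += self.direction * self.speed * self.fps * dt
        last = self.n_frames - 1
        if self.position <= 0 or self.position >= last:
            self.position = min(float(last), max(0.0, self.position))
            self.direction = 0  # stop at either end

    @property
    def frame(self):
        return int(self.position)
```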

Fig. 3. Conventional video interface

4.2 Detailed Process

The subjects pointed their iPads at a scale railroad model of the Hokutosei displayed in the exhibition room (Fig. 4). When the device camera captured the model, the spherical video started. The subjects watched the spherical video freely and could quit at any time. The gyro sensor values and the video frames watched were recorded in the background. In total, 1,169 subjects experienced the proposed interface and 1,295 subjects experienced the conventional interface.

Fig. 4. Scene at the exhibition

4.3 Results and Discussion

Figure 5 shows the number of interactions by the subjects: swipes manipulating the virtual object for the proposed interface, and touches on the buttons and seek bar for the conventional interface. Although these numbers cannot be compared directly because the interfaces differ, the proposed interface induced frequent interaction, which suggests that it attracted user interest in the video, as expected. With the conventional interface, 41 % of the subjects did not use the buttons or seek bar at all, and 77 % used them fewer than two times; these subjects likely watched the spherical video passively.
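The two proportions reported above can be computed from per-subject interaction counts as sketched below. The input format (one count per subject) is an assumption about how such logs might be stored, and the example values in any test are invented, not experimental data.

```python
def interaction_summary(counts):
    """Return (share with no interaction, share with fewer than two interactions).

    counts: number of interactions per subject (hypothetical log format).
    """
    n = len(counts)
    if n == 0:
        raise ValueError("no subjects")
    none = sum(1 for c in counts if c == 0) / n
    fewer_than_two = sum(1 for c in counts if c < 2) / n
    return none, fewer_than_two
```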

Fig. 5. Total amount of interaction

Figure 6 shows the total number of video frames watched in each session; the subjects watched more frames with the proposed interface than with the conventional interface. Figure 7 shows the distribution of watched video frames for each interface, that is, how often each part of the video was watched. With the proposed interface, the subjects watched the entire video almost uniformly. The slight increase toward the end suggests that some subjects played the video in reverse and watched it again after reaching the end; the proposed system thus appears to sustain viewing even after the video ends. With the conventional interface, in contrast, the watched frames are concentrated in the first half of the video. We conjecture that passive viewing without appropriate interaction attracts little interest and leads users to abandon the experience.
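Assuming each subject's log is a list of the frame indices displayed during the session (repeats included), the per-session totals and a binned distribution of watched frames could be computed as follows. The function name, log format, and 90-frame bin width are hypothetical choices for illustration.

```python
def watched_frame_stats(logs, n_frames=900, bin_size=90):
    """Return per-subject totals and a histogram of watched frames.

    logs: one list of displayed frame indices per subject (hypothetical format).
    """
    totals = [len(log) for log in logs]
    n_bins = -(-n_frames // bin_size)  # ceiling division
    hist = [0] * n_bins
    for log in logs:
        for f in log:
            if not 0 <= f < n_frames:
                raise ValueError(f"frame index out of range: {f}")
            hist[f // bin_size] += 1
    return totals, hist
```

Counting repeats means that a subject who scrubs back and forth contributes more to the histogram, which matches the idea of measuring how much each part of the video was watched rather than whether it was reached.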

Fig. 6. Total number of watched video frames

Fig. 7. Distribution of watched video frames

Figure 8 shows the percentage of subjects who watched each frame of the video. With the conventional interface, the percentage declines steeply as the video progresses, and only 37 % of the subjects played the video to the end. In contrast, 71 % of the subjects watched the entire video with the proposed system, possibly because the proposed interface retained their interest.
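A per-frame reach curve counts each subject at most once per frame. The sketch below assumes that each subject's log is a list of displayed frame indices and treats a subject as having watched to the end if the final frame appears in the log; both the log format and this definition of completion are assumptions.

```python
def reach_curve(logs, n_frames=900):
    """Fraction of subjects whose log contains each frame at least once."""
    if not logs:
        raise ValueError("no subjects")
    counts = [0] * n_frames
    for log in logs:
        for f in set(log):  # count each subject once per frame
            counts[f] += 1
    return [c / len(logs) for c in counts]


def completion_rate(logs, n_frames=900):
    """Share of subjects who reached the final frame."""
    return reach_curve(logs, n_frames)[-1]
```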

Fig. 8. Percentage of subjects watching each frame

Figure 9 shows the distribution of the yaw angle of the iPad. The iPad and the virtual camera are linked, and their attitude is described by yaw, pitch, and roll angles. Because almost all user rotation while viewing is about the yaw axis, the yaw angle represents the direction in the spherical image that the subjects were looking at. The difference between the interfaces is small, although the proportion of viewing directed at the main object is slightly higher with the proposed interface. This may be because the train was the only large moving object in the video and therefore stood out regardless of the interface. To observe the actual effect, viewing behavior must be measured for a video with multiple or small objects; we leave this experiment for future work.
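Because yaw wraps around, angles should be normalised before they are binned into a distribution of viewing directions. A minimal sketch, assuming yaw samples in degrees (samples in radians would first be converted with `math.degrees`) and a bin width that divides 360; the function name is hypothetical.

```python
def yaw_histogram(yaws_deg, bin_width=30):
    """Fraction of yaw samples in each bin; bin 0 starts at 0 degrees."""
    if 360 % bin_width != 0:
        raise ValueError("bin_width must divide 360")
    if not yaws_deg:
        raise ValueError("no samples")
    n_bins = int(360 // bin_width)
    hist = [0] * n_bins
    for yaw in yaws_deg:
        # Wrap any angle into [0, 360); the final modulo guards against
        # floating-point results that round up to exactly 360.
        hist[int((yaw % 360.0) // bin_width) % n_bins] += 1
    return [h / len(yaws_deg) for h in hist]
```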

Fig. 9. Distribution of yaw angle of iPad

5 Conclusion

The design of suitable interaction is essential for an effective VR experience. In this study, we proposed an interface for viewing spherical videos with which users can intuitively play back the video by touching and swiping a moving object in it.

In our experiment at a real museum, users interacted with virtual objects more frequently with the proposed interface. They also watched more video frames with the proposed interface, which is intuitive and enables easy manipulation of a spherical video, than with the conventional interface, and the distribution of watched frames was almost uniform with the proposed interface. With the conventional interface, many users stopped partway through playback, and only 37 % played the video to the end; with the proposed interface, 71 % of the users watched the entire video. We attribute these results to the difference in interaction: the experience through the conventional interface is passive and involves little interaction, whereas the proposed interface provides frequent interaction and attracts user interest effectively.

A limitation of this study is that the proposed interface cannot be applied to videos without a linearly moving object. Such videos would require other forms of interaction, such as tapping or pinching in and out.

In future work, we plan to apply this interface to videos that contain multiple moving objects. A VR experience in which users actively select interactive objects is self-directed and memorable [1, 10]; we will therefore investigate how virtual objects should respond to user interaction. Furthermore, algorithms that calculate object trajectories from optical flow fields between neighboring frames [11] could allow us to apply the proposed interface to spherical VR systems such as virtual tours and live broadcasts, increasing the attractiveness of these VR experiences.