1 Introduction

Human beings typically choose to direct their attention to a region of interest (ROI) on the basis of information obtained from their peripheral vision [1, 2]. A system that supports human activity therefore needs a visual attention retargeting method that naturally guides the user’s gaze to an ROI in order to realize natural interaction between the system and the user. By guiding the gaze, the user can quickly find and access necessary and important information, so an improvement in the usability of the system can be expected. It is therefore commonly believed that guiding a human’s gaze to a particular region allows many types of human activities to be effectively facilitated and directed [3].

Humans choose important information from an enormous volume of visual information; this is called “visual attention.” The traditional attention retargeting approach that is used in television and movies is to present visual stimuli such as arrows or a bounding box in the peripheral visual field [1]. This traditional approach is more coercive than effective from a viewer’s standpoint. A better approach would be to smoothly and effectively direct a human’s attention toward an ROI without impeding his/her current visual attention.

Several attention retargeting methods that use visual saliency maps to guide a human’s attention to an ROI have recently been proposed; they can be divided into two groups: color-based methods and orientation-based methods [4,5,6,7,8,9]. The color-based methods modify each color component so that the visual saliency inside the ROI increases, whereas that outside the ROI decreases. Their advantage is that attention is guided to the ROI while a high resolution is maintained in the non-ROI. Veas et al. [4] proposed a saliency modulation technique that prompts attention shifts and influences the recall of the ROI without a perceptible change to the visual input. Mendez et al. [5] proposed a method for dynamically directing a viewer’s gaze by analyzing and modulating bottom-up salient features. Recently, Takimoto et al. [6] used a novel saliency analysis and color modulation to create modified images in which the ROI is the most salient region in the entire image. The saliency map model used in their analysis reduces the computational cost and improves the naturalness of the image by using the L*a*b* color space and a simplified normalization method. On the other hand, the orientation-based methods guide the viewer’s attention to a non-blurred region by blurring the region outside the ROI. Hitomi et al. [7] proposed a saliency map based on a wavelet transform and an image modification method to direct a viewer’s gaze to a given region in an image. These methods adaptively modify the visual features associated with bottom-up visual attention by reverse engineering a typical visual saliency map model. However, they focus only on modulating an image or a movie presented on a display device, so it is difficult for them to guide a viewer’s gaze toward an arbitrary region in real space.

In this paper, we propose a novel attention retargeting method that uses a projector-camera system to realize attention retargeting in real space. As a first step, we focus on an appearance control method that retargets attention to a plane in real space. First, we capture the region whose appearance can be controlled by the projector-camera system. Second, a target image whose saliency map is ideal, i.e., in which the ROI has the highest saliency, is created from the captured image by reverse engineering a visual saliency model for bottom-up attention. Third, we calculate the optimum projection pattern for attention retargeting by using a dynamic projector-camera feedback system.

Fig. 1. Flowchart of the proposed method

2 Proposed Method

The aim of this study was to create an effective attention retargeting method in real space that is strictly based on a bottom-up computational model of visual attention by using a projector-camera system. Our method consists of three phases: calibration of the projector-camera system, image modulation for saliency enhancement, and pattern projection. A flowchart of the proposed method is shown in Fig. 1.

2.1 Calibration of the Projector-Camera System

It is necessary to estimate the relationship between the projector and the camera in a projector-camera system. We project and capture two simple patterns to determine the correspondence between the projected pattern and each pixel in the captured image.

First, an all-black image and an all-white image are projected in succession and captured as images \(I_B\) and \(I_W\), respectively. Second, the subtracted image \(I_S\) is calculated as the difference between \(I_W\) and \(I_B\). From the subtracted image \(I_S\), the four corners of the projected region can be easily detected by using a snake model combined with a Hough transform.

Finally, the projected region in the captured image \(I_B\) is extracted using the coordinates of the four detected corners. In addition, a projective transformation is applied to the extracted region so that it has the same size as the projected pattern. Henceforth, this transformed image \(T_B\) is defined as the actual and original appearance, where “original” means the appearance when no pattern is projected. The transformed image \(T_W\) is created from the captured image \(I_W\) in the same way as \(T_B\).
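The following is a minimal sketch of this calibration step using OpenCV (version 4 or later is assumed). The corner detection is approximated by fitting a quadrilateral to the largest contour of the thresholded difference image rather than by the snake model and Hough transform described above, and all function and variable names are illustrative.

```python
import cv2
import numpy as np

def order_corners(pts):
    """Order four corner points as top-left, top-right, bottom-right, bottom-left."""
    s = pts.sum(axis=1)
    d = np.diff(pts, axis=1).ravel()          # y - x for each point
    return np.float32([pts[np.argmin(s)], pts[np.argmin(d)],
                       pts[np.argmax(s)], pts[np.argmax(d)]])

def calibrate(I_B, I_W, proj_w, proj_h):
    """Rectify the captured black/white projections onto the projector's pixel grid.

    Returns the homography H and the rectified appearances T_B and T_W.
    """
    # The difference image highlights the projected region.
    I_S = cv2.absdiff(I_W, I_B)
    gray = cv2.cvtColor(I_S, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Corner detection stand-in: fit a quadrilateral to the largest contour
    # (the paper uses a snake model combined with a Hough transform instead).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    region = max(contours, key=cv2.contourArea)
    approx = cv2.approxPolyDP(region, 0.02 * cv2.arcLength(region, True), True)
    corners = approx.reshape(-1, 2).astype(np.float32)
    assert len(corners) == 4, "expected a quadrilateral projected region"

    # Projective transformation so the region matches the projected pattern size.
    dst = np.float32([[0, 0], [proj_w - 1, 0],
                      [proj_w - 1, proj_h - 1], [0, proj_h - 1]])
    H = cv2.getPerspectiveTransform(order_corners(corners), dst)
    T_B = cv2.warpPerspective(I_B, H, (proj_w, proj_h))
    T_W = cv2.warpPerspective(I_W, H, (proj_w, proj_h))
    return H, T_B, T_W
```

In practice, the detected corners may need sub-pixel refinement (e.g., with cv2.cornerSubPix) when the border of the projected region is soft.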

2.2 Image Modulation

Visual saliency may be defined as an estimate of how likely a given region is to attract human visual attention. Itti et al. [10] proposed a computational model of visual saliency based on Koch and Ullman’s early vision model [11]. By measuring actual human gazes, they demonstrated that the saliency map matches the distribution of actual human attention well. Therefore, an ROI with high saliency can attract attention if we adjust the features of the whole image on the basis of a saliency map.
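As a rough illustration of how such a bottom-up saliency map can be computed, the sketch below builds a center-surround contrast map over the L*, a*, and b* channels at two scales. It is a deliberately simplified stand-in, not the full model of Itti et al. [10] (orientation features, across-scale combination, and the normalization operator are omitted), and the scale parameters are chosen arbitrarily.

```python
import cv2
import numpy as np

def simple_saliency(bgr):
    """Coarse bottom-up saliency: center-surround contrast of the L*a*b* channels
    (a simplification of the model in [10])."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    h, w = lab.shape[:2]
    saliency = np.zeros((h, w), np.float32)
    for ch in range(3):                                # L*, a*, b* channels
        feature = lab[:, :, ch]
        for sigma_c, sigma_s in [(2, 8), (4, 16)]:     # center / surround scales
            center = cv2.GaussianBlur(feature, (0, 0), sigma_c)
            surround = cv2.GaussianBlur(feature, (0, 0), sigma_s)
            saliency += np.abs(center - surround)      # center-surround contrast
    saliency -= saliency.min()
    if saliency.max() > 0:
        saliency /= saliency.max()                     # normalize to [0, 1]
    return saliency
```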

To guide visual attention, we indirectly adjust the saliency of the original image by changing each RGB component; this is done by reverse engineering a visual saliency model for bottom-up attention. To achieve this, our proposed method repeats two phases: saliency analysis and color modulation. In the first phase, we create a visual saliency map from the input image, and in the second phase, we modulate the color components by using the obtained saliency map.

The basic concept of our color modulation method is to iteratively modulate the RGB color components so that the saliency inside the ROI increases, whereas that outside the ROI decreases. The procedure of our saliency-map-based color modulation is as follows. In the preprocessing step, a user selects the ROI to which he/she wants to guide the viewer’s attention. In addition, a target image T, which is the initial image used for image modulation, is calculated by averaging \(T_B\) and \(T_W\). Let \(T^t\) be the modulated image updated t times from T, and let \(k_{ij}^t\) be the color component \(k_{ij} ~ (k \in \{R, G, B\})\) of the input image \(T^t\) at pixel (i, j).

 

Step 1:

The saliency map \(SM^t\) of image \(T^t\) is calculated.

Step 2:

The intensity coefficient \(w_{ij}^t\) and the modification value \(Q_{(k,ij)}^t\) are calculated.

Step 3:

Each pixel value \(k_{ij}^t\) is temporarily modulated by the following equation:

$$\begin{aligned} k^{t+1}_{ij} = \left\{ \begin{array}{ll} k^{0,B}_{ij} & \quad k^{M}_{ij} \le k^{0,B}_{ij}\\ k^{M}_{ij} & \quad k^{0,B}_{ij}< k^{M}_{ij}<k^{0,W}_{ij}\\ k^{0,W}_{ij} & \quad \mathrm{otherwise} \end{array} \right. \end{aligned}$$
(1)
$$\begin{aligned} k^{M}_{ij} = k^{t}_{ij} + \alpha w_{ij}^t Q_{(k,ij)}^t \end{aligned}$$
(2)

where \(\alpha \) is the weight coefficient used for color modulation, and \(k^{0,B}_{ij}\) and \(k^{0,W}_{ij}\) are the color components of \(T_B\) and \(T_W\) at pixel (i, j), respectively. Increasing \(\alpha \) reduces the processing time but gradually degrades the image quality; because of this trade-off, the parameter is optimized through a subjective experiment.

Step 4:

If the saliency \(SM^{t+1}\) inside the ROI is the highest in the modulated image \(T^{t+1}\), the image modulation is finished; otherwise, \(k_{ij}^t\) is replaced with \(k_{ij}^{t+1}\) and the procedure returns to Step 1.

 

The intensity coefficient \(w_{ij}^t\), which is the weight of the modulation values of each pixel in the target image \(T^t\), is defined by

$$\begin{aligned} w_{ij}^t = \left\{ \begin{array}{ll} \overline{SM}^t_\mathrm{ROI} & \quad (i,j)\in \mathrm{ROI} \\ -SM^t_{ij} & \quad \mathrm{otherwise} \\ \end{array} \right. \end{aligned}$$
(3)

Here,

$$\begin{aligned} \overline{SM}^t_\mathrm{ROI} = \frac{1}{m}\sum _{(i,j)\in \mathrm{ROI}}SM^t_{ij} \end{aligned}$$
(4)

where m is the number of pixels in the ROI.

On the other hand, the modification value \(Q_{(k,ij)}^t\) is obtained by reverse engineering (back-calculating) the saliency map computation: it reflects how strongly each color component at pixel (i, j) influences the saliency.

By using the proposed image modulation method, we can obtain the modified image \(T^{Prop}\), which is the ideal appearance for attention retargeting.
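A condensed sketch of this modulation loop (Steps 1–4, Eqs. (1)–(4)) is given below. The saliency model and the reverse-engineered modification value are abstracted behind the hypothetical callbacks saliency_map and modification_value, the value of alpha is illustrative, and the stopping test uses one possible reading of the “highest saliency inside the ROI” condition.

```python
import numpy as np

def modulate(T_B, T_W, roi_mask, saliency_map, modification_value,
             alpha=0.1, max_iter=200):
    """Iterative color modulation (Steps 1-4, Eqs. (1)-(4)); a sketch only.

    T_B, T_W : float arrays in [0, 1], appearances under black/white projection.
    roi_mask : boolean array, True inside the user-selected ROI.
    saliency_map(img)           -> 2-D saliency map (placeholder for the model of [10]).
    modification_value(img, sm) -> per-pixel, per-channel values Q (placeholder).
    """
    T = 0.5 * (T_B + T_W)                        # initial target image
    for _ in range(max_iter):
        SM = saliency_map(T)                     # Step 1: saliency analysis
        roi_mean = SM[roi_mask].mean()           # Eq. (4)

        # Step 4's check, evaluated at the top of each iteration: stop once the
        # ROI is the most salient region (one possible reading of the rule).
        if roi_mean >= SM[~roi_mask].max():
            break

        # Step 2: intensity coefficient w (Eq. (3)) and modification value Q.
        w = np.where(roi_mask, roi_mean, -SM)
        Q = modification_value(T, SM)

        # Step 3: modulate and clamp between the black/white appearances (Eqs. (1)-(2)).
        T = np.clip(T + alpha * w[..., None] * Q, T_B, T_W)
    return T
```

Clamping the update between \(T_B\) and \(T_W\) restricts the target to appearances that lie between the darkest and brightest projections, i.e., appearances the projector can actually produce on the plane.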

2.3 Pattern Projection

We next calculate the pattern to be projected onto the plane. Here, the projector and camera devices may have nonlinear characteristics such as gamma characteristics. In addition, the pattern light projected from the projector may be attenuated before it reaches the plane. For these reasons, it is difficult to change the actual appearance to the optimum appearance by projecting the difference between the actual appearance \(T_B\) and the ideal appearance \(T^{Prop}\) onto the plane only once.

Therefore, the actual appearance is made to approximate the ideal appearance \(T^{Prop}\) by iteratively updating the projection pattern with the projector-camera feedback system. The procedure of the pattern calculation is as follows.

 

Step 1:

The projection pattern P between the actual appearance \(T_B\) and the ideal appearance \(T^{Prop}\) is calculated.

$$\begin{aligned} P = T^{Prop} \ominus T_{B} \end{aligned}$$
(5)

where \(\ominus \) indicates the corresponding pixel-wise subtraction.

Step 2:

The current appearance is captured after the pattern P is projected onto the plane. The captured appearance, which is the actual appearance, is transformed by the projective transformation to obtain \(T_{cap}^{act}\).

Step 3:

The subtracted pattern D between the captured image \(T_{cap}^{act}\) and the ideal appearance \(T^{Prop}\) is calculated.

$$\begin{aligned} D = T^{Prop} \ominus T^{act}_{cap} \end{aligned}$$
(6)
Step 4:

If the following condition is satisfied for D, the iterative projection is finished:

$$\begin{aligned} \frac{\sum _{i} \sum _{j} D_{ij} }{n} < Th \end{aligned}$$
(7)

where n is the number of pixels in D and Th is a threshold. Otherwise, the projection pattern P is updated by the following equation and the procedure returns to Step 2.

$$\begin{aligned} P = P \oplus D \end{aligned}$$
(8)

where \(\oplus \) indicates the corresponding pixel-wise summation.
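A sketch of this feedback loop is shown below. The project, capture, and rectify callbacks are hypothetical placeholders for the projector output, camera capture, and the projective transformation of Sect. 2.1; the clipping to the displayable range and the use of the absolute residual in the stopping test are assumptions, since the description above leaves them implicit.

```python
import numpy as np

def project_until_converged(T_B, T_prop, project, capture, rectify,
                            threshold=2.0, max_iter=50):
    """Iterative feedback projection (Eqs. (5)-(8)); a sketch only.

    project(P) : hypothetical callback that sends pattern P to the projector.
    capture()  : hypothetical callback that grabs a camera frame.
    rectify(I) : applies the calibration homography of Sect. 2.1 to frame I.
    """
    T_B = T_B.astype(np.float32)
    T_prop = T_prop.astype(np.float32)

    # Eq. (5): initial pattern, clipped to the projector's displayable range
    # (the range handling is an assumption).
    P = np.clip(T_prop - T_B, 0, 255)
    for _ in range(max_iter):
        project(P.astype(np.uint8))
        T_cap = rectify(capture()).astype(np.float32)   # current rectified appearance

        D = T_prop - T_cap                              # Eq. (6)
        # Eq. (7): stop when the mean residual falls below the threshold
        # (absolute value used here to avoid sign cancellation).
        if np.abs(D).mean() < threshold:
            break
        P = np.clip(P + D, 0, 255)                      # Eq. (8)
    return P
```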

 

Fig. 2. Examples of the target image, the result of the proposed method, the result of the conventional approach, and their saliency maps

Table 1. Example of the saliency analysis inside/outside the ROI

3 Experiments

3.1 Experimental Setup

To show the effectiveness of the proposed attention retargeting method, we compared it with a conventional technique in which a white pattern is projected only onto the ROI, i.e., a spotlight-like approach.

We employed an EPSON EB-935W projector and a Logicool B910 HD webcam. In this experiment, nine A4-sized pictures were arranged on a gray board. The target image \(T_{cap}\) and its saliency map \(SM(T_{cap})\) are shown in Fig. 2. In the saliency map, brighter pixels indicate higher saliency. The watermelon in the upper left part of the image was chosen as the ROI.

3.2 Experimental Results and Discussion

Bottom-up attention induced by visual features obtained from a visual stimulus dominantly influences visual attention in the early stages, i.e., immediately after the visual stimulus is presented. Itti et al. [10] proposed a visual saliency computation model based on the early vision model proposed by Koch and Ullman [11]. Using human gaze measurements, they demonstrated that their saliency map matches well with the distribution of actual human attention. Therefore, we evaluated the effectiveness of each method for attention retargeting by using a saliency map.

The result of the proposed method \(T^{Prop}_{cap}\), the result of the conventional method \(T^{Spot}_{cap}\), and their saliency maps are shown in Fig. 2(a) and (d). The detailed results are listed in Table 1. In this table, the average, maximum, and minimum values of the saliency map inside or outside the ROI are shown.

The average value of the proposed method \(SM(T^{Prop}_{cap})\) inside the ROI was higher than that of the spotlight \(SM(T^{Spot}_{cap})\), and the average value of the proposed method outside the ROI was lower than that of the spotlight. The larger the difference between the saliency inside and outside the ROI, the easier it is to direct a viewer’s attention to the ROI. These results indicate that the proposed method retargets attention more effectively than the conventional spotlight approach.
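The per-region statistics reported in Table 1 can be computed in a few lines once a saliency map and the ROI mask are available. The snippet below is illustrative only; the saliency_map function and the image names in the usage comment are placeholders.

```python
import numpy as np

def roi_statistics(SM, roi_mask):
    """Mean/max/min saliency inside and outside the ROI, as reported in Table 1."""
    inside, outside = SM[roi_mask], SM[~roi_mask]
    return {"inside":  (inside.mean(),  inside.max(),  inside.min()),
            "outside": (outside.mean(), outside.max(), outside.min())}

# Hypothetical usage with the captured results of the two methods:
# stats_prop = roi_statistics(saliency_map(T_prop_cap), roi_mask)
# stats_spot = roi_statistics(saliency_map(T_spot_cap), roi_mask)
```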

4 Conclusions

In this paper, we proposed a novel attention retargeting method that uses a projector-camera system to realize attention retargeting in real space. As a first step, we focused on a method for controlling the actual appearance of a plane in real space for attention retargeting. On the basis of the evaluation results, we confirmed that the proposed method achieves efficient and effective attention retargeting compared with the conventional approach. In future work, we will evaluate the effectiveness of the method by using an eye-tracking system.

This research was partially supported by a Grant-in-Aid for Scientific Research (C) from the Japan Society for the Promotion of Science (grant no. 15K00282).