1 Introduction

In recent years, advances in technology have produced display devices with many different resolutions. This has increased the demand for image resizing techniques that preserve the quality of salient visual information. The aim of content-aware image resizing is to reduce the overall number of pixels of a given image while preserving the content and aspect ratio of the depicted objects. The problem of image retargeting is defined as follows. Given an image I of size \(H \times W\), the goal is to map it to a new image \(I'\) of size \(H \times W'\) (\(H' \times W\) in the horizontal case), with \(0< W'<W\) (\(0< H'<H\)), where \(W'=W-N\) (\(H'=H-N\)) and N is the number of paths to be removed. The two simplest techniques to resize an image are cropping and uniform scaling, but they introduce deformation or distortion of the subjects. Moreover, these methods do not take into account the content of the image (i.e., its semantics).

In 2007, Avidan et al. [1] proposed the seam carving technique, which consists in finding pixel paths (called seams) that belong to the background or to other parts unrelated to the semantics of the picture. Several methods have been proposed since then. In [2], a method based on the Gradient Vector Flow (GVF) of the image is presented to establish the paths to be considered during resizing. The authors also proposed an approach which takes into account the visual saliency properties of the images to find an optimal path in the resizing space. GVF, introduced in [14], is computed as a diffusion of the gradient vectors of a gray-level or binary edge map computed from the image. Xu et al. [15] proposed a method that generalises the GVF formulation, called Generalised Gradient Vector Flow (GGVF), to improve active contour (snake) convergence to long, thin boundary indentations while maintaining other desirable properties of GVF. In particular, they add two weighting coefficients which can change dynamically across image regions. In [18] and [19], GGVF is improved in terms of noise robustness, weak edge preservation and convergence for the task of medical image segmentation. To reduce the high computational cost of GVF, the virtual electric field (VEF) [7] and its extension [16] have been proposed. These methods assume that each pixel of an image is an electron and that all pixels together generate a virtual electric field.

Many approaches combine different resizing techniques and define new metrics to measure the quality of the results. In [5], an algorithm which iteratively applies seam carving, cropping, warping, and scaling is proposed; the Structural Similarity Metric (SSIM) is adopted to measure the similarity between original and retargeted images. The work in [13] combines several resizing operators and defines a new image similarity measure which is used with a dynamic programming algorithm, whereas in [12] the authors present a comprehensive perceptual study and analysis of image retargeting and propose a metric that can predict human retargeting perception. A measure that simulates the human vision system is also proposed in [10]. In particular, the global topological property is the core of the method, and the image scale space is considered to extract the global geometric structures from retargeted images. In [11], a real-time approach based on an axis-aligned deformation space is introduced. It minimizes a convex energy under feasible constraints with the aim of guaranteeing the convergence of the method and the quality of the results. In [6], a metric is presented that measures the geometric distortion of a retargeted image based on the local variance of the SIFT flow [9] vector fields of the image. To measure the quality of a retargeted image, the work in [8] proposes an objective quality assessment method which takes into account the following factors: preservation of salient regions, symmetry and global structure, influence of introduced artifacts, and aesthetics.

In recent years, deep neural network models have been considered for image resizing. The work in [3] proposes a weakly- and self-supervised deep Convolutional Neural Network (CNN) that takes a source image and a target aspect ratio as input. In [17], a perceptually aware model is presented that reduces the dimension of the original photo/video by deeply encoding human gaze shifting sequences. Even though CNN-based methods show encouraging results, the end-to-end approach implemented by such encoder-decoder models creates a new image with a pre-defined aspect ratio, without exposing the process that determined which pixels have been removed.

In this paper, we present a new method for image retargeting which is based on GGVF. We investigate the importance of one of its main parameters (K), which balances the smoothing term and the data term. The proposed approach has been compared with a method based on GVF [2] and with a seam carving approach [1] for different percentages of resizing. Experimental results show a relation between K and the scale factor of retargeting. They also show that the proposed method is able to overcome some difficulties of the method based on GVF.

The paper is organised as follows. In Sect. 2, the comparison between GVF and GGVF is introduced and our algorithm is detailed. Section 3 presents and discusses the results. Finally, conclusions and hints for future work are given in Sect. 4.

Fig. 1. GGVF of test image for different values of K. \(1^{st}\) column: K = 0.001, \(2^{nd}\) column: K = 0.05, \(3^{rd}\) column: K = 0.75, \(4^{th}\) column: K = 1, \(5^{th}\) column: K = 1.25.

2 Proposed Method

Gradient Vector Flow [14] is a force field \(\mathbf F \) of vectors \(\mathbf v (x,y)=[\textit{u}(x,y),\textit{v}(x,y)]\) that minimizes the following energy function:

$$\begin{aligned} \begin{aligned} E&=\iint \mu (\textit{u}_x^2+\textit{u}_y^2+\textit{v}_x^2+\textit{v}_y^2)+|\nabla f|^2 |\mathbf v -\nabla f|^2 dx dy\\&= \iint \mu |\nabla \mathbf v |^2 +|\nabla f|^2 |\mathbf v -\nabla f|^2 dx dy \end{aligned} \end{aligned}$$
(1)

where \(\mu \) is a regularisation parameter that controls the trade-off between the first term, called the smoothing term, and the second term, called the data term, in the integrand. The terms \(\textit{u}_x, \textit{v}_x, \textit{u}_y, \textit{v}_y\) indicate the partial derivatives along the x and y axes, \(|\nabla \mathbf v |^2\) is shorthand for the sum of their squares, f is an edge map of the input image, \(\nabla f\) is the gradient of f with magnitude \(|\nabla f|\), and \(\nabla ^2\) is the Laplacian operator (used in Eq. 2). If \(|\nabla f|\) is close to zero, the energy E in Eq. 1 is dominated by the smoothing term, hence GVF is a slowly varying field in homogeneous regions. On the other hand, when \(|\nabla f|\) is large, the data term dominates and the GVF field is close to \(\nabla f\).
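
For reference, the following is a standard calculus-of-variations result for the GVF energy rather than a step taken from this paper. Taking the data term as \(|\nabla f|^2 |\mathbf v -\nabla f|^2\), as in [14], and writing \(f_x\) and \(f_y\) for the partial derivatives of f, the minimiser of Eq. 1 satisfies the Euler equations

$$\begin{aligned} \begin{aligned} \mu \nabla ^2 \textit{u}-(\textit{u}-f_x)(f_x^2+f_y^2)&=0,\\ \mu \nabla ^2 \textit{v}-(\textit{v}-f_y)(f_x^2+f_y^2)&=0. \end{aligned} \end{aligned}$$

This is where the Laplacian enters: the smoothing term yields \(\mu \nabla ^2 \mathbf v \) and the data term yields \(|\nabla f|^2(\mathbf v -\nabla f)\). Treating \(\mathbf v \) as a function of time and descending on E gives an evolution equation of the same form as Eq. 2, with \(\mu \) in place of \(g(\cdot )\) and \(|\nabla f|^2\) in place of \(h(\cdot )\).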

To solve the difficulty of GVF in driving a path into long and thin indentations that could be due to the smoothing of the field near the boundaries, \(\mu \) and \(|\nabla f|^2\) are replaced by generic weighting coefficients. Therefore, GGVF field [15] is the equilibrium solution of the following partial differential equation:

$$\begin{aligned} \mathbf v _t= g(|\nabla f|) \nabla ^2 \mathbf v - h(|\nabla f|) (\mathbf v -\nabla f). \end{aligned}$$
(2)

To preserve the properties of GVF, the weighting functions \(g(\cdot )\) and \(h(\cdot )\) should be monotonically non-increasing and non-decreasing functions of \(|\nabla f|\), respectively. These coefficients are spatially varying, since they depend on the gradient of the edge map, which is itself spatially dependent. In our experiments, the following functions [15] are used:

$$\begin{aligned} g(|\nabla f|) = \exp \left( -\frac{|\nabla f|}{K}\right) , \end{aligned}$$
(3)
$$\begin{aligned} h(|\nabla f|) = 1- g(|\nabla f|), \end{aligned}$$
(4)

where the parameter K balances the smoothing term and the data term. Hence, the deformation curve can converge rapidly in flat regions while weak borders are protected. Figure 1 shows the output of GGVF applied to a test image for different values of K. As we can observe, the value of K affects both the gradient distribution and its intensity.
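
As an illustration of Eqs. 2 to 4, the following minimal sketch evolves the field with explicit time steps on a toy edge map. It is not the implementation used in the experiments: the finite-difference stencils, the replicated borders, the initialisation with \(\nabla f\), the step `dt`, the iteration count and all function names are assumptions made for the example. With unit grid spacing and \(g \le 1\), a step `dt` of at most 0.25 keeps this explicit scheme stable.

```python
import math


def gradient(f):
    """Central-difference gradient (fx, fy) of a 2-D list, borders replicated."""
    rows, cols = len(f), len(f[0])
    fx = [[(f[i][min(j + 1, cols - 1)] - f[i][max(j - 1, 0)]) / 2.0
           for j in range(cols)] for i in range(rows)]
    fy = [[(f[min(i + 1, rows - 1)][j] - f[max(i - 1, 0)][j]) / 2.0
           for j in range(cols)] for i in range(rows)]
    return fx, fy


def laplacian(a):
    """Five-point Laplacian of a 2-D list, borders replicated."""
    rows, cols = len(a), len(a[0])
    return [[a[max(i - 1, 0)][j] + a[min(i + 1, rows - 1)][j]
             + a[i][max(j - 1, 0)] + a[i][min(j + 1, cols - 1)]
             - 4.0 * a[i][j]
             for j in range(cols)] for i in range(rows)]


def ggvf(f, K, iterations=50, dt=0.2):
    """Explicit time-stepping of Eq. 2 with the weights of Eqs. 3 and 4."""
    fx, fy = gradient(f)
    rows, cols = len(f), len(f[0])
    # Eq. 3: g = exp(-|grad f| / K). Eq. 4 (h = 1 - g) is applied inline below.
    g = [[math.exp(-math.hypot(fx[i][j], fy[i][j]) / K)
          for j in range(cols)] for i in range(rows)]
    # Assumed initialisation: start from the gradient of the edge map.
    u = [row[:] for row in fx]
    v = [row[:] for row in fy]
    for _ in range(iterations):
        lu, lv = laplacian(u), laplacian(v)
        u = [[u[i][j] + dt * (g[i][j] * lu[i][j]
                              - (1.0 - g[i][j]) * (u[i][j] - fx[i][j]))
              for j in range(cols)] for i in range(rows)]
        v = [[v[i][j] + dt * (g[i][j] * lv[i][j]
                              - (1.0 - g[i][j]) * (v[i][j] - fy[i][j]))
              for j in range(cols)] for i in range(rows)]
    return u, v


# Toy edge map with a vertical step between columns 1 and 2.
edge_map = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
u, v = ggvf(edge_map, K=0.75)
magnitude = [[math.hypot(a, b) for a, b in zip(ru, rv)]
             for ru, rv in zip(u, v)]
```

In flat regions \(g\) is close to 1, so the update is almost pure diffusion and the field spreads away from the edge; near strong edges \(h\) grows and pulls the field towards \(\nabla f\). Smaller values of K make \(g\) fall off faster, which confines the smoothing to flatter areas.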

In this paper, the magnitude of GGVF is used to detect the seams to be removed. For a fixed K, the proposed algorithm computes GGVF and its normalisation from the input image I, previously converted from RGB to grey scale. The seams are built starting from the top of the image and following the direction of the normalised GGVF, in order to preserve edges and propagate their contributions to the neighbouring pixels by creating a repulsive field. A cost \(c_t\) is associated with each seam \(s_t\) by the following equation:

$$\begin{aligned} c_t=\sum _{(i,j)\in s_t} |GGVF(i,j)|. \end{aligned}$$
(5)

The seam with the lowest cost is then removed from the image at each iteration. The GGVF map is updated and the procedure is repeated for each seam to be removed. This heuristic is partially inspired by the work in [2].
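
The seam selection step can be sketched as follows. The text only states that seams start at the top row and follow the direction of the normalised field, so the stepping rule used here (move one pixel left or right according to the sign of the horizontal component) is an assumption made for illustration, as are the function names and the toy data. The cost is the sum of Eq. 5.

```python
def trace_seam(u, start_col):
    """Follow the horizontal field component u downwards from (0, start_col)."""
    rows, cols = len(u), len(u[0])
    seam, j = [], start_col
    for i in range(rows):
        seam.append((i, j))
        # Assumed rule: step by the sign of u, clamped to the image borders.
        step = (u[i][j] > 0) - (u[i][j] < 0)
        j = min(max(j + step, 0), cols - 1)
    return seam


def seam_cost(mag, seam):
    """Eq. 5: sum of the field magnitude along the seam."""
    return sum(mag[i][j] for i, j in seam)


def remove_seam(grid, seam):
    """Drop one pixel per row; `seam` lists (row, col) pairs in row order."""
    return [row[:j] + row[j + 1:] for row, (_, j) in zip(grid, seam)]


def remove_cheapest_seam(image, mag, u):
    """Trace one seam per starting column and remove the cheapest one."""
    seams = [trace_seam(u, j) for j in range(len(image[0]))]
    best = min(seams, key=lambda s: seam_cost(mag, s))
    return remove_seam(image, best), best


# Toy data: column 1 has the lowest field magnitude, and u is zero
# everywhere, so every seam is a straight vertical line.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
mag = [[5, 1, 5, 5], [5, 1, 5, 5], [5, 1, 5, 5]]
u = [[0, 0, 0, 0] for _ in range(3)]
resized, seam = remove_cheapest_seam(image, mag, u)
```

In a full loop the field would be recomputed on `resized` before the next seam is selected, as described above.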

To drive the selection of the seams to be removed, while maintaining the strong edges of the image and propagating their contributions to their neighbourhood, the proposed method exploits the properties of the GGVF field without considering all the possible paths, as in the GVF approach presented in [2]. GGVF comprises two weighting functions that depend on the gradient of the edge map; this guarantees that the field changes dynamically in each image region.

Fig. 2. Example of image reduction with resizing percentage from 10% to 50% with seam carving (\(1^{st}\) row), GVF (\(2^{nd}\) row) and GGVF with \(K = 0.75\) (\(3^{rd}\) row).

Fig. 3. Examples of image resizing at 70% of the original width. Original image (\(1^{st}\) column), binary mask (\(2^{nd}\) column), seams generated by our approach with \(K=1\) (\(3^{rd}\) column), our result (\(4^{th}\) column), seams generated by GVF (\(5^{th}\) column), GVF result (\(6^{th}\) column), seams generated by seam carving (\(7^{th}\) column) and its result (\(8^{th}\) column).

Fig. 4. Examples of image resizing at 50% of the original width. Original image (\(1^{st}\) column), binary mask (\(2^{nd}\) column), seams generated by our approach with \(K=0.05\) (\(3^{rd}\) column), our result (\(4^{th}\) column), seams generated by GVF (\(5^{th}\) column), GVF result (\(6^{th}\) column), seams generated by seam carving (\(7^{th}\) column) and its result (\(8^{th}\) column).

Fig. 5. Examples of image resizing at 40% of the original width. The original images are shown in the first column. The second column reports the resizing results obtained by applying GGVF and the related quality score (i.e., Eq. 7). The third column shows the results obtained by GVF, whereas the fourth column reports the seam carving results. The last three columns show some details of the outputs obtained by GGVF, GVF and seam carving.

Fig. 6. Challenging cases when reducing the image by 30%. The first column shows the original images; the second, third and fourth columns show the results obtained by applying the GGVF, GVF and seam carving approaches, respectively. Each output image reports the result in terms of the quality score (i.e., Eq. 7). The best results are highlighted in green. (Color figure online)

Fig. 7. Experimental results in terms of \(Score_{1}\) (i.e., Eq. 6).

Fig. 8. Average GGVF performance in terms of \(Score_{1}\), computed over 1000 images for varying percentages of resizing and values of K.

Fig. 9. \(Score_{2}\) obtained with Eq. 8.

Fig. 10. Average \(Score_{2}\) values achieved by the GGVF approach for varying resizing fractions and values of K.

3 Results

In the experimental evaluation, we compared the proposed method with the GVF scheme paired with seam carving [2] and with plain seam carving [1] on the dataset used in [2] and [4], which is composed of 1000 images including several scenes and objects that appear in multiple instances and in different locations of the image. For each image I, the dataset provides the ground-truth map which denotes the pixels of the areas containing the main salient objects (i.e., the parts of the image that we want to preserve after the resizing). In our experiments, we evaluated the GGVF algorithm with several values of K, namely 0.001, 0.05, 0.75, 1 and 1.25, whereas the parameter \(\mu \) of GVF is set to 0.1 as in [2]. The three retargeting approaches have been tested by varying the percentage of resizing from 10% to 50%. Figure 2 shows the progressive resizing of a sample image.

Figures 3 and 4 report some examples obtained by resizing images with a scale factor of 30% and 50%, respectively, with respect to the original resolution of the processed image. The three algorithms behave differently. In particular, comparing the seams generated by the proposed algorithm (\(3^{rd}\) column) with the ones generated by the GVF scheme (\(5^{th}\) column) or by the seam carving approach (\(7^{th}\) column), it is possible to observe that the state-of-the-art methods remove information from the object, introducing deformations and distortions in the image, whereas the GGVF approach preserves the visual content of the scene by maintaining both the size of the objects and the details related to the visual stimuli of textures and edges.

To evaluate the performance of our algorithm for different values of K, the binary mask of each image is used. Specifically, the seams removed from the input image are also removed from its mask, and the remaining mask pixels are then counted. This number is compared with the one obtained with GVF. More specifically, let N be the total number of images in the dataset (i.e., \(N=1000\)). Let \(T=\{x:n^{GGVF}\ge n^{GVF} \}\) be the set of images x for which the number of binary-mask pixels preserved by our approach, \(n^{GGVF}\), is greater than or equal to the number preserved by the GVF-based approach, \(n^{GVF}\). Based on these variables, the following evaluation score is computed:

$$\begin{aligned} Score_{1}=\frac{|T|}{N} \end{aligned}$$
(6)

where |T| is the cardinality of set T, and N is the total number of images in the dataset.
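
A minimal sketch of this evaluation is given below, assuming that \(n\) counts the mask pixels that remain after the seams are removed (the counting step described above) and that mask pixels are stored as 0 or 1. The function names and the toy numbers are illustrative only.

```python
def remove_seam(grid, seam):
    """Drop one pixel per row; `seam` lists (row, col) pairs in row order."""
    return [row[:j] + row[j + 1:] for row, (_, j) in zip(grid, seam)]


def preserved_pixels(mask, seams):
    """Count mask pixels (value 1) left after removing the seams in order."""
    for seam in seams:
        mask = remove_seam(mask, seam)
    return sum(sum(row) for row in mask)


def score_1(n_ggvf, n_gvf):
    """Eq. 6: fraction of images for which n_ggvf >= n_gvf."""
    if not n_ggvf or len(n_ggvf) != len(n_gvf):
        raise ValueError("need two non-empty lists of equal length")
    in_t = sum(1 for a, b in zip(n_ggvf, n_gvf) if a >= b)
    return in_t / len(n_ggvf)


# Toy example with four images: GGVF is at least as good on three of them.
print(score_1([10, 8, 5, 7], [9, 8, 6, 7]))
```

Because ties are counted in T, a score of 0.5 does not by itself mean that the two methods are equivalent; it only bounds how often GGVF is strictly worse.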

Figures 7 and 8 show the scores obtained for each evaluation setting and the trend of this score as the value of K varies. The results suggest that the best values of K are 0.75 and 1 when the percentage of resizing is in the range [10%–30%], whereas for larger scale factors (40% and 50%) the best values of K are 0.05 and 0.001, respectively. Therefore, there appears to be an inverse relationship between K and the percentage of resizing: the larger the reduction, the smaller the best-performing K.

Furthermore, for each i-th image, we considered the number of pixels in its binary mask \(p_i^{bm}\) and the number of successfully preserved pixels after the application of the Seam Carving (SC), the GVF and the GGVF methods, denoted as \(n_i^{SC}\), \(n_i^{GVF}\) and \(n_i^{GGVF}\) respectively. The quality of a resized image is evaluated by considering the ratio between \(n_i^{m}\) and \(p_i^{bm}\):

$$\begin{aligned} q_i^{m}=\frac{n_i^{m}}{p_i^{bm}} \end{aligned}$$
(7)

where \(m\in \{SC, GVF, GGVF\}\) is the resizing method applied to the input image. Based on these definitions, the following evaluation score is computed:

$$\begin{aligned} Score_{2}=\frac{1}{N}\sum \limits _{i=1}^{N}q_i^{m} \end{aligned}$$
(8)
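
Eqs. 7 and 8 can be sketched as follows, under the assumption that the score is the mean of \(q_i^{m}\) over all N images for a given method m. The function names and the toy numbers are illustrative only.

```python
def quality(n_preserved, p_mask):
    """Eq. 7: fraction of binary-mask pixels preserved for one image."""
    return n_preserved / p_mask


def score_2(n_preserved, p_mask):
    """Eq. 8, assumed here to be the mean of q over all images."""
    if not n_preserved or len(n_preserved) != len(p_mask):
        raise ValueError("need two non-empty lists of equal length")
    qs = [quality(n, p) for n, p in zip(n_preserved, p_mask)]
    return sum(qs) / len(qs)


# Two images: 50 of 100 and 30 of 40 mask pixels are preserved.
print(score_2([50, 30], [100, 40]))  # (0.5 + 0.75) / 2 = 0.625
```

The score is computed once per method, so comparing methods amounts to calling `score_2` with the preserved-pixel counts of SC, GVF and GGVF in turn.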

Figure 9 shows the achieved experimental results in terms of average \(Score_{2}\), by varying the resizing factor and the value of K. Figure 10 shows how the value of K affects the performances, depending on the resizing factor. The achieved results suggest that there is a relationship between K and the percentage of resizing. However, when the resizing factor is set to extreme values, the performances start to decrease after a certain value of K (see Fig. 10).

Figure 5 shows three examples with a scale factor of 40%. The \(2^{nd}\), \(3^{rd}\) and \(4^{th}\) columns show the results obtained by GGVF (with the best choice for K), by GVF and by seam carving, respectively. The values reported under each image are the quality scores obtained with Eq. 7. The \(5^{th}\) column highlights how our approach better preserves the main object of the input image with respect to the other algorithms (\(6^{th}\) and \(7^{th}\) columns). Although the proposed method compares favourably with the state-of-the-art approaches, some challenging cases have been found, as shown in Fig. 6. As we can observe, the GGVF, GVF and seam carving methods all fail to preserve the main object, introducing distortions with respect to the original image. However, the quality scores (i.e., Eq. 7) show that the proposed approach still performs better than GVF.

4 Conclusions

This paper addresses the problem of content-aware image resizing. The proposed work evaluates the generalised version of the Gradient Vector Flow approach (i.e., GGVF), which allows the adaptation of the algorithm parameters. The experiments showed that, with a proper parametrization, the generalised version outperforms the GVF and seam carving approaches. According to our hypothesis, GGVF can be controlled by varying the parameter K, and this parameter can be tuned based on the percentage of resizing. Our experiments showed that the choice of K can be a critical factor, and that there is a relationship between the percentage of resizing and the optimal K value. Moreover, our experiments considered extreme percentages of resizing, with the aim of observing how this relationship behaves in such cases. The results revealed that, for reasonable resizing factors (i.e., from 10% to 30%), the performance increases with the value of K up to a certain point, after which increasing K does not provide substantial improvements. However, when the resizing factor is set to extreme values (i.e., 40% to 50%), the algorithm is forced to remove a large number of seams. As a result, it removes some pixels belonging to the objects that we want to preserve.

In this paper, the best K has been obtained empirically for each considered percentage of resizing. In future work, methods to automatically determine the best K will be investigated. Future experiments will include horizontal paths in the resizing process, in order to further improve the performance of the method. Furthermore, the use of saliency maps in the algorithm will also be evaluated.