Keywords

1 Introduction

Content-aware image warping and retargeting has drawn significant attention so that many studies have been proposed in recent years.

Broadly speaking, the main target application in existing literature has almost exclusively been the change of scale or aspect ratio of images, producing “retargeted” outputs that kept in a tight rectangular image the unmodified perceptually salient features of the source image, while eliminating or distorting the less salient portions. Common to these applications is a constrained rectangular boundary resulting in induced deformations that are mostly axis-aligned. Emblematic of this class of approaches is seam carving [3], which allows to shrink or expand images by removing or adding whole columns or rows of pixel found through a minimum saliency path from the top to bottom or left to right boundaries of the image.

Existing image retargeting approaches can be classified into two categories: discrete and continuous [10]. Discrete approaches alter the image size by eliminating pixels through cropping or seam carving. Recently, Rubinstein et al. [9] presented a multi-operator algorithm that combines cropping, linear scaling, and seam carving while Pritch et al. [8] remove repeated patterns in homogeneous regions.

Fig. 1.
figure 1

Map image courtesy of State Archive of Venice.

Example of an historical map georeferencing process. Left: original map (1682, Sebastiano Alberti, ASV). Right: georeferenced map with respect to 15 manually-defined control points overlapped to a satellite view of the area (Google Earth basemap).

Conversely, continuous approaches optimize mapping or warping using smoothness and salient-region preserving constraints to retain perceptual content. Wolf et al. [12] retargeted an image by merging less important pixels to reduce distortion. In this way, the distortion is propagated only along the resizing direction. Wang et al. [11] warp local regions to match optimal scaling factors, thus distributing the distortion in all directions. However, large salient objects may undergo inconsistent deformation throughout their extent. To ease this problem, Zhang et al. [13] and Guo et al. [6] force highly salient objects to undergo similarity transformations when resizing images, resulting in good preservation of local shape. However, inconsistent deformations can occur along elongated structures. Lin et al. [7] propose an approach to preserve both the visually salient objects and structure lines by constraining patches of high saliency to undergo similarity transformations. It is noteworthy to say that all the approaches are designed to give their best on photographic content rather than human-made drawings.

In this paper we address a different application for Content-aware image warping: the georeferencing of historical maps (Fig. 1). Historical maps provide vital evidence to scholars in disciplines as diverse as history, archaeology, and environmental sciences, which routinely use them to identify and map information on past landscapes that is no more available on the ground and to compare their content with current geomorphological and spatial data within GIS platforms, where historical maps can be georeferenced. A number of tools available in these platforms enable warping images to fit control points to known coordinates; generally, however, these are limited to global context-independent transformations such as polynomial or spline based image warping. These global transformations provide sufficiently accurate results with historical maps produced starting from beginning/middle of 18th century, which where originated using more or less accurate topographic criteria.

Maps created in previous centuries, instead, where generally conceived more like a pictorial depiction of the landscape rather than a rigorous topographical representation of it. In this case, global context-independent transformations tend to destroy the information-rich pictorial content, failing to warp the features over the actual landscape due to the limited degrees of freedom of the transformations available.

A retargeting process enabling the georeferencing of historical maps will differ from traditional retargeting approaches in several key elements. First, the transformation is driven by control points resulting in different boundary conditions: Dirichelet for the control points and Neumann for the actual boundary. Second, salient visual content must be preserved after the transformation. Third, the map has an orientation, so the warp should be smooth and non-decreasing in both horizontal and vertical directions. Fourth, each iconographic element should maintain the original shape and orientation. (eg. the compass rose present in many historical maps should preserve its original direction and be minimally distorted). Finally, maps are rich in linear elements, which can be stretched along their direction, but not orthogonally. Hence, context-based constraint should be directional.

In accordance to the these requirements, our approach provides three novel contributions. First, we defined a continuous image retargeting approach based on a set of control points. Second, we consider a tensor-based characterization of the saliency map to favor or penalize strong warping across salient edges. In other words, we generalize the saliency map from a scalar to a tensor field to control both the amount of stretching and its directionality among the image domain. Finally, we avoid the possible fold overs by embedding isotonic regression during the variational map optimization. This enables extreme stretches on possibly small parts of the images, avoiding folds and guaranteeing orientation and monotonicity of the map.

2 Problem Formulation

We suppose to have a source image \(I_s\) defined over a spatial scale-independent domain \(\varOmega _s \subset \mathbb {R}^2\). Our goal is to transform the image on a new scale-independent domain \(\varOmega _t\) (which can be affinely mapped to world coordinates) according to some point-to-point correspondences \(C_p=\{(p_1,q_1), (p_2,q_2), \ldots , (p_n \in \varOmega _s, q_n \in \varOmega _t)\}\) that are manually defined with respect of some visual features that a human operator has identified in \(I_s\). In Fig. 1 (Left) we have a typical example of an historical map held at the State Archive of Venice and drawn by geographer Sebastiano Alberti in 1682. To georeference the picture, an algorithm should ideally map each pixel of \(I_s\) to a new world coordinate system so that the geographical features (like the rivers, islands, city locations etc.) are mapped to their corresponding places. Since the historical map was not intended for cartographic purposes, local scale of the pictorial content is subject to a certain degree of freedom introduced by the artist. In other words, we do not expect to have a single explicit function to map the whole image into the world reference frame. In this specific example, distances between the cities and the actual shape of the shoreline is not consistent across the image.

Our objective is to find a function \(f: \varOmega _t \rightarrow \mathbb {R}^2\) defining the displacement of each point from target to source domain. This allows the creation of a target image \(I_t\) by interpolating all the points of the source image with the mapping function \(M(q)=q+f(q)\) such that

$$\begin{aligned} I_t(q) = I_s\big ( M(q) \big ) \end{aligned}$$
(1)

and \(M( q_i ) = p_i \;\; \forall (p_i,q_i) \in Cp\) (i.e. all the control points are mapped exactly between source and destination). In computer-vision terms, f is the optical-flow function densely mapping pixels from \(I_t\) to \(I_s\).

2.1 Saliency Tensor Map

The fact that the important pictorial information should be preserved when transforming the source image implies a concept of perceptually salient content that must be identified a-priori. In the field of image retargeting, the problem is usually addressed by defining a saliency map \(S(p): \varOmega _s \rightarrow [0,1]\) to quantify the importance of each point of the source image. Such saliency map can be computed automatically from the source image [2] or obtained by directly drawing on it to customize the importance of each pixel [3]. In both the cases, the saliency map is a scalar field over the image domain used to weight the amount of stretching allowed in an area.

Since we require directional context-based constraints, we explicitly model a possible non isotropic response of the saliency map. In other words, the amount of stretching is made not only spatially but also directionally dependent. We start by considering a scalar saliency field S provided by the user to create a rank-2 tensor field T on \(\varOmega _t\). Such tensor assign to each point \(q \in \varOmega _t\) a \(2 \times 2\) matrix defined as:

$$\begin{aligned} T(q) = k_1 S\big (M(q)\big ) \mathbf {I} + k_2 H(q),\;\; H= W \star \big (\nabla I_t \nabla I_t^T\big ) \end{aligned}$$
(2)

where \(k_1\) and \(k_2\) are two parameters to weight the scalar and directional contribution of the saliency, \(\mathbf {I}\) is the identity matrix and H is a structure tensor computed over a window W.

Fig. 2.
figure 2

Effect of the saliency map in our approach. (a) Synthetically generated source image showing a regular grid. (b) A toy-sample saliency map with high values for the pixels corresponding with the coloured square on the source image. (c) Retargeted image with constant saliency (i.e. all the pixels have the same visual importance). Retargeted image using the saliency map in (b) (Color figure online)

3 Variational Functional

We pose our retargeting problem as the minimization of the functional:

$$\begin{aligned} E(f)=\int _{\varOmega _t} {{\,\mathrm{trace}\,}}\bigg ( \mathbf {J}_f^T T(q) \mathbf {J}_f \bigg ) dq \end{aligned}$$
(3)

where \(\mathbf {J}_f = \begin{pmatrix} \partial f_x/\partial x &{} \partial f_x/\partial y \\ \partial f_y/\partial x &{} \partial f_y/\partial y \end{pmatrix}\) is the Jacobian of f. Equation (3) leads to a variational optimization problem in which we set Neumann boundary conditions for the image borders and Dirichlet boundary conditions \(f(q_i)=p_i-q_i\) on the control points. The idea is to find a displacement function f between source and target domains with pre-defined values on the control points and for which the first-order partial derivatives (i.e. the local stretching caused by the mapping) is small in a quadratic sense. Additionally, the amount of stretching allowed in each direction is modulated by the saliency tensor field computed as in (2). According to the calculus of variations, E(f) has a stationary value if the corresponding Euler-Lagrange equations:

(4)

are satisfied. This lead to a set of PDEs which are solved numerically as described in the following sections.

3.1 Special Cases

In its formulation, our method generalizes other variational warp-based image retargeting techniques [5, 10] but with two main differences. First, the Dirichlet boundary conditions are set on the control points and not the image boundaries which are free to overshoot or undershoot the source image borders. Second, the saliency map is directional depending on the linear features of the source image itself.

If we set \(k_2=0\), we eliminate the latter so that the functional (3) can be rewritten as:

$$\begin{aligned} \int k_1 S\big (M(q)\big )\bigg [ \bigg ( \frac{\partial f_x}{\partial x} \bigg )^2 + \bigg ( \frac{\partial f_x}{\partial y} \bigg )^2 + \bigg ( \frac{\partial f_y}{\partial x} \bigg )^2 + \bigg ( \frac{\partial f_y}{\partial y} \bigg )^2 \bigg ] dq. \end{aligned}$$

To simplify, we assume to already have a saliency map \(\hat{S}(q)\), in target space, giving a reasonable approximation of \(S\big (M(q)\big )\). According to the Euler-Lagrange equations, the resulting system of PDEs is:

(5)

where \(\varDelta \) is the Laplace operator. Since the off-diagonal elements of T(q) are equal to 0, each equation depends only on the horizontal and vertical component of f respectively and thus can be computed in parallel as set of elliptic PDEs. The only case in which \(\hat{S}(q)=S\big (M(q)\big )\) is when the saliency is constant everywhere. In this case, the deformation is not saliency aware but only depends on the local deformation of the control points (Fig. 2(c)). For all the other cases, we start from an initial approximation of f and we alternate the computation of f and \(\hat{S}(q)\) until convergence (See Sect. 4 for details).

4 Numerical Solution

To solve the general problem numerically, we consider a discrete approximation of \(\varOmega _s\) and \(\varOmega _t\) into a regular grid with arbitrary resolution, depending on the required level of detail. Consequently, we transform f into a \(M \times N \times 2\) tensor and transform the functional (3) to:

$$\begin{aligned} \begin{aligned} E(f)=\sum _{u=0}^{N-1} \sum _{v=0}^{M-1}&{{\,\mathrm{trace}\,}}\bigg ( \mathbf {J}_f^T \hat{T}(u,v) \mathbf {J}_f \bigg ) \\&+ \sum _{(p,q) \in C_p} \alpha \big (p-M(q) \big )^2. \end{aligned} \end{aligned}$$
(6)

To let the computation more stable, we substituted the Dirichlet boundary conditions on the control points with a soft L2 constraint on the flow values on the control points. Note that, while with this formulation the exact mapping of control points is not guaranteed, for high values of \(\alpha \) the effect is negligible. The corresponding Euler-Lagrange equations are thus:

(7)

where \(\delta \) is a mixture of Kronecker Deltas centered at \(q_1 \ldots q_n\) and \(\hat{f}\) is the known value of f at the control points. We consider an initial approximation of f obtained by interpolating the control points via multiquadric radial basis functions and iteratively modify f over time so that \(\partial f_x/\partial t = -L_x\) and \(\partial f_y/\partial t = -L_y\). We use a discrete approximation of the Laplace operator computed by convolution and forward finite differences for the time differentiation to solve the PDEs via forward Euler scheme. At each step t, we consider the function computed at previous step \(f^{(t-1)}\) to obtain \(\hat{T}\) according to Eq. (2).

Fig. 3.
figure 3

Effect of the isotonic regression on the resulting distortion map. Left: with no monotonicity constraint singular solutions may cause bubbles around a control point. Right: The isotonic regression step force the resulting mapping to a non-decreasing solution thus removing such artifacts.

4.1 Avoiding the Fold Overs

Variational functional of Eq. (6) pose only constraints on the smoothness of the obtained warp but not on its monotonicity. Before starting the optimization, we compute an initial global affine transformation of the control points so that global rotations or flips of the source domain with respect to target domain are eliminated. After that, we expect that \(M(u,v) \leqslant M(u+1,v)\) and \(M(u,v) \leqslant M(u,v+1)\) for each point of the domain. If such constraint is violated, the resulting topology of the retargeted \(I_t\) will be no longer consistent with the original image lattice. In particular, we obtain flip overs on the warped image that manifest themselves as bubbles around the control points (See Fig. 3).

Fig. 4.
figure 4

The saliency map of Faventini 1734 (Left) and Unknown 1624 (Right) used for our experiments

To avoid this, during the optimization we alternate the computation of f with a projection to a new space to obtain the closest approximation \(f^i\) of f (in L2 terms) but with the additional constraint that each row of \(f_y^i\) and each column of \(f_x^i\) are non-decreasing. This is performed by using the isotonic regression technique [4] applied in parallel to the rows and columns of f. Overall, the numerical optimization is performed by alternating the following steps:

  1. 1.

    Let \(f^0\) be an initial interpolation of the control points using RBF

  2. 2.

    Compute \(\hat{T}\) as in Eq. (2) using \(f^{(t-1)}\) to approximate M and \(\nabla I_t\)

  3. 3.

    Use forward Euler schema to compute \(f^{(t)}\)

  4. 4.

    Substitute \(f^{(t)}\) with its non-decreasing approximation \(f^i\) obtained via isotonic regression

  5. 5.

    Return to step 2 until convergence.

5 Experimental Evaluation

We qualitatively tested our approach on three historical maps held at the State Archive of Venice (Italy). The first, (Fig. 5, left), drawn by Giovanni Battista Faventini in 1734, represents the costal line of the Marano and Grado’s lagoon (NE Italy) and the mainland comprised between Tagliamento and Isonzo riversFootnote 1; the second (Fig. 5, right), drawn by unknown artist in 1624 circa, characterizes the town of Grado and surrounding territories up to Terzo and Fiumicello villagesFootnote 2. The third map, (Fig. 7, top-left) also sketched by unknown artist in the 17th century, depicts the lower part of the Friuli region up to the marshes surrounding Grado. These maps were chosen as good example of map content rich of iconographic elements and loosely accurate with respect to the real cartographic features.

Fig. 5.
figure 5

Images courtesy of State Archive of Venice. (Color figure online)

Comparison between our algorithm and point-based warping methods for two different historical maps (left column: Faventini, 1734; right column: Unknown, 1624). First row: original map with the control points and saliency map superimposed. Second row: Second degree polynomial warping. Third row: Thin-plate spline warping. Fourth row: our proposed method. Note how our retargeting approach enables a better georeferentiation of the input picture while minimizing local distortions of the symbolism, geomorphology and the place names.

Fig. 6.
figure 6

Detail of the local stretching affecting the compass rose on the bottom of the Faventini 1734 map shown in Fig. 5 after processing with TPS (Left) and our method (Right).

Fig. 7.
figure 7

Images courtesy of State Archive of Venice (Raccolta Terkuz, n. 34)

Retargeting result on a complex pictorial map for which the control point placement causes a highly non-uniform distortion. Top-left: Original image. Top-right: Saliency map in which we manually gave high importance to the pictorial content of the map (see the white blob corresponding to the compass rose). Bottom-left: warping via cubic thin-plate spline. Bottom-right: retargeting computed with our proposed method. Note how our result is less “extreme” while still preserving the correct positioning of the elements. Moreover, the purely pictorial content expressed by the saliency map is retargeted with almost no distortion.

In all the cases, the saliency map was generated in a semi-supervised way. Specifically, we first computed the Difference of Gaussian of the input images and then re-normalized the result in range \([0 \ldots 1]\). Then, we manually painted on top of the saliency map the areas in which interesting iconographic features (like the compass rose or the place names) were present. For reference, saliency maps for the images used in our experiments are shown in Figs. 4 and 7 (top-right) respectively. Note that the intended usage of the saliency is not to define areas of the historical map in which the cartographic information is more accurate but simply to mark the elements (mostly iconographic) for which the local deformation should be minimum. We ran our optimization using \(k_1=1.0\), \(k_2=10\), \(\alpha =100\) and a Gaussian \(5\times 5\) window W for the computation of structure tensor in all the cases.

In Fig. 5 we compare the result obtained with our technique (bottom row) against two common image warping techniques implemented in QGIS, namely: second degree polynomial and thin plate splines (TPS). We superimposed the actual national base map of the area (red lines) to highlight the overall map fitting accuracy to the geographical features. Since both polynomial and TPS are designed to work with cartographic maps and satellite images, they give the same importance to each pixel of the map. This result in often too exaggerated distortions in the attempt to satisfy each control point constraint. In both the cases, our approach gives a more natural result minimizing the local stretch for areas both far from the control points or subject to low visual saliency.

To better highlight this behavior, in Fig. 6 we show a magnification of the bottom area of the first map comprising both the shoreline and purely iconographic elements (i.e. the compass rose and the scale bar). While the shoreline correctly overlaps the actual cartography in both the cases, the shape and orientation of the two iconographic elements are better preserved with our method. In particular, the direction of the compass rose is correctly aligned with the cardinal directions and the scale bar is maintained horizontal and free to strong deformations. This is a peculiar feature of our method, that can exploit a saliency tensor field to selectively limit the distortion on specific parts of the map. In fact, we observe an actual re-positioning of all the high-saliency elements of the map more than a typical warping we obtain with interpolation approaches commonly implemented in GIS. This is further shown in Fig. 7 where we compare our method against cubic thin-plate splines on what we consider the most challenging historical map in our set.

Fig. 8.
figure 8

Detail of the local stretching of the upper part of the 1624 (Unknown) map shown in Fig. 5 (right column) throughout the optimization. From left to right: retargeted image after 0, 5000, 20000 and 40000 iterations. The saliency tensor allows the rivers to warp more freely along their extent instead of between the banks. This way, the thickness of the linear features is preserved. Note also how the appearance of the two cities evolve toward a low distortion configuration without changing their relative position in the georeferenced image.

To visualize the evolution of the flow during the optimization, in Fig. 8 we show a magnification of the top area of the map shown in Fig. 5 (Left) at different iterations of our algorithm. The process starts with an RBF interpolation (left-most image) of the displacement computed for each control point. Since the interpolation is not aware of the saliency map, we can observe a strong deformation of the two stylized cities and their related captions. During the iterations, the stretching affecting the two cities is smoothly reduced in accordance to the high value on the saliency map. The effect is particularly visible on the most southern city (Grado), far from any control point and hence implicitly allowed to move. On the other hand, the control point just on top on the north-west city cause a strong deformation of the land eastbound it since no salient content is present. Other interesting observations can be made by analyzing the shape of the rivers among the cities. Consistent with placement of the control points, the intersection of the two main rivers moves easterly during the optimization. However, thanks to the tensor form of the saliency component, the algorithm smoothly modifies the relative angle of the two rivers (and thus their length) without changing their thickness. This is notably evident on the river oriented north-south for which the relative movement is orthogonal to its linear extent. This behavior is remarkably important as it contributes to preserve the overall visual appearance of the original map.

Fig. 9.
figure 9

Left: Evolution of the energy functional (6) throughout the iterations. Right: Optimization speed in iterations per second. Both plots refer to the map (Raccolta Terkuz, n. 34) presented in Fig. 7

5.1 Implementation Details and Running Times

We designed our method so that it can be implemented in terms of convolutions and simple arithmetical operations among tensors of rank 2 and 3. In practice, we observed that is computationally more efficient to use a very simple variational solver (Forward-Euler) with a small time-step instead of a sophisticated integration scheme that do not scale well when retargeting high resolution maps.

We implemented our method using the Python version of the popular TensorFlow framework [1]. We performed our experiments on a consumer desktop PC, with an Intel i7 CPU and a Nvidia GeForce 1060 GPU. In all the cases, flow function was scaled during the optimization to a rank-3 tensor with shape (256, 256, 2) and the time delta \(\varDelta _t\) for the Forward-Euler scheme was set to \(5E-6\). In Fig. 9 we plotted the value of the energy functional (6) during the optimization (left plot) and the number of iterations per second (right plot) while optimizing the map presented in Fig. 7. With the parameters used for our tests, the whole optimization took  25000 to converge to a minimum at a speed of 520 iterations per second on average. This implies a total time ranging between 40 s to a minute to perform a full map retargeting.

6 Conclusion

We proposed a novel technique to perform dense continuous image retargeting driven by a set of control points. It was designed at georeferencing historical maps for which the local distortion of the pictorial content is dynamically adjusted according to a saliency tensor depending both on a saliency map defined by the user and the structure tensor computed on the image itself. This allows part of the map to be warped smoothly to accomodate the control points while preserving the local stretching of purely pictorial content (like the common compass roses) that are seamlessly re-positioned in the resulting image.

Our algorithm involves an optimization of an energy functional based on the calculus of variation that can be efficiently implemented in GPU by means of convolutions and tensor operations, commonly implemented in many frameworks designed for Convolutional Neural Networks and Deep Learning. During the optimization, we included an isotonic regression step to avoid visual artifact in the resulting displacement map. The isotonic regression guarantees that no flip-overs will be present in the retargeted output, even in presence of extreme local stretching that may happen when warping maps not created for cartographic purposes. Qualitative tests performed on original historical maps demonstrate the effectiveness of our approach especially when handling multiple pictorial elements. At the present state, our method still requires human intervention in the definition of the control points and the saliency map. For the near future, we plan to automate the saliency map creation by exploiting some of the existing approaches in the literature and implement our method as a QGIS plugin.