A combined multi-scale/irregular algorithm for the vectorization of noisy digital contours

https://doi.org/10.1016/j.cviu.2012.07.006

Abstract

This paper proposes and evaluates a new method for reconstructing a polygonal representation from arbitrary digital contours that may be damaged or come from the segmentation of noisy data. The method consists of two stages. In the first stage, a multi-scale analysis of the contour is conducted to identify noisy or damaged parts of the contour as well as the intensity of the perturbation. All the identified scales are then merged so that the input data is covered by a set of pixels whose size increases with the local intensity of noise. The second stage transforms this set of resized pixels into an irregular isothetic object composed of an ordered set of rectangular, axis-aligned cells. Its topology is stored as a Reeb graph, which allows easy pruning of spurious edges. Every remaining connected part has the topology of a circle, and a polygonal representation is computed independently for each of them. Four different geometrical algorithms, including a new one, are reviewed for the latter task. These vectorization algorithms are experimentally evaluated, and the whole method is also compared to previous works on both synthetic and real digital images. For fair comparison, when possible, several error measures between the reconstruction and the ground truth are given for the different techniques.

Highlights

► We propose a novel unsupervised algorithm to vectorize noisy digital contours.
► Our system combines a multi-scale noise detector and a polygonalization algorithm.
► We can compute polygonal reconstructions that have compelling theoretical properties.
► The comparison with related work shows that our proposals are very competitive.
► We also present the interest of our work for several image analysis applications.

Introduction

The vectorization (i.e. reconstruction into line segments) of digital objects obtained from segmentation, digitization or scanning processes is a very common task in many image analysis systems such as optical character recognition (OCR), license plate recognition (LPR), and sketch recognition [1], [9], [14], [36], [31], [37], [38]. The development of raster-to-vector (R2V) algorithms is in constant progress, responding to both technical and theoretical challenges [30]. Indeed, in real-life applications, digital objects are not perfect digitizations of ideal shapes but present noise, disconnections, irregularities, etc.

To process this kind of image data, additional information is often provided, such as a priori knowledge of the studied shapes (for instance, shapes are letters in OCR) or user supervision. For low-level image processing, classic contour (or edge) detection approaches generally need an external parameter that has to be tediously tuned, and the output has to be filtered and post-processed [5], [10] (see Fig. 1 for an example with the Canny edge detector, the Sobel operator, and a recent edge noise removal algorithm [13]).

The noisy digital contour (or a thick digital curve around it) can be partitioned into thick (or blurred) segments [11], [12]. Such approaches require a global thickness parameter and thus cannot handle contours along which the amount of perturbation or noise is not uniform (e.g. see Fig. 1a, top and bottom). The document vectorization method of [14] also assumes rather uniform noise, so that filtering and skeletonization are enough to take care of it. Other methods such as [19], [27], which are based on different principles, also require a global scale parameter to compute polygonal reconstructions. Further related works aim to compute an isothetic hull of a noisy contour in order to build a polygonal contour [3]; in this case, the user has to define the spacing of the grid used to compute the orthogonal structure. In these various works, the main parameter is related to the amount of noise in the image.

In a previous work [35], we proposed a novel unsupervised technique divided into two main stages. We first used the pixel resizing algorithm based on the multi-scale noise detector introduced in [15], [17]. The resulting set of resized pixels is transformed into an irregular isothetic object composed of rectangular, axis-aligned cells, whose topology is stored in a Reeb graph [23]. The object is then analyzed and vectorized using two geometrical algorithms, both based on the preimage of straight parts (i.e. sequences of cells that can be traversed by a straight line). These two polygonalization algorithms are an improvement of the visibility cone approach of [32].

Our system is comparable to the work of [24], which introduces a polygonalization technique based on a pixel resizing step combined with a generalized preimage algorithm. However, this approach mixes up noise, arithmetic artefacts and high-curvature features when trying to detect noisy parts of contours. It also needs a very complex topological control process [25], represented as a skeleton, to handle objects that are not homotopic to a cycle.

In this paper, we extend the approach introduced in [35] along three directions. First, the Reeb graph, which encodes the topology of the irregular object, is better exploited in order to obtain a polygonal representation of the input digital contour that is homeomorphic to a circle (one connected component and one hole) and such that exactly two edges are incident to each vertex. This filtering step also tells us whether the processed irregular object can be interpreted as a single cycle, and may loop back to the multi-scale noise detector for an analysis at a finer scale. Then, we propose another geometrical algorithm that minimizes, for each k-arc (i.e. a connected sequence of cells), the length of the polygonal representation. The output of this algorithm turns out to be a good trade-off between minimizing the number of vertices and minimizing the reconstruction error. Finally, we conduct a larger set of quantitative comparisons with other vectorization techniques in order to validate our approach. We illustrate the global processing chain of our system in Fig. 2.
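As a schematic illustration of this filtering step (our own sketch, not the paper's actual procedure), dangling branches of the Reeb graph can be pruned until, ideally, a single cycle remains; failing to reach a single cycle is what triggers the finer-scale analysis:

```python
from collections import defaultdict

def prune_to_cycle(edges):
    """Schematic Reeb-graph pruning (illustrative sketch only).

    Assumes a simple graph (no self-loops or multi-edges) given as an
    edge list. Iteratively removes spurious dangling edges (incident to
    a degree-1 node), then reports whether the remaining graph is a
    single cycle, i.e. homeomorphic to a circle.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [n for n in adj if len(adj[n]) == 1]
    while leaves:
        n = leaves.pop()
        if n not in adj or len(adj[n]) != 1:
            continue
        (m,) = adj.pop(n)          # n's unique neighbor
        adj[m].discard(n)
        if len(adj[m]) == 1:
            leaves.append(m)       # m became a new leaf
        elif not adj[m]:
            del adj[m]             # isolated node, drop it
    # a single cycle: every node has degree 2 and the graph is connected
    if adj and all(len(nbrs) == 2 for nbrs in adj.values()):
        seen, stack = set(), [next(iter(adj))]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adj[node])
        single_cycle = len(seen) == len(adj)
    else:
        single_cycle = False
    return adj, single_cycle
```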

After recalling basic definitions about irregular isothetic objects and their construction from a noisy digital contour (Section 2), we show in Section 3 how to filter the obtained irregular object using its Reeb graph in order to get a faithful representation of the input digital contour. In Section 4, the vectorization techniques of [32], [35] are recalled and we introduce a novel approach based on the minimal-length polygon inscribed in a polygonal object. As an experimental validation, we compare the different reconstruction algorithms with each other and the whole method to other vectorization techniques in Section 5. We also propose a hybrid polygonalization method that combines two of the previously presented polygonalization techniques: it exploits the flat-part or curved-part tags that are a byproduct of the multi-scale analysis.

Section snippets

Definitions

In this section, we first recall the concept of irregular isothetic grids (I-grids) in 2-D, with the following definitions [8], [34].

Definition 1

2-D I-grid

Let R be a closed rectangular subset of ℝ². A 2-D I-grid G is a tiling of R with closed rectangular cells whose edges are parallel to the X and Y axes, and whose interiors have a pairwise empty intersection. The position of each cell R is given by its center point (x_R, y_R) ∈ ℝ² and its lengths along the X and Y axes by (l_R^x, l_R^y) ∈ (ℝ₊*)².
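For concreteness, such a cell can be represented by four numbers. The following minimal Python sketch (the type and field names are ours, not from the paper) is reused in the adjacency test below:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    """Closed axis-aligned rectangular cell of a 2-D I-grid (Definition 1)."""
    x: float   # center abscissa x_R
    y: float   # center ordinate y_R
    lx: float  # length l_R^x along the X axis (> 0)
    ly: float  # length l_R^y along the Y axis (> 0)
```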

Definition 2

ve-adjacency and e-adjacency

Let R1 and R2 be two cells. R1 and R2
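The snippet is cut short here; in [8], [34], R1 and R2 are ve-adjacent when they share at least a vertex, and e-adjacent when they share an edge segment of non-zero length. A sketch of both predicates on the Cell type above (our formulation of these definitions, with exact comparisons that assume exact cell coordinates):

```python
def ve_adjacent(r1: Cell, r2: Cell) -> bool:
    """True if r1 and r2 touch in at least a vertex (ve-adjacency)."""
    dx, dy = abs(r1.x - r2.x), abs(r1.y - r2.y)
    sx, sy = (r1.lx + r2.lx) / 2, (r1.ly + r2.ly) / 2
    return (dx == sx and dy <= sy) or (dy == sy and dx <= sx)

def e_adjacent(r1: Cell, r2: Cell) -> bool:
    """True if r1 and r2 share an edge segment of non-zero length
    (e-adjacency): the strict inequality excludes corner-only contact."""
    dx, dy = abs(r1.x - r2.x), abs(r1.y - r2.y)
    sx, sy = (r1.lx + r2.lx) / 2, (r1.ly + r2.ly) / 2
    return (dx == sx and dy < sy) or (dy == sy and dx < sx)
```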

Topological reconstruction of a noisy contour

We now propose to analyze noisy digital contours using Kerautret and Lachaud's local noise detector [15], [17]. This method estimates locally whether the digital contour is damaged, the amount of degradation, and the finest resolution at which this part of the contour can be considered noise-free (called the meaningful scale). The main idea of the approach is to exploit the asymptotic properties of the length of the maximal straight segments by using a subsampling
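As a rough illustration of the scale selection (a simplified sketch of our reading of [15], [17]; the actual criterion in those papers is stated on the slopes of a log-log multiscale profile), one can look for the first subsampling scale from which the segment-length profile keeps decreasing:

```python
def noise_level(profile):
    """Simplified meaningful-scale detection (illustrative only).

    profile[i] is the mean length of the maximal straight segments
    covering a given contour point when the contour is subsampled at
    scale i + 1.  On noise-free parts this profile decreases with the
    scale, so we return the first scale index from which it keeps
    decreasing; the local noise level is that index (0 = noise-free).
    """
    n = len(profile)
    for first in range(n):
        if all(profile[i] > profile[i + 1] for i in range(first, n - 1)):
            return first  # meaningful scale = first + 1
    return n - 1  # no meaningful scale found below the coarsest one
```

The detected noise level then drives the pixel resizing stage: pixels lying on noisier parts of the contour are covered by larger cells (as illustrated in Fig. 11b and h).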

Unsupervised polygonalization of noisy digital contours

Guided by the pruned Reeb graph, the computation of the polygonal representation of E is performed by independently reconstructing each remaining k-arc. In order to easily glue the polygonal lines together into one global structure, each polygonal line is set to begin at the center of the first cell and to end at the center of the last cell of the vectorized k-arc. Between these two points, any polygonal line is valid. But among them, we are looking for the one that represents the most
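Since each per-k-arc polygonal line starts and ends at cell centers shared with its neighbors, gluing reduces to concatenation with duplicate junction points removed. A minimal sketch, assuming the polylines are given as point lists in traversal order (function and variable names are ours):

```python
def glue_arcs(polylines):
    """Glue per-k-arc polygonal lines into one global polygon.

    Each polyline starts at the center of the first cell and ends at
    the center of the last cell of its k-arc, so consecutive polylines
    share their junction point, which must be emitted only once.
    """
    polygon = []
    for line in polylines:
        # drop the first point when it duplicates the previous endpoint
        start = 1 if polygon and polygon[-1] == line[0] else 0
        polygon.extend(line[start:])
    # a closed contour repeats its first point at the very end; drop it
    if len(polygon) > 1 and polygon[0] == polygon[-1]:
        polygon.pop()
    return polygon
```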

Comparative study

To assess the quality of the proposed algorithms, we first consider a polygonal shape perturbed by Gaussian noise with different standard deviations (σ0 = 0, σ1 = 75, σ2 = 125, σ3 = 175). These images were generated with two different grid sizes, h = 1 and h = 0.5 (Fig. 11a and g). The resized pixels (illustrated in Fig. 11b and h) were obtained from the digital contours extracted using a simple threshold (set to 128) and a boundary tracking algorithm. In order to

Discussion and future works

In this paper, we address the problem of vectorization of noisy digital contours. We transform the resized pixels obtained by Kerautret and Lachaud's algorithm [15] into an irregular isothetic object, recoded as a set of k-arcs whose topology is stored in a Reeb graph. We first show how to use the Reeb graph in order to prune the set of k-arcs so that the result is homotopic to the initial digital contour. Then we review different geometrical algorithms (VC, S2, C2), and propose a new one (MLP), in

Acknowledgment

This work has been supported by the French National Agency for Research with the reference ANR-10-CORD-005 (REVES project).

References (38)

  • E. Bretin et al., Regularization of discrete contour by Willmore energy, J. Math. Imag. Vis. (2011).
  • J. Canny, A computational approach to edge detection, IEEE Trans. PAMI (1986).
  • G. Cerutti et al., A parametric active polygon for leaf segmentation and shape estimation.
  • G. Cerutti, L. Tougne, J. Mille, A. Vacavant, D. Coquin, Guiding active contours for tree leaf segmentation and...
  • L.P. Cordella et al., Symbol recognition in documents: a collection of techniques?, Int. J. Document Anal. Recogn. (2000).
  • L.S. Davis, Edge detection techniques, Comput. Graphics Image Process. (1975).
  • A. Faure, F. Feschet, Linear decomposition of planar shapes, in: Proc. of IEEE ICPR, 2010, pp....
  • T.V. Hoang, E.H. Barney Smith, S. Tabbone, Edge noise removal in bilevel graphical document images using sparse...
  • X. Hilaire et al., Robust and accurate vectorization of line drawings, IEEE Trans. PAMI (2005).