
Pattern Recognition

Volume 44, Issue 6, June 2011, Pages 1282-1295

Document seal detection using GHT and character proximity graphs

https://doi.org/10.1016/j.patcog.2010.12.004

Abstract

This paper deals with the automatic detection of seals (stamps) in documents with cluttered backgrounds. Seal detection poses a difficult challenge due to the seal's multi-oriented nature, arbitrary shape, partial overlap with signatures, noise, etc. Here, a seal object is characterized by scale- and rotation-invariant spatial feature descriptors computed from the recognition results of individual connected components (characters). Scale- and rotation-invariant features are used in a Support Vector Machine (SVM) classifier to recognize multi-scale and multi-oriented text characters. The concept of the generalized Hough transform (GHT) is used to detect the seal, and a voting scheme based on the spatial feature descriptors of neighboring component pairs is designed to find possible locations of the seal in a document. The peak of votes in the GHT accumulator validates the hypothesis of the seal's location. Experiments were performed on an archive of historical documents with handwritten/printed English text. The results show that the method is robust in locating seal instances of arbitrary shape and orientation in documents, and also efficient in indexing a collection of documents for retrieval purposes.

Introduction

In the last few years, intensive research has been performed on content-based image retrieval (CBIR), and consequently a wide range of techniques have been proposed [8]. CBIR consists of retrieving visually similar images from an image database based on a given query image. A digitized document is a particular case of an image. Libraries and archives are generally interested in the mass digitization and transcription of their book collections. The objective is not only the preservation of high-value documents in a digital format, but also to provide access and retrieval services to a wider audience and to assist scholarly research activities. On the other hand, companies are interested in implementing digital mailrooms to improve the efficiency of paper-intensive workflows and to reduce the burden of processing incoming mail, faxes, forms, invoices, reports, employee records, etc. With a digital mailroom, companies attempt automatic distribution of incoming mail to the appropriate departments based on the content of the electronic documents. Content-based document image retrieval (CBDIR) is a subdivision of CBIR in which large-scale document retrieval is performed according to users' requests.

Documents are containers of information made by humans to be read by humans. CBDIR involves a search process where the user's request is a concept to be found. Traditionally, document retrieval is associated with textual queries. The records in typewritten and structured documents are indexed with traditional, high-performing commercial Optical Character Recognition (OCR) products. Once the document image is OCRed, the resulting ASCII information is compared with the query using a string search algorithm. OCR techniques may fail on documents with a high degree of degradation, such as faxes (due to compression or bilevel conversion), historical document databases, which are often degraded by aging or poor typing, or handwritten documents. For such documents, word spotting [22], [28] provides an alternative approach to index generation. Word spotting is a content-based retrieval process that produces a ranked list of word images that are similar to a query word image. The matching for word spotting is done at the image level through word shape coding. To obtain the shape code of a word, different zonal information of the word can be used.

Besides huge amounts of text, some documents contain a large amount of information that is usually neglected and can provide richer knowledge. There are documents comprising mostly graphical information, like maps, engineering drawings, diagrams and musical scores. Old documents contain decorative initials and separators. Administrative documents contain graphical symbols such as logos, stamps or seals. Searching in terms of symbol queries such as logos or seals allows the quick filtering or routing of a document to the right business process, archive or individual. Thus the detection of these graphical symbols in documents increases the performance of document retrieval [35], [37]. Given a single instance of a symbol queried by the user, the system has to return a ranked list of segmented locations where the queried symbol is likely to be found. Traditional word-spotting methods, which are devised to extract row/column-wise features from segmented images, may not be useful for this purpose. Following the idea of word spotting, the problem of locating a graphical symbol in a document image is called symbol spotting. If we extend the idea of symbol spotting to a database of document images, i.e. a digital library, the problem is called symbol focused retrieval [32]. The arrangement of featured key-points has also been used to retrieve document images [25]. These methods are promising for detecting objects in general in terms of accuracy, time and scalability. Graphical objects such as symbols, seals and logos are synthetic entities consisting of uniform regions, and they are highly structured [32], [33]. These facts make the geometric relationships between primitives a discriminative cue to spot symbols.

Indexing of documents can be performed based on graphical entities. In this paper we propose a scheme for document indexing based on seal information. Seals are complex entities consisting of mixed textual and graphical components. Information obtained from seals can be used for efficient storage and retrieval of documents. Seals generally have a closed, connective contour surrounding text characters and logos, which indicates the owner and usage of the seal [19]. They bear some constant character strings that convey information about the owner organization and its locality. Besides, in many instances a seal contains variable fields such as a date [26]. The date may provide sending or receiving information for the document. Hence, automatic seal detection and recognition is an important stage in this task. It also allows the document sources to be identified reliably.

Detection and recognition of seals are challenging tasks because seals are generally unstable and sometimes contain unpredictable patterns due to imperfect ink conditions, uneven surface contact, noise, etc. [38]. Overlapping of a seal with text/signatures, missing parts of a seal, etc. are typical problems and add difficulty to seal detection. A seal may be affixed at any position within the document, which requires detection to be carried out on the entire document. A seal may also be placed in an arbitrary orientation. See Fig. 1(a), where a historical archive document containing a seal is shown. The seal here overlaps part of a signature and text regions. As a result, some information in the seal is missing or illegible. We also show a postal document in Fig. 1(b) which contains a seal overlapped with stamp images. Note that, due to overlapping and noise, many text characters of the seal are missing.

Detection of seals in documents can be treated as localizing objects in different poses. In many instances seals and text are in different colors, and to handle such documents some researchers have studied detection processes based on color analysis [10], [35]. In a mass digitization process, however, color analysis can solve only a subset of problems due to compression or bi-level conversion. Prior knowledge of the outer boundary shape of a seal (e.g. circular, rectangular, elliptical, etc.) is helpful for locating seals in documents [38]. A further seal recognition technique is needed here because it is difficult to recognize a seal whose text information differs from another's even though the boundary shapes are similar. Thus the seal has often been treated as a symbol, and methodologies like segmentation by Delaunay tessellation [7] have been applied for seal recognition. Hu et al. [15] proposed a heuristic method to find the best congruent transformation between the model and a sample seal imprint. Correlation-based algorithms for seal imprint verification are also presented in [12], [13]. Chen [5] used correlation-based block matching in a polar coordinate system based on rotation-invariant features. Since, under the constraint of fixed scanning resolution, seal information can be trained a priori, multi-scale handling is not necessary for seal detection; multi-rotation features, however, are needed.

Chen also proposed a method using contour analysis to find the principal orientation of a seal [6]. Matsuura et al. [23] verified seal images using the discrete K–L expansion of the discrete cosine transform (DCT). Rotation-invariant features based on the coefficients of the 2D Fourier series expansion of the log-polar image are presented in [24]. Lee and Kim [18] used an attributed stroke graph, obtained from the skeleton, for the registration and classification of seals. Gao et al. [11] used a verification method based on stroke edge matching. These techniques [11], [18] use pixel-based shape descriptors, considering the seal as a whole object. A seal involves a rich textual structure, with a bigger boundary frame and smaller text components. In Fig. 2(a), we show two different seal images containing similar circular boundary frames but different text information. They contain a set of text characters along with a symbol. The text strings are designed in different fonts, orientations and sizes. In Fig. 2(b) we show a class of oval-shaped seals having noise (missing seal information or the presence of other text characters from the document). Because of the different orientations, fonts, sizes, etc., boundary information [38] or pixel-based methods [11], [18] are not sufficient to discriminate the seals properly. Moreover, to take care of multi-oriented seals, pixel-based methods need higher computational time. Also, the hard-matching approach [5] performed by template matching may not work when a seal contains a variable field such as a date. Text information is an important part of a seal, and we need a higher level of feature based on text information, rather than just pixels, to handle seal recognition.

The generalized Hough transform (GHT) [2] is an extension of the idea of the "standard" Hough transform (HT), a feature extraction technique originally devised to detect analytical curves (e.g. lines, circles, etc.). The purpose of the technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure. The generalization enables the technique to detect arbitrary objects (non-analytical curves) described by a model.

A look-up table called the R-table is used in GHT to store information about the template of the object to be detected. Given an object, its centroid is usually chosen as the reference point (x_c, y_c) [3]. From this reference point, data for each point on the object's edge are recorded in the R-table. Let the line joining a boundary point (x, y) to the reference point make an angle α with the x-axis, and let r be the distance from that boundary point to the reference point, as shown in Fig. 3. The R-table stores (α, r) for each boundary point, indexed by its gradient angle ϕ, as shown in Table 1. From the pair (α, r), the reference point can be reconstructed.
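
The R-table construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation; the function name and the fixed-bin quantization of the gradient angle are our own assumptions.

```python
import math
from collections import defaultdict

def build_r_table(edge_points, gradients, reference=None, n_bins=36):
    """Build a GHT R-table: gradient-angle bin -> list of (alpha, r) pairs.

    edge_points: list of (x, y) boundary coordinates of the model shape.
    gradients:   gradient angle (radians) at each edge point.
    reference:   reference point (x_c, y_c); defaults to the centroid.
    """
    if reference is None:
        xc = sum(x for x, _ in edge_points) / len(edge_points)
        yc = sum(y for _, y in edge_points) / len(edge_points)
    else:
        xc, yc = reference

    r_table = defaultdict(list)
    for (x, y), phi in zip(edge_points, gradients):
        r = math.hypot(xc - x, yc - y)        # distance to the reference point
        alpha = math.atan2(yc - y, xc - x)    # angle of the joining line
        # Quantize the gradient angle into n_bins equal bins over [0, 2*pi).
        bin_idx = int((phi % (2 * math.pi)) / (2 * math.pi) * n_bins) % n_bins
        r_table[bin_idx].append((alpha, r))
    return r_table
```

Given any (α, r) entry retrieved for a boundary point, the reference point can be recovered as (x + r·cos α, y + r·sin α), which is exactly the reconstruction used during voting.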

Given a test image, an accumulator (A) is defined in the parameter space. The accumulator records the votes of the edge points to determine the most probable center of the prototype object. More specifically, the gradient angle ϕ_i obtained at each edge point (u_i, v_i) of the test image is used to retrieve the corresponding entries of the R-table. The possible location of the reference point in the parameter space is calculated by the following equation:

(x_ci, y_ci) = (u_i + r(ϕ_i)·cos[α(ϕ_i)], v_i + r(ϕ_i)·sin[α(ϕ_i)])

In this equation, (x_ci, y_ci) denotes the location of the possible reference point, while r(ϕ_i) and α(ϕ_i) denote the magnitude and angle, respectively, of the position vector obtained from the R-table for index ϕ_i.
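
The voting step can be sketched as follows. This is a minimal Python sketch of classical GHT voting (translation only, no scale/rotation dimensions in the accumulator); the function name and the integer-cell accumulator are illustrative assumptions.

```python
import math

def ght_vote(edge_points, gradients, r_table, n_bins=36):
    """Cast GHT votes for the model's reference point in a test image.

    For each test edge point (u, v) with gradient angle phi, every
    (alpha, r) entry stored under phi's bin predicts one candidate
    reference point; the accumulator cell with the most votes wins.
    """
    accumulator = {}
    for (u, v), phi in zip(edge_points, gradients):
        bin_idx = int((phi % (2 * math.pi)) / (2 * math.pi) * n_bins) % n_bins
        for alpha, r in r_table.get(bin_idx, []):
            # Predicted reference point for this (edge point, table entry) pair.
            xc = u + r * math.cos(alpha)
            yc = v + r * math.sin(alpha)
            cell = (round(xc), round(yc))
            accumulator[cell] = accumulator.get(cell, 0) + 1
    # The peak of the accumulator is the most probable reference point.
    return max(accumulator, key=accumulator.get) if accumulator else None
```

In practice the accumulator would also be smoothed or thresholded before peak picking, but the essential equation above is applied unchanged per edge point.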

A classical pixel-based GHT may not be efficient for seal detection because two different seals with similar frame boundaries (such as circular, rectangular, etc.) cannot be distinguished properly. Seals generally comprise many text characters. In our approach, the text characters in a seal are considered high-level descriptors, and these characters are used to cast votes for detecting seals. We use character pair information, namely inter-character distance and angle, to describe the spatial information. Since the spatial information among text characters remains fixed, their relative positions are used to vote for seal detection.

In this paper we propose a scheme for document indexing based on seal detection, and text information is used for this purpose. When imprinted, a seal produces a fixed set of text characters, sometimes with a logo, that describes its identity. Although the boundary frames of some seals are similar, they may contain different text information; we classify these as different types of seals. It is therefore important to use knowledge of the text information for seal recognition. Text characters in a seal appear at multiple rotations and scales. The font styles vary according to the design chosen by the owner/organization. Seals are generally affixed on documents that also contain a huge number of text characters. Thus, text information within a seal may be degraded due to overlap with the text present in the document. In a seal, the relative positions of neighboring text characters are fixed. This is the key observation in our formulation. In the literature, the distance between connected components was exploited as a basic feature in Docstrum [27]. The Docstrum is the plot of the distance and angle of all nearest-neighbor pairs of a document image. Distance and angle are computed from the k nearest-neighbor pairs of connected components. This plot is used to analyze higher-level layout structure, such as line extraction and skew estimation. In our approach, features computed from the spatial information of text characters, along with the corresponding text labels obtained after recognition, are used to detect the seal efficiently.

Our seal detection approach is based on the concept of the GHT. Instead of feature points, the text characters in the seal are used as the basic features for seal detection. We label the local connected components of a seal as alphanumeric text characters. To obtain these labels, we employ multi-scale and multi-oriented text character recognition using a Support Vector Machine (SVM) classifier. The recognized text characters provide the high-level descriptors in our approach. For each component (text character) we find its n nearest neighbors. Next, for each pair of components, we encode their relative spatial information using their distance and angular position. Given a query seal, we compute the relative spatial organization of pair-wise text components within the seal. This information is stored in a spatial feature bank. In an image, for each neighboring component pair, we use this spatial feature bank to retrieve query seal hypotheses. A vote is cast for each possible location of the seal based on the matching of neighboring component pairs, which ensures detection of partial seal objects. The locus of a high number of votes (the peak) is the result of the seal spotting. The votes in the GHT accumulator validate the hypothesis of the seal's location.
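
The pair-wise encoding and feature bank described above can be sketched as follows. This is an illustrative Python sketch under our own assumptions: the component representation (label plus centroid) and the helper names `pair_descriptor` and `build_feature_bank` are hypothetical, not taken from the paper.

```python
import math

def pair_descriptor(comp_a, comp_b):
    """Describe a neighboring component pair by label key, distance and angle.

    comp_*: (label, (x, y)) -- a recognized character label and its centroid.
    The raw distance and angle are only meaningful relative to other pairs
    (e.g. via distance ratios and angle differences), which is what makes
    the description tolerant to scale and rotation.
    """
    (la, (xa, ya)), (lb, (xb, yb)) = comp_a, comp_b
    dist = math.hypot(xb - xa, yb - ya)
    angle = math.atan2(yb - ya, xb - xa)
    # An order-independent key lets a pair match regardless of traversal order.
    key = tuple(sorted((la, lb)))
    return key, dist, angle

def build_feature_bank(components, n_neighbors=3):
    """Spatial feature bank: label-pair key -> list of (dist, angle) entries."""
    bank = {}
    for i, a in enumerate(components):
        # The n nearest neighbors of component a, by centroid distance.
        others = sorted((c for j, c in enumerate(components) if j != i),
                        key=lambda c: math.hypot(c[1][0] - a[1][0],
                                                 c[1][1] - a[1][1]))
        for b in others[:n_neighbors]:
            key, dist, angle = pair_descriptor(a, b)
            bank.setdefault(key, []).append((dist, angle))
    return bank
```

At query time, each neighboring pair found in a document image would be looked up in the bank by its label-pair key, and each consistent entry would cast a GHT-style vote for a candidate seal center.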

The main contribution of this paper is the use of recognized local text components as high-level descriptors and the generation of hypotheses of the seal location based on the spatial arrangement of these descriptors. Our approach is based on local text characters instead of pixel-level descriptors alone, to overcome the distortion and affine-transform problems that are the main drawbacks of existing approaches. We have also formulated our approach to be scalable and adaptive for processing large collections of documents. It does not need prior segmentation and thus focuses directly on seal detection. The approach is robust in detecting seals in noisy, cluttered documents. Since our work combines individual character recognition with an SVM and relational indexing using the GHT, it can be classified as a combination of statistical and structural approaches.

The rest of the paper is organized as follows. In Section 2, we briefly explain the local component extraction and recognition procedure. In Section 3, we describe the representation of the seal and its detection process. The experimental results are presented in Section 4. Finally, the conclusion is given in Section 5.

Section snippets

Text character extraction and identification

Labeling of individual characters in a multi-oriented and multi-sized environment drives the detection of seal objects in documents in our approach. In a seal, apart from text, there may exist non-text components due to the presence of forms/tables in the document, or strokes from signatures that touch/overlap the text characters in the seal (see Fig. 1). For the extraction of text components, we assume the size of text components is smaller than that of non-text components (graphical

Seal detection using GHT

Seals in documents are affected mainly by noise, rotation, occlusion and overlapping. To take care of these problems, our approach is based on partial matching inspired by the GHT [2]. Here, we describe the architecture of our method in four key parts: spatial information extraction, construction of the R-table, seal recognition, and the detection approach. The former two represent model shapes in a compact way. The latter two are the steps when a query is formulated into the

Experimental results

As an application scenario, the approach described here has been applied to the historical archive of border records from the Civil Government of Girona [20]. It consists of printed and handwritten documents dated between 1940 and 1976. The documents relate to people crossing the Spanish-French border. These documents are organized in personal bundles. For each one, there is a cover page with the names of the people whose information is contained in the record. In each bundle there is very

Conclusion

In this paper, we have presented a seal detection approach based on the spatial arrangement of seal content. A query seal is translated into a set of spatial feature vectors using its text character information. These features are later used to locate a seal of similar content in documents. We have used a multi-oriented and multi-scale text character recognition method to generate high-level local features to take care of complex multi-oriented seal information. The recognition results of these


References (38)

  • R. Datta et al., Image retrieval: ideas, influences, and trends of the new age, ACM Computing Surveys (2008)
  • M.A. Fischler et al., Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM (1981)
  • A.S. Frisch, The fuzzy integral for color seal segmentation on document images
  • W. Gao et al., A system for automatic Chinese seal imprint verification
  • R. Haruki et al., Automatic seal verification using three-dimensional reference seals
  • T. Horiuchi, Automatic seal verification by evaluating positive cost
  • M.-K. Hu, Visual pattern recognition by moment invariants, IRE Transactions on Information Theory IT (1962)
  • J. Iivarinen, A. Visa, Shape recognition of irregular objects, in: Intelligent Robots and Computer Vision XV:...
  • A. Khotanzad et al., Invariant image recognition by Zernike moments, IEEE Transactions on Pattern Analysis and Machine Intelligence (1990)

    Partha Pratim Roy obtained his Ph.D. in 2010 from the Universitat Autònoma de Barcelona, Spain. He received his Bachelor of Technology degree in Computer Science in 2002 from Kalyani University, India. From 2003 to 2005, he worked as an Assistant System Engineer at Tata Consultancy Services. He obtained his MS in Computer Vision and Image Processing in 2007 from the Universitat Autònoma de Barcelona. His research work focuses on the analysis of text/symbols present in graphical documents. It includes the understanding of text/graphics separation in graphical documents and the recognition of text/graphics in multi-scale and multi-orientation environments.

    Umapada Pal received his Ph.D. from the Indian Statistical Institute; his Ph.D. work was on the development of a printed Bangla OCR system. He did his post-doctoral research on the segmentation of touching English numerals at INRIA (Institut National de Recherche en Informatique et en Automatique), France. During July 1997–January 1998 he visited GSF-Forschungszentrum für Umwelt und Gesundheit GmbH, Germany, to work as a guest scientist on an image analysis project. Since January 1997, he has been a faculty member of the Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata. His primary research area is digital document processing. He has published 163 research papers in various international journals, conference proceedings and edited volumes. In 1995, he received the best student paper award from the Chennai Chapter of the Computer Society of India. He received a merit certificate from the Indian Science Congress Association in 1996. Because of his significant impact in the document analysis research domain of Indian languages, the TC-10 and TC-11 committees of the IAPR (International Association for Pattern Recognition) presented the 'ICDAR Outstanding Young Researcher Award' to Dr. Pal in 2003. In 2005–2006 Dr. Pal received a JSPS fellowship from the Japanese government. Dr. Pal has served as a program committee member of many conferences, including the International Conference on Document Analysis and Recognition (ICDAR), the International Workshop on Document Image Analysis for Libraries (DIAL), the International Workshop on Frontiers of Handwriting Recognition (IWFHR) and the International Conference on Pattern Recognition (ICPR). He is also the Asian PC-Chair for the 10th ICDAR, to be held in Barcelona, Spain in 2009. He has served as guest editor of a special issue of the VIVEK journal on document image analysis of Indian scripts, and is currently co-editing a special issue of the journal Electronic Letters on Computer Vision and Image Analysis.
He is a life member of IUPRAI (the Indian unit of the IAPR) and a senior life member of the Computer Society of India.

    Josep Lladós received his degree in Computer Science in 1991 from the Universitat Politècnica de Catalunya and his Ph.D. in Computer Science in 1997 from the Universitat Autònoma de Barcelona (Spain) and the Université Paris 8 (France). He is currently an Associate Professor in the Computer Science Department of the Universitat Autònoma de Barcelona and a staff researcher at the Computer Vision Center, where he has also been the director since January 2009. He is the head of the Pattern Recognition and Document Analysis Group (2009SGR-00418). He is chair holder of Knowledge Transfer of the UAB Research Park and Santander Bank. His current research fields are document analysis, graphics recognition, and structural and syntactic pattern recognition. He has been the head of a number of Computer Vision R+D projects and has published more than 100 papers in national and international conferences and journals. J. Lladós is an active member of the Image Analysis and Pattern Recognition Spanish Association (AERFAI), a member society of the IAPR. He is currently the chairman of the IAPR-ILC (Industrial Liaison Committee). He formerly served as chairman of the IAPR TC-10, the Technical Committee on Graphics Recognition, and he is a member of the IAPR TC-11 (Reading Systems) and IAPR TC-15 (Graph-based Representations). He serves on the editorial boards of ELCVIA (Electronic Letters on Computer Vision and Image Analysis) and IJDAR (International Journal on Document Analysis and Recognition), and is also a PC member of a number of international conferences. He was the recipient of the IAPR-ICDAR Young Investigator Award in 2007. He was the general chair of the International Conference on Document Analysis and Recognition (ICDAR 2009) held in Barcelona in July 2009, and co-chair of the IAPR TC-10 Graphics Recognition Workshop in 2003 (Barcelona), 2005 (Hong Kong), 2007 (Curitiba) and 2009 (La Rochelle).
Josep Lladós also has experience in technological transfer: in 2002 he created the company ICAR Vision Systems, a spin-off of the Computer Vision Center working on document image analysis, after winning the entrepreneurs' award from the Catalonia Government for business projects on Information Society Technologies in 2000.
