Automated entry system for printed documents

https://doi.org/10.1016/0031-3203(90)90112-X

Abstract

This paper proposes a system for automatically reading Japanese or English documents with complex layout structures that include graphics. First, document image segmentation and character segmentation are carried out using three basic features and knowledge of document layout rules. Next, multi-font character recognition is performed by feature vector matching. Recognition experiments with a prototype system on a variety of complex printed documents show that the proposed system can read different types of printed documents with an accuracy of 94.8–97.2%.
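
The abstract does not say which three features drive segmentation or which distance is used for matching, so the following is only a minimal sketch of the general two-stage idea under stated assumptions: characters are separated at empty columns of a vertical projection profile, and each character is labelled by the nearest template in Euclidean distance over raw pixel vectors. All function names, the toy image, and the templates are hypothetical and are not taken from the paper.

```python
from math import sqrt

def column_profile(image):
    """Count ink pixels (1s) in each column of a binary image (list of rows)."""
    return [sum(col) for col in zip(*image)]

def segment_columns(image):
    """Split a text-line image into character spans at empty columns.

    Returns (start, end) column index pairs, end exclusive.
    Assumption: characters are separated by at least one blank column.
    """
    spans, start = [], None
    for x, count in enumerate(column_profile(image)):
        if count > 0 and start is None:
            start = x                      # first inked column of a character
        elif count == 0 and start is not None:
            spans.append((start, x))       # blank column closes the character
            start = None
    if start is not None:
        spans.append((start, len(image[0])))
    return spans

def feature_vector(image, span):
    """Flatten the pixels inside one span, row by row, into a feature vector."""
    start, end = span
    return [row[x] for row in image for x in range(start, end)]

def classify(vector, templates):
    """Return the label of the nearest template (Euclidean distance).

    Templates whose length differs from the vector are skipped; a real
    system would normalise character size before matching.
    """
    best_label, best_dist = None, float("inf")
    for label, ref in templates.items():
        if len(ref) != len(vector):
            continue
        dist = sqrt(sum((a - b) ** 2 for a, b in zip(vector, ref)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def read_line(image, templates):
    """Segment a line image and classify each character span."""
    return "".join(
        classify(feature_vector(image, span), templates) or "?"
        for span in segment_columns(image)
    )

# Hypothetical 3x3 templates for two glyphs.
TEMPLATES = {
    "T": [1, 1, 1, 0, 1, 0, 0, 1, 0],
    "L": [1, 0, 0, 1, 0, 0, 1, 1, 1],
}

# Toy line image: a "T" with one noisy extra pixel, a blank column, then an "L".
LINE = [
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 1],
]

print(read_line(LINE, TEMPLATES))
```

Nearest-template matching tolerates the stray pixel in the first glyph because the noisy vector is still closer to the "T" template than to the "L" template. Handling multiple fonts, as the paper describes, would at minimum require several templates per character and size normalisation, neither of which this sketch attempts.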

