Regular paper
Reading order of Chinese newspaper articles using a block-growing method

https://doi.org/10.1016/S0262-8856(97)00077-2

Abstract

The reading order of articles plays an important role in document analysis and document understanding. Because the reading sequence conveys significant semantic information embedded in a document, it influences the robustness and correctness of downstream applications of document analysis. Hence the reading order, which most researchers and systems ignore, is indispensable for post-processing in document understanding systems. In this paper, a block-growing approach is proposed to generate the reading order of Chinese newspapers by matching predefined style graphs. In this graph-matching approach, a geometric relation graph (GRG) is first constructed from the geometric relationships among the segmented blocks of the input document. Blocks that belong to the same article are then gradually merged into an article by matching the predefined style graphs. Since the local reading information is retained during the block-growing process, the global reading order can be generated easily. The proposed bottom-up merging approach is powerful and flexible in finding the reading order of Chinese newspaper articles. A wide variety of Chinese newspapers with horizontal and vertical styles were tested to verify the validity of the proposed method. Experimental results demonstrate the feasibility and effectiveness of the approach.
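
The bottom-up idea described in the abstract can be illustrated with a toy sketch. It is not the paper's style-graph matching, whose details are not given here. It assumes axis-aligned block rectangles in image coordinates (y grows downward), a hypothetical `max_gap` adjacency threshold, and a vertical-style layout in which a block is read before the block directly below it or directly to its left. The names `Block`, `build_grg`, and `grow_articles` are illustrative only.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Block:
    """A segmented block as a rectangle (hypothetical representation)."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def overlap(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def build_grg(blocks, max_gap=10.0):
    """Return directed edges (src, relation, dst) between neighbouring blocks.

    max_gap is an assumed threshold: blocks farther apart than this are
    treated as separated and receive no edge.
    """
    edges = []
    for a, b in combinations(blocks, 2):
        for p, q in ((a, b), (b, a)):
            # p sits directly above q: x-ranges overlap, small vertical gap.
            if overlap(p.x0, p.x1, q.x0, q.x1) > 0 and 0 <= q.y0 - p.y1 <= max_gap:
                edges.append((p.name, "above", q.name))
            # p sits directly right of q: y-ranges overlap, small horizontal gap.
            if overlap(p.y0, p.y1, q.y0, q.y1) > 0 and 0 <= p.x0 - q.x1 <= max_gap:
                edges.append((p.name, "right_of", q.name))
    return edges


def grow_articles(blocks, edges, style=frozenset({"above", "right_of"})):
    """Greedily merge blocks joined by relations allowed in `style`.

    Each merge appends the destination group after the source group, so the
    local order (src is read before dst) is kept while groups grow. This
    concatenation is a simplification that suits simple chains of blocks.
    """
    group = {b.name: [b.name] for b in blocks}  # every block starts alone
    for src, relation, dst in edges:
        if relation not in style or group[src] is group[dst]:
            continue
        merged = group[src] + group[dst]
        for name in merged:
            group[name] = merged
    # Collect the distinct groups in the order their first block was listed.
    articles, seen = [], set()
    for b in blocks:
        g = group[b.name]
        if id(g) not in seen:
            seen.add(id(g))
            articles.append(g)
    return articles


# Toy page: a headline H with two body columns to its left, and a second
# article (C1, C2) further down, separated by a gap larger than max_gap.
blocks = [
    Block("H", 90, 0, 100, 50),
    Block("B1", 70, 0, 85, 50),
    Block("B2", 50, 0, 65, 50),
    Block("C1", 50, 80, 100, 120),
    Block("C2", 0, 80, 45, 120),
]
edges = build_grg(blocks)
articles = grow_articles(blocks, edges)
print(articles)
```

In this toy layout the two articles come out as separate groups, each listed right to left. A horizontal-style page would need a different relation set and merge direction, which is the role the abstract assigns to the predefined style graphs.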
