An Improved Algorithm of Logical Structure Reconstruction for Re-flowable Document Understanding

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9362)

Abstract

The basic idea of re-flowable document understanding and automatic typesetting is to generate a logical document by first identifying the logical tags of paragraphs in a re-flowable document and then determining the hierarchical relationships between physical units and logical tags. To overcome the shortcomings of conventional logical structure reconstruction methods, this paper proposes a novel reconstruction method for re-flowable documents based on a directed graph. The method extracts the logical structure from a template document and then applies a single-source shortest-path algorithm on the directed graph to filter out redundant logical tags, thereby reconstructing the document's logical structure. Experimental results show that the algorithm effectively improves the accuracy of logical structure recognition.
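
As a rough illustration of how a single-source shortest-path search on a directed graph can discard redundant tags, the sketch below builds a layered graph with one layer per paragraph and keeps only the tags on the cheapest path. It is not the paper's implementation: the choice of Bellman-Ford, the function names `bellman_ford` and `filter_tags`, the `candidates`, `allowed`, and `cost` inputs, and the cost model are all assumptions made for this example.

```python
from math import inf


def bellman_ford(num_nodes, edges, source):
    """Single-source shortest paths; edges are (u, v, weight) triples."""
    dist = [inf] * num_nodes
    prev = [None] * num_nodes
    dist[source] = 0.0
    for _ in range(num_nodes - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                prev[v] = u
                changed = True
        if not changed:  # distances have converged, stop early
            break
    return dist, prev


def filter_tags(candidates, allowed, cost):
    """Keep one tag per paragraph by following the cheapest path.

    candidates: one list of candidate tags per paragraph (hypothetical input)
    allowed:    set of (tag, next_tag) transitions from a template (hypothetical)
    cost:       dict mapping each tag to a non-negative penalty (hypothetical)
    """
    nodes = [(i, t) for i, tags in enumerate(candidates) for t in tags]
    index = {n: k + 1 for k, n in enumerate(nodes)}  # node 0 is the source
    sink = len(nodes) + 1
    last = len(candidates) - 1
    edges = []
    for i, t in nodes:
        if i == 0:
            edges.append((0, index[(i, t)], cost[t]))
        if i == last:
            edges.append((index[(i, t)], sink, 0.0))
        else:
            for t2 in candidates[i + 1]:
                if (t, t2) in allowed:
                    edges.append((index[(i, t)], index[(i + 1, t2)], cost[t2]))
    dist, prev = bellman_ford(sink + 1, edges, 0)
    if dist[sink] == inf:
        return None  # no tag sequence is consistent with the template
    path = []
    node = prev[sink]
    while node != 0:  # walk predecessors back to the source
        path.append(nodes[node - 1][1])
        node = prev[node]
    return path[::-1]
```

In this sketch, candidate tags that do not lie on the shortest source-to-sink path are the ones treated as redundant; how the paper actually weights edges and derives transitions from the template is not stated in the abstract.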

Corresponding author

Correspondence to Ning Li.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhao, L., Li, N., Peng, X., Liang, Q. (2015). An Improved Algorithm of Logical Structure Reconstruction for Re-flowable Document Understanding. In: Li, J., Ji, H., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2015. Lecture Notes in Computer Science, vol 9362. Springer, Cham. https://doi.org/10.1007/978-3-319-25207-0_28

  • DOI: https://doi.org/10.1007/978-3-319-25207-0_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25206-3

  • Online ISBN: 978-3-319-25207-0

  • eBook Packages: Computer Science, Computer Science (R0)
