Grammatical formalism for document understanding system: From document towards HTML text

  • Oral Presentations
  • Conference paper
Advances in Document Image Analysis (BSDIA 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1339)

Abstract

This paper addresses the use of grammatical formalisms to recognize the physical and logical structures of a composite document. We propose a new system for document recognition and analysis. Its goal is to identify summaries in particular and, as an application, to convert them into machine-readable form: a summary page is translated into HTML (HyperText Markup Language) text.
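
As a rough illustration of the last step, going from a recognized logical structure to HTML text, the sketch below parses the lines of a summary page and emits an HTML list. It is not the authors' system. It assumes a summary is a contents-style list of "title ..... page" entries, and it uses a regular expression as a stand-in for the grammatical formalism; the names `ENTRY`, `parse_summary`, and `to_html` are hypothetical.

```python
import html
import re

# Hypothetical rule for one summary line: entry ::= title dots page
# (a regular expression stands in for a real grammar here).
ENTRY = re.compile(r"^(?P<title>.+?)\s*\.{2,}\s*(?P<page>\d+)$")


def parse_summary(lines):
    """Return (title, page) pairs for the lines that match the entry rule."""
    entries = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            entries.append((match.group("title"), int(match.group("page"))))
    return entries


def to_html(entries):
    """Render the recognized entries as an HTML unordered list."""
    items = "\n".join(
        f'  <li>{html.escape(title)} <span class="page">{page}</span></li>'
        for title, page in entries
    )
    return f"<ul>\n{items}\n</ul>"
```

Calling `to_html(parse_summary(text_lines))` on the OCR output of a summary page would give a machine-readable list; lines that do not fit the entry rule are simply skipped in this sketch.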

Editor information

Nabeel A. Murshed, Flávio Bortolozzi

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tayeb-bey, S., Saidi, A.S. (1997). Grammatical formalism for document understanding system: From document towards HTML text. In: Murshed, N.A., Bortolozzi, F. (eds) Advances in Document Image Analysis. BSDIA 1997. Lecture Notes in Computer Science, vol 1339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63791-5_12

  • DOI: https://doi.org/10.1007/3-540-63791-5_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63791-2

  • Online ISBN: 978-3-540-69646-9

  • eBook Packages: Springer Book Archive
