
A DTD extension for document structure recognition

  • Part III: EP'98
  • Conference paper
Electronic Publishing, Artistic Imaging, and Digital Typography (RIDT 1998)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1375))


Abstract

This paper deals with the representation of document models used in the field of document recognition. A novel formalism, the generalized n-gram, is presented and shown to be accurate for the recognition task and well suited to automatic learning from examples. The paper also addresses the thorny problem of integrating models for document analysis with existing standards used for document manipulation and production.

This project is funded by the Swiss National Fund for Scientific Research, code 21-42'355.94.




Editor information

Roger D. Hersch, Jacques André, Heather Brown


Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Brugger, R., Bapst, F., Ingold, R. (1998). A DTD extension for document structure recognition. In: Hersch, R.D., André, J., Brown, H. (eds) Electronic Publishing, Artistic Imaging, and Digital Typography. RIDT 1998. Lecture Notes in Computer Science, vol 1375. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0053282


  • DOI: https://doi.org/10.1007/BFb0053282


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64298-5

  • Online ISBN: 978-3-540-69718-3

  • eBook Packages: Springer Book Archive
