Labelling logical structures of document images using a dynamic perceptive neural network

  • Original Paper
  • International Journal on Document Analysis and Recognition (IJDAR)

Abstract

This paper proposes a new method for labelling the logical structures of document images. The system starts with digitised images of paper documents, performs a physical layout analysis, runs OCR and finally exploits the OCR output to find the meaning of each block of text (i.e. it assigns labels such as “Title” or “Author”). The method extends our previous work, in which a classifier, the perceptive neural network, was developed by analogy with human perception. We introduce a temporal dimension into this connectionist model by using a time-delay neural network with local representation. During the recognition stage, the system performs several cycles of recognition and correction, keeping track of the previous outputs and reusing them. This dynamic classifier thus handles noise and segmentation errors better. The experiments were carried out on two datasets: the public MARG dataset, which contains more than 1,500 front pages of scientific papers with four zones of interest, and a second one composed of documents from the Siggraph 2003 conference, in which 21 logical structures have been identified. The error rate is less than 2.5% on MARG and 7.3% on the Siggraph dataset.
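
The idea of running several recognition cycles while reusing earlier outputs can be illustrated with a small sketch. This is not the authors' network: it is a toy, untrained linear classifier in plain Python, and the label subset, feature values, weights, number of delays and number of cycles are all hypothetical choices made for illustration. It only shows how a delay line can feed the outputs of previous cycles back in as extra inputs; the correction step described in the abstract is not modelled.

```python
import math
import random
from collections import deque


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recognition_cycles(features, weights, n_labels, n_delays=2, n_cycles=4):
    """Run several recognition cycles over one text block.

    Each cycle sees the block features plus the outputs of the previous
    `n_delays` cycles (a simple delay line), so earlier decisions are
    kept and reused. All sizes and weights are illustrative assumptions.
    """
    # Delay line initialised with uniform (uninformative) outputs.
    uniform = [1.0 / n_labels] * n_labels
    history = deque([uniform] * n_delays, maxlen=n_delays)
    trace = []
    for _ in range(n_cycles):
        # Input = current features followed by the delayed outputs.
        inputs = list(features) + [p for past in history for p in past]
        scores = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
        output = softmax(scores)
        history.append(output)  # the oldest output drops out automatically
        trace.append(output)
    return trace


# Hypothetical label set and block features, not taken from the paper.
LABELS = ["Title", "Author", "Abstract", "Affiliation"]
rng = random.Random(0)
features = [0.9, 0.1, 0.4]
n_inputs = len(features) + 2 * len(LABELS)  # features + two delayed outputs
weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in LABELS]

trace = recognition_cycles(features, weights, n_labels=len(LABELS))
final = trace[-1]
predicted = LABELS[final.index(max(final))]
```

Here `trace` holds one probability vector per cycle, so the way the label estimate changes from cycle to cycle can be inspected. In a trained system the weights would come from learning rather than from a random generator.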



Author information

Corresponding author

Correspondence to Yves Rangoni.

About this article

Cite this article

Rangoni, Y., Belaïd, A. & Vajda, S. Labelling logical structures of document images using a dynamic perceptive neural network. IJDAR 15, 45–55 (2012). https://doi.org/10.1007/s10032-011-0151-y

  • DOI: https://doi.org/10.1007/s10032-011-0151-y
