Labelling logical structures of document images using a dynamic perceptive neural network

Rangoni, Yves; Belaïd, Abdel; Vajda, Szilárd

doi:10.1007/s10032-011-0151-y

Original Paper
Published: 03 March 2011

Volume 15, pages 45–55, (2012)
International Journal on Document Analysis and Recognition (IJDAR)

Yves Rangoni¹,
Abdel Belaïd¹ &
Szilárd Vajda²

Abstract

This paper proposes a new method for labelling the logical structures of document images. The system starts from digitised images of paper documents, performs a physical layout analysis, runs OCR and finally exploits the OCR output to find the meaning of each block of text (i.e. it assigns labels such as “Title”, “Author”, etc.). The method extends our previous work, in which a classifier, the perceptive neural network, was developed by analogy with human perception. We introduce a temporal dimension into this connectionist model by using a time-delay neural network with local representation. During the recognition stage, the system performs several cycles of recognition and correction while keeping track of, and reusing, the previous outputs. This dynamic classifier thus handles noise and segmentation errors better. The experiments were carried out on two datasets: the public MARG dataset, containing more than 1,500 front pages of scientific papers with four zones of interest, and a second dataset composed of documents from the Siggraph 2003 conference, in which 21 logical structures have been identified. The error rate is below 2.5% on MARG and 7.3% on the Siggraph dataset.
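
The idea of running several recognition cycles while reusing previous outputs can be illustrated with a small sketch. It is not the architecture of the paper, which the abstract does not detail: the delay line, the function names (`run_recognition_cycles`, `toy_classifier`), the fixed weights and the two-label setting are all assumptions made for illustration only.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Vector = List[float]


def run_recognition_cycles(
    features: Sequence[float],
    classify: Callable[[Vector], Vector],
    n_labels: int,
    n_delays: int = 2,
    n_cycles: int = 4,
) -> List[Vector]:
    """Run several recognition cycles, feeding delayed outputs back as input."""
    # Delay line holding the last `n_delays` output vectors, initialised to zeros.
    history: Deque[Vector] = deque(
        [[0.0] * n_labels for _ in range(n_delays)], maxlen=n_delays
    )
    outputs: List[Vector] = []
    for _ in range(n_cycles):
        # Input = block features followed by the remembered outputs, oldest first.
        x = list(features) + [v for past in history for v in past]
        y = classify(x)
        history.append(y)  # the oldest output drops out of the delay line
        outputs.append(y)
    return outputs


def toy_classifier(x: Vector) -> Vector:
    """Hypothetical stand-in for a trained network: two labels, fixed weights."""
    # Layout of x: [f0, f1, older_title, older_author, newer_title, newer_author]
    title = 0.6 * x[0] + 0.4 * x[4]
    author = 0.6 * x[1] + 0.4 * x[5]
    total = title + author
    return [title / total, author / total] if total else [0.5, 0.5]


labels = ["Title", "Author"]
cycles = run_recognition_cycles([0.7, 0.3], toy_classifier, n_labels=2)
final = cycles[-1]
print(labels[final.index(max(final))])
```

In this toy setting each cycle sees the block features together with the outputs of the two preceding cycles, so an early decision can be confirmed or corrected later; a real system would replace `toy_classifier` with a trained network.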


Author information

Authors and Affiliations

Nancy 2 University, LORIA, Vandœuvre-lès-Nancy, France
Yves Rangoni & Abdel Belaïd
Computer Science Department, TU Dortmund, Dortmund, Germany
Szilárd Vajda

Corresponding author

Correspondence to Yves Rangoni.

About this article

Cite this article

Rangoni, Y., Belaïd, A. & Vajda, S. Labelling logical structures of document images using a dynamic perceptive neural network. IJDAR 15, 45–55 (2012). https://doi.org/10.1007/s10032-011-0151-y

Received: 30 March 2010
Revised: 22 January 2011
Accepted: 07 February 2011
Published: 03 March 2011
Issue Date: March 2012
DOI: https://doi.org/10.1007/s10032-011-0151-y
