
Unsupervised Recognition of the Logical Structure of Business Documents Based on Spatial Relationships

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13053)

Abstract

This paper presents the first unsupervised, automatic system that recognizes the logical structure of business documents without any model or prior information about that structure. Our solution can therefore process entirely unknown document models. We treat the recognition of logical structures as a detection problem, since we must simultaneously localize blocks of text and recognize their logical function. We assume that any document is composed of parts drawn from several other document models, and we propose a part-based spatial model suited to partial voting. The model introduces the concept of Spatial Context (SC), a spatial feature that locally measures the distribution of spatial information around a reference point. Our method relies on a Gaussian voting process, which provides a robust mechanism for detecting the elements of any logical structure. The solution handles non-rigid structures and works well with a small number of images, a property not shared by supervised approaches, in particular methods based on neural networks.
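To make the voting idea concrete, the sketch below illustrates how Gaussian votes cast from detected text blocks can accumulate evidence for the location of a logical field. This is only an illustrative approximation of a generic Gaussian voting scheme, not the authors' implementation: the page size, anchor positions, offsets, and sigma value are hypothetical assumptions.

```python
# Illustrative sketch only: a generic Gaussian voting accumulator for locating a
# logical field from the spatial offsets of surrounding text blocks. All values
# below are hypothetical and not taken from the paper.
import numpy as np

def gaussian_vote(page_shape, anchors, offsets, sigma=25.0):
    """Accumulate Gaussian votes for a field location.

    page_shape : (height, width) of the page raster, in pixels.
    anchors    : list of (y, x) centres of detected text blocks (e.g. keywords).
    offsets    : list of (dy, dx) displacements from an anchor to the field,
                 as observed on previously seen documents.
    sigma      : standard deviation of each vote, in pixels (tolerance to
                 layout variation, i.e. non-rigid structures).
    """
    h, w = page_shape
    acc = np.zeros((h, w), dtype=np.float64)
    yy, xx = np.mgrid[0:h, 0:w]
    for (ay, ax) in anchors:
        for (dy, dx) in offsets:
            cy, cx = ay + dy, ax + dx  # predicted field centre for this vote
            acc += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2.0 * sigma ** 2))
    return acc

# Example: two keyword blocks vote for a field expected ~200 px to their right.
votes = gaussian_vote((1000, 800),
                      anchors=[(120, 80), (400, 90)],
                      offsets=[(0, 200)])
peak = np.unravel_index(np.argmax(votes), votes.shape)
print("most likely field location (y, x):", peak)
```

The peak of the accumulator corresponds to the position receiving the most consistent spatial evidence; because each vote is a Gaussian rather than a single point, moderate layout shifts between documents still reinforce the same maximum.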



Acknowledgement

This work was funded by ITESOFT and the LIRIS laboratory at INSA-Lyon within the DOD project.

Author information

Corresponding author

Correspondence to Louisa Kessi.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Kessi, L., Lebourgeois, F., Garcia, C. (2021). Unsupervised Recognition of the Logical Structure of Business Documents Based on Spatial Relationships. In: Tsapatsoulis, N., Panayides, A., Theocharides, T., Lanitis, A., Pattichis, C., Vento, M. (eds) Computer Analysis of Images and Patterns. CAIP 2021. Lecture Notes in Computer Science, vol. 13053. Springer, Cham. https://doi.org/10.1007/978-3-030-89131-2_6


  • DOI: https://doi.org/10.1007/978-3-030-89131-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89130-5

  • Online ISBN: 978-3-030-89131-2

  • eBook Packages: Computer Science (R0)
