
Unsupervised Recognition of the Logical Structure of Business Documents Based on Spatial Relationships

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13053)

Abstract

This paper presents the first unsupervised, automatic system that recognizes the logical structure of business documents without any model or prior information about that structure. Our solution can therefore process entirely unknown document models. We treat the recognition of logical structures as a detection problem, since we must simultaneously localize blocks of text and recognize their logical function. We assume that any document is composed of parts drawn from several other document models, and we propose a part-based spatial model suited to partial voting. The model introduces the concept of Spatial Context (SC), a spatial feature that locally measures the distribution of spatial information around a reference point. Our method relies on a Gaussian voting process, which provides a robust mechanism for detecting the elements of any logical structure. The solution handles non-rigid structures and works well with a small number of images, a property not shared by supervised approaches, in particular methods based on neural networks.
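To make the voting idea concrete, the sketch below illustrates how Gaussian votes cast from detected text blocks can accumulate evidence for the location of a logical field. This is only an illustrative approximation of a generic Gaussian voting scheme, not the authors' implementation: the page size, anchor positions, offsets, and sigma value are hypothetical assumptions.

```python
# Illustrative sketch only: a generic Gaussian voting accumulator for locating a
# logical field from the spatial offsets of surrounding text blocks. All values
# below are hypothetical and not taken from the paper.
import numpy as np

def gaussian_vote(page_shape, anchors, offsets, sigma=25.0):
    """Accumulate Gaussian votes for a field location.

    page_shape : (height, width) of the page raster, in pixels.
    anchors    : list of (y, x) centres of detected text blocks (e.g. keywords).
    offsets    : list of (dy, dx) displacements from an anchor to the field,
                 as observed on previously seen documents.
    sigma      : standard deviation of each vote, in pixels (tolerance to
                 layout variation, i.e. non-rigid structures).
    """
    h, w = page_shape
    acc = np.zeros((h, w), dtype=np.float64)
    yy, xx = np.mgrid[0:h, 0:w]
    for (ay, ax) in anchors:
        for (dy, dx) in offsets:
            cy, cx = ay + dy, ax + dx  # predicted field centre for this vote
            acc += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2.0 * sigma ** 2))
    return acc

# Example: two keyword blocks vote for a field expected ~200 px to their right.
votes = gaussian_vote((1000, 800),
                      anchors=[(120, 80), (400, 90)],
                      offsets=[(0, 200)])
peak = np.unravel_index(np.argmax(votes), votes.shape)
print("most likely field location (y, x):", peak)
```

The peak of the accumulator corresponds to the position receiving the most consistent spatial evidence; because each vote is a Gaussian rather than a single point, moderate layout shifts between documents still reinforce the same maximum.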



Acknowledgement

This work was funded by ITESOFT and the LIRIS laboratory at INSA-Lyon within the DOD project.

Author information

Corresponding author

Correspondence to Louisa Kessi.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Kessi, L., Lebourgeois, F., Garcia, C. (2021). Unsupervised Recognition of the Logical Structure of Business Documents Based on Spatial Relationships. In: Tsapatsoulis, N., Panayides, A., Theocharides, T., Lanitis, A., Pattichis, C., Vento, M. (eds) Computer Analysis of Images and Patterns. CAIP 2021. Lecture Notes in Computer Science, vol. 13053. Springer, Cham. https://doi.org/10.1007/978-3-030-89131-2_6


  • DOI: https://doi.org/10.1007/978-3-030-89131-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89130-5

  • Online ISBN: 978-3-030-89131-2

  • eBook Packages: Computer Science (R0)
