Abstract
State-of-the-art offline Optical Character Recognition (OCR) frameworks perform poorly on semi-structured handwritten domain-specific documents because they cannot localize and label form fields with domain-specific semantics. Existing techniques for semi-structured document analysis have primarily been benchmarked on datasets of invoices, purchase orders, receipts, and identity-card documents. In this work, we build the first semi-structured document analysis dataset in the legal domain by collecting a large number of First Information Report (FIR) documents from several police stations in India. This dataset, which we call the FIR dataset, is more challenging than most existing document analysis datasets, since it combines a wide variety of handwritten text with printed text. We also propose an end-to-end framework for offline processing of handwritten semi-structured documents and benchmark it on our novel FIR dataset. Our framework uses an encoder-decoder architecture to localize and label the form fields and to recognize the handwritten content. The encoder combines Faster-RCNN and Vision Transformers, and the Transformer-based decoder is trained with a domain-specific tokenizer. We also propose a post-correction method to handle recognition errors in domain-specific terms. Our framework achieves state-of-the-art results on the FIR dataset, outperforming several existing models.
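To make the two-stage design concrete, the following is a minimal sketch, not the paper's implementation: a Faster-RCNN detector localizes and labels form fields, and a TrOCR-style vision-encoder/text-decoder transcribes each cropped field. The off-the-shelf torchvision detector and the public TrOCR checkpoint are stand-in assumptions for the paper's models, which are fine-tuned on the FIR dataset with a domain-specific tokenizer and followed by post-correction.

```python
# Minimal sketch of the two-stage pipeline. Assumptions: a generic torchvision
# Faster-RCNN and the public TrOCR handwritten checkpoint stand in for the
# paper's fine-tuned, domain-specific models.
import torch
import torchvision
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Stage 1: field localization and labelling with Faster-RCNN.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

# Stage 2: handwriting recognition with a TrOCR-style encoder-decoder.
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
recognizer = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

def read_form(path, score_threshold=0.7):
    """Return (box, text) pairs for the detected form fields of one page."""
    page = Image.open(path).convert("RGB")
    tensor = torchvision.transforms.functional.to_tensor(page)

    with torch.no_grad():
        detections = detector([tensor])[0]  # dict with 'boxes', 'labels', 'scores'

    results = []
    for box, score in zip(detections["boxes"], detections["scores"]):
        if score < score_threshold:
            continue
        x0, y0, x1, y1 = [int(v) for v in box.tolist()]
        crop = page.crop((x0, y0, x1, y1))
        pixel_values = processor(images=crop, return_tensors="pt").pixel_values
        ids = recognizer.generate(pixel_values)
        text = processor.batch_decode(ids, skip_special_tokens=True)[0]
        results.append(((x0, y0, x1, y1), text))
    return results
```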
Notes
We initially compared Tesseract with TrOCR-Base, and found TrOCR to perform much better. Hence subsequent experiments were done with TrOCR only.
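The comparison mentioned in this note can be reproduced in spirit with a short script. The sketch below contrasts Tesseract (via pytesseract) with the public TrOCR-Base handwritten checkpoint on a single cropped field and scores both against a reference transcription with a simple character error rate; the file name "field_crop.png" and the reference string are hypothetical.

```python
# Hedged sketch of the Tesseract vs. TrOCR-Base comparison described in the note.
# Assumes a local Tesseract install and the public TrOCR checkpoint;
# "field_crop.png" and the reference text are illustrative placeholders.
import pytesseract
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

def cer(hyp, ref):
    """Character error rate via Levenshtein distance (simple DP)."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(ref) + 1)]
         for i in range(len(hyp) + 1)]
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1]))
    return d[len(hyp)][len(ref)] / max(len(ref), 1)

crop = Image.open("field_crop.png").convert("RGB")
reference = "hypothetical ground-truth transcription"

# Baseline: Tesseract OCR on the cropped field.
tesseract_text = pytesseract.image_to_string(crop).strip()

# TrOCR-Base (handwritten) on the same crop.
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")
pixel_values = processor(images=crop, return_tensors="pt").pixel_values
trocr_text = processor.batch_decode(model.generate(pixel_values),
                                    skip_special_tokens=True)[0]

print("Tesseract CER:", cer(tesseract_text, reference))
print("TrOCR CER    :", cer(trocr_text, reference))
```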
Acknowledgement
This work is partially supported by research grants from Wipro Limited (www.wipro.com) and IIT Jodhpur (www.iitj.ac.in).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chakraborty, S., Harit, G., Ghosh, S. (2023). TransDocAnalyser: A Framework for Semi-structured Offline Handwritten Documents Analysis with an Application to Legal Domain. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14187. Springer, Cham. https://doi.org/10.1007/978-3-031-41676-7_3
DOI: https://doi.org/10.1007/978-3-031-41676-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41675-0
Online ISBN: 978-3-031-41676-7