TransDocAnalyser: A Framework for Semi-structured Offline Handwritten Documents Analysis with an Application to Legal Domain

  • Conference paper
Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14187)

Abstract

State-of-the-art offline Optical Character Recognition (OCR) frameworks perform poorly on semi-structured, handwritten, domain-specific documents because they cannot localize and label form fields with domain-specific semantics. Existing techniques for semi-structured document analysis have primarily used datasets comprising invoices, purchase orders, receipts, and identity-card documents for benchmarking. In this work, we build the first semi-structured document analysis dataset in the legal domain by collecting a large number of First Information Report (FIR) documents from several police stations in India. This dataset, which we call the FIR dataset, is more challenging than most existing document analysis datasets, since it combines a wide variety of handwritten text with printed text. We also propose an end-to-end framework for offline processing of handwritten semi-structured documents, and benchmark it on our novel FIR dataset. Our framework uses an encoder-decoder architecture for localizing and labelling the form fields and for recognizing the handwritten content. The encoder consists of Faster-RCNN and Vision Transformers. Further, the Transformer-based decoder is trained with a domain-specific tokenizer. We also propose a post-correction method to handle recognition errors pertaining to domain-specific terms. Our proposed framework achieves state-of-the-art results on the FIR dataset, outperforming several existing models.
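
The post-correction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' method, so this is not their algorithm: it assumes a simple fuzzy match against a small domain lexicon, using Python's standard `difflib`. The lexicon `LEGAL_TERMS`, the function name `post_correct`, and the `cutoff` value are all hypothetical choices made for illustration.

```python
from difflib import get_close_matches

# Hypothetical domain lexicon; the vocabulary actually used in the paper
# is not given in the abstract.
LEGAL_TERMS = [
    "complainant", "accused", "police", "station",
    "statute", "section", "district",
]


def post_correct(text, lexicon=LEGAL_TERMS, cutoff=0.8):
    """Snap each recognized token to its closest lexicon term.

    A token is replaced only when its similarity to some lexicon term
    is at least `cutoff`; otherwise it is kept unchanged. Matching is
    done on the lowercased token, so replaced tokens come out lowercase.
    """
    corrected = []
    for token in text.split():
        matches = get_close_matches(token.lower(), lexicon, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return " ".join(corrected)


if __name__ == "__main__":
    # Simulated OCR output containing misrecognized domain terms.
    print(post_correct("complainent visited polce staton"))
```

A high `cutoff` keeps ordinary words from being pulled toward lexicon terms, at the cost of leaving heavily garbled tokens uncorrected; a real system would likely tune this threshold on held-out recognition output.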

Notes

  1. https://en.wikipedia.org/wiki/First_information_report

  2. http://bidhannagarcitypolice.gov.in/fir_record.php

  3. https://home.rajasthan.gov.in/content/homeportal/en.html

  4. https://police.sikkim.gov.in/visitor/fir

  5. https://tripurapolice.gov.in/west/fir-copies

  6. https://police.nagaland.gov.in/fir-2/

  7. https://github.com/wkentaro/labelme

  8. https://github.com/LegalDocumentProcessing/FIR_Dataset_ICDAR2023

  9. We initially compared Tesseract with TrOCR-Base, and found TrOCR to perform much better. Hence subsequent experiments were done with TrOCR only.

Acknowledgement

This work is partially supported by research grants from Wipro Limited (www.wipro.com) and IIT Jodhpur (www.iitj.ac.in).

Author information

Corresponding author

Correspondence to Sagar Chakraborty.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chakraborty, S., Harit, G., Ghosh, S. (2023). TransDocAnalyser: A Framework for Semi-structured Offline Handwritten Documents Analysis with an Application to Legal Domain. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14187. Springer, Cham. https://doi.org/10.1007/978-3-031-41676-7_3

  • DOI: https://doi.org/10.1007/978-3-031-41676-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41675-0

  • Online ISBN: 978-3-031-41676-7

  • eBook Packages: Computer Science (R0)