A Platform for Large Scale Auto Annotation of Scanned Documents Featuring Real-Time Model Building and Model Pooling

  • Conference paper
Computer Vision and Image Processing (CVIP 2021)

Abstract

Document digitization is an active area of research, especially for handwritten manuscripts. While the most common use cases involve digital libraries, there are other important applications, such as electronic health records, where handwritten text is predominant in the developing world. State-of-the-art approaches are domain-specific, and scaling across domains is still an open research problem. We report a platform for real-time annotation and training of sub-region models in scanned documents using model pools and plug-n-play annotation services. Given a document, sub-regions are annotated with textual labels. The annotated regions may correspond to characters, words, or any other pattern of interest. For a given sub-region category, several sub-regions may be present on a given page or across pages. In the proposed system, a user needs to annotate only some of the sub-regions. A convolutional neural network (CNN) model is built for each sub-region category, and named sets, or pools, of such models are prepared for application to any new document. A sub-region label may also be provided by an existing optical character recognition (OCR) system instead of a human annotator. To support this, we provide annotation as a service, in which any third-party system can be integrated through a plug-n-play mechanism. State-of-the-art systems focus on a pre-trained monolithic model, which suffers from catastrophic forgetting when new sub-region classes are added over time. In our approach, because each model is specific to one sub-region category, previously built models are not touched when a new category is added, which provides a truly incremental learning solution.
We validated the platform on handwritten data sets in different languages and scripts, such as Devanagari, Kannada, Telugu, and English, that span diverse text patterns. The models produced by our sub-region detection algorithm were evaluated on documents containing hundreds of handwritten scripts by several authors. On the validation data sets, the models achieved the following mAP scores: Devanagari words (96.18); Telugu words (93.20); Devanagari letters (100); Kannada letters (99.83); Tesseract English word-level annotations (90). As a proof of concept for annotation as a service, we also present a single-page annotation for Kannada, Telugu, Malayalam, and English recognition in which the models learn from Tesseract annotations.
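
The model-pool idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `CentroidModel`, `ModelPool`, and `label_with_service` are hypothetical, each sub-region is assumed to be already reduced to a small feature vector, and a nearest-centroid rule stands in for the per-category CNN. The sketch only shows the two structural points the abstract makes: adding a category builds one new model without touching existing ones, and labels may come from any pluggable annotation service rather than a human.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
# Assumption: an annotation service is any callable mapping a sub-region
# (here a feature vector) to a text label, e.g. a wrapper around an OCR tool.
Annotator = Callable[[Vector], str]


@dataclass
class CentroidModel:
    """Toy stand-in for a per-category CNN: keeps the mean of its examples."""
    label: str
    centroid: List[float]

    @classmethod
    def train(cls, label: str, examples: List[Vector]) -> "CentroidModel":
        dim = len(examples[0])
        centroid = [sum(e[i] for e in examples) / len(examples) for i in range(dim)]
        return cls(label, centroid)

    def distance(self, region: Vector) -> float:
        # Euclidean distance between a sub-region and this category's centroid.
        return sum((a - b) ** 2 for a, b in zip(region, self.centroid)) ** 0.5


@dataclass
class ModelPool:
    """Named set of independent per-category models."""
    name: str
    models: Dict[str, CentroidModel] = field(default_factory=dict)

    def add_category(self, label: str, examples: List[Vector]) -> None:
        # Only the new category's model is built; existing models are untouched.
        self.models[label] = CentroidModel.train(label, examples)

    def annotate(self, region: Vector) -> str:
        # Pick the category whose model matches the sub-region best.
        best = min(self.models.values(), key=lambda m: m.distance(region))
        return best.label


def label_with_service(pool: ModelPool, regions: List[Vector], annotator: Annotator) -> None:
    """Group regions by the label a plug-in annotator assigns, then train one model per label."""
    grouped: Dict[str, List[Vector]] = {}
    for region in regions:
        grouped.setdefault(annotator(region), []).append(region)
    for label, examples in grouped.items():
        pool.add_category(label, examples)


pool = ModelPool("demo")
pool.add_category("ka", [[0.0, 0.0], [0.2, 0.0]])
ka_model = pool.models["ka"]
pool.add_category("ga", [[1.0, 1.0], [1.0, 0.8]])
print(pool.models["ka"] is ka_model)  # True: the earlier model was not rebuilt
print(pool.annotate([0.9, 0.9]))      # ga
```

Because every category owns its model, there is no shared set of weights that a later training run could overwrite, which is the property the abstract contrasts with a monolithic pre-trained model.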

Notes

  1. http://ocr.tdil-dc.gov.in

  2. https://wiki.gnome.org/Apps/OCRFeeder

  3. http://kconnect.eu/semantic-annotation-for-medical-texts

  4. https://github.com/microsoft/VoTT

  5. https://github.com/openvinotoolkit/cvat

  6. http://www.iapr-tc11.org/mediawiki/index.php?title=Devanagari_Character_Dataset

  7. http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/

  8. https://www.pseudoaj.com/2016/05/pseudoajdataset0-telugu-handwritten.html

  9. http://yann.lecun.com/exdb/mnist/

  10. https://github.com/garain/Handwritten-and-Printed-Text-Classification-in-Doctors-Prescription

Author information

Corresponding author: Yeturu Kalidas

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Prashanth, K. et al. (2022). A Platform for Large Scale Auto Annotation of Scanned Documents Featuring Real-Time Model Building and Model Pooling. In: Raman, B., Murala, S., Chowdhury, A., Dhall, A., Goyal, P. (eds) Computer Vision and Image Processing. CVIP 2021. Communications in Computer and Information Science, vol 1567. Springer, Cham. https://doi.org/10.1007/978-3-031-11346-8_6

  • DOI: https://doi.org/10.1007/978-3-031-11346-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11345-1

  • Online ISBN: 978-3-031-11346-8

  • eBook Packages: Computer Science (R0)