A Platform for Large Scale Auto Annotation of Scanned Documents Featuring Real-Time Model Building and Model Pooling

Prashanth, Komuravelli; Kowndinya, Boyalakuntla; Vijay, Chilaka; Teja, Dande; Rodge, Vidya; Velaga, Ramya; Deshmukh, Reena Abasaheb; Kalidas, Yeturu

doi:10.1007/978-3-031-11346-8_6

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1567)

Included in the following conference series:

International Conference on Computer Vision and Image Processing

Abstract

Document digitization is an active area of research, especially where handwritten manuscripts are involved. While the most common use cases involve digital libraries, there are other important applications, such as electronic health records, where handwritten text is predominant in the developing world. State-of-the-art approaches are domain-specific, and scaling across domains is still an open research problem. We report here a platform for real-time annotation and training of sub-region models in scanned documents using model pools and plug-n-play annotation services. Given a document, sub-regions are annotated with textual labels. The textual regions themselves may correspond to characters, words, or any other pattern of interest. For a given sub-region category, several sub-regions may be present on a given page or across pages. In the proposed system, a user needs to annotate only some of the sub-regions. A convolutional neural network (CNN) model is built for each sub-region category, and named sets, or pools, of such models are prepared for application to any new document. We observe that a sub-region label may be provided by an existing optical character recognition (OCR) system instead of a human annotator. We have therefore provisioned annotation as a service, in which any third-party system can be integrated through a plug-n-play mechanism. State-of-the-art systems focus on a pre-trained monolithic model, which suffers from catastrophic forgetting when new sub-region classes are added over time. In our approach, because each model is specific to a sub-region category, previously built models are not touched, which provides a truly incremental learning solution.
We validated the platform on handwritten data sets in different languages and scripts (Devanagari, Kannada, Telugu, and English) that span diverse text patterns. The models produced by our sub-region detection algorithm were evaluated on documents containing hundreds of handwritten scripts by several authors. On the validation data sets, we obtained the following mAP scores: Devanagari words (96.18); Telugu words (93.20); Devanagari letters (100); Kannada letters (99.83); Tesseract English word-level annotations (90). As a proof of concept for annotation as a service, we also present a single-page annotation for Kannada, Telugu, Malayalam, and English recognition, in which models learn from Tesseract annotations.
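
The idea of a model pool with pluggable annotation, as described in the abstract, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `CategoryModel`, `ModelPool`, `build_pool`, and the `Annotator` signature are hypothetical, and a simple centroid-distance scorer stands in for the per-category CNN so that the example stays self-contained. The point it shows is structural: each sub-region category has its own model, and adding a category builds one new model without retraining the existing ones.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]
# Hypothetical annotator signature: maps a sub-region's features to a label.
# A human annotation UI or a wrapper around a third-party OCR engine could
# both be plugged in behind this interface.
Annotator = Callable[[Vector], str]


@dataclass
class CategoryModel:
    """Stand-in for one per-category CNN: a centroid and a distance radius."""
    label: str
    centroid: List[float]
    radius: float = 1.0

    def score(self, x: Vector) -> float:
        # 1.0 at the centroid, decreasing linearly to 0.0 at the radius.
        dist = sum((a - b) ** 2 for a, b in zip(x, self.centroid)) ** 0.5
        return max(0.0, 1.0 - dist / self.radius)


def train_category_model(label: str, examples: List[Vector]) -> CategoryModel:
    """Fit one model from the examples annotated for a single category."""
    n, dim = len(examples), len(examples[0])
    centroid = [sum(e[i] for e in examples) / n for i in range(dim)]
    return CategoryModel(label, centroid)


@dataclass
class ModelPool:
    """A named set of independent per-category models."""
    name: str
    models: Dict[str, CategoryModel] = field(default_factory=dict)

    def add_category(self, label: str, examples: List[Vector]) -> None:
        # Only the new category's model is built; existing models stay as-is.
        self.models[label] = train_category_model(label, examples)

    def annotate(self, x: Vector, threshold: float = 0.5) -> Optional[str]:
        # Return the best-scoring category, or None if nothing is confident.
        best = max(self.models.values(), key=lambda m: m.score(x), default=None)
        if best is None or best.score(x) < threshold:
            return None
        return best.label


def build_pool(name: str, regions: List[Vector], annotator: Annotator) -> ModelPool:
    """Label a few sub-regions with any annotator, then build one model per label."""
    grouped: Dict[str, List[Vector]] = {}
    for x in regions:
        grouped.setdefault(annotator(x), []).append(x)
    pool = ModelPool(name)
    for label, examples in grouped.items():
        pool.add_category(label, examples)
    return pool
```

In this sketch, calling `pool.add_category(...)` for a new label leaves every previously stored `CategoryModel` object unchanged, which is the property the abstract contrasts with retraining a single monolithic model.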

Author information

Authors and Affiliations

Indian Institute of Technology Tirupati, Tirupati, India
Komuravelli Prashanth, Boyalakuntla Kowndinya, Chilaka Vijay, Dande Teja, Vidya Rodge, Ramya Velaga, Reena Abasaheb Deshmukh & Yeturu Kalidas

Corresponding author

Correspondence to Yeturu Kalidas.

Editor information

Editors and Affiliations

Indian Institute of Technology Roorkee, Roorkee, India
Balasubramanian Raman
Indian Institute of Technology Ropar, Ropar, India
Subrahmanyam Murala
Jadavpur University, Kolkata, India
Ananda Chowdhury
Indian Institute of Technology Ropar, Ropar, India
Abhinav Dhall
Indian Institute of Technology Ropar, Ropar, India
Puneet Goyal

About this paper

Cite this paper

Prashanth, K. et al. (2022). A Platform for Large Scale Auto Annotation of Scanned Documents Featuring Real-Time Model Building and Model Pooling. In: Raman, B., Murala, S., Chowdhury, A., Dhall, A., Goyal, P. (eds) Computer Vision and Image Processing. CVIP 2021. Communications in Computer and Information Science, vol 1567. Springer, Cham. https://doi.org/10.1007/978-3-031-11346-8_6

DOI: https://doi.org/10.1007/978-3-031-11346-8_6
Published: 24 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11345-1
Online ISBN: 978-3-031-11346-8
eBook Packages: Computer Science, Computer Science (R0)
