DocVisor: A Multi-purpose Web-Based Interactive Visualizer for Document Image Analytics

  • Conference paper
Document Analysis and Recognition – ICDAR 2021 Workshops (ICDAR 2021)

Abstract

Performance on many document-based problems (OCR, document layout segmentation, etc.) is typically studied in terms of a single aggregate measure (Intersection-over-Union, Character Error Rate, etc.). While useful, such aggregation comes at the expense of instance-level analysis of predictions, which may shed better light on a particular approach’s biases and performance characteristics. To enable a systematic understanding of instance-level predictions, we introduce DocVisor, a web-based multi-purpose visualization tool for analyzing the data and predictions related to various document image understanding problems. DocVisor supports visualizing data sorted by custom-specified performance metrics and rendered in custom display styles. It also supports the visualization of intermediate outputs (e.g., attention maps, coarse predictions) of the processing pipelines. This paper describes the main features of DocVisor and showcases its multi-purpose nature and general utility. We illustrate DocVisor’s functionality for four popular document understanding tasks: document region layout segmentation, tabular data detection, weakly-supervised document region segmentation, and optical character recognition. DocVisor is available as a documented public repository for use by the community.
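To make the contrast between aggregate and instance-level analysis concrete, the following is a minimal Python sketch. It is not DocVisor's code or API: the function names (`box_iou`, `edit_distance`, `cer`, `rank_by_cer`), the axis-aligned box format, and the sample strings are all assumptions made for illustration. It computes the two metrics named in the abstract and sorts OCR instances worst-first, which is the kind of ordering a metric-sorted view would present.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance normalised by reference length."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def rank_by_cer(samples: List[Tuple[str, str, str]]) -> List[Tuple[str, float]]:
    """Score (name, reference, prediction) triples and sort worst-first."""
    scored = [(name, cer(ref, hyp)) for name, ref, hyp in samples]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up OCR lines, purely illustrative.
samples = [
    ("line_01", "document", "document"),
    ("line_02", "layout", "lay0ut"),
    ("line_03", "table", "tab"),
]
ranked = rank_by_cer(samples)
mean_cer = sum(score for _, score in ranked) / len(ranked)

print(f"aggregate CER: {mean_cer:.3f}")
for name, score in ranked:
    print(f"{name}: CER = {score:.3f}")
```

The single mean value hides the fact that one line is recognised perfectly while another loses two of its five characters; the sorted list surfaces the worst instances first. The same pattern would apply to region-level IoU scores from `box_iou`.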

Notes

  1. https://github.com/ihdia

  2. www.primaresearch.org/tools/Aletheia

  3. http://monkweb.nl/

  4. https://readcoop.eu/transkribus/

  5. https://tinyurl.com/5dd7dh6a

  6. https://github.com/OCR4all/OCR4all

  7. https://tinyurl.com/69thzj3c

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Belagavi, K., Tadimeti, P., Sarvadevabhatla, R.K. (2021). DocVisor: A Multi-purpose Web-Based Interactive Visualizer for Document Image Analytics. In: Barney Smith, E.H., Pal, U. (eds) Document Analysis and Recognition – ICDAR 2021 Workshops. ICDAR 2021. Lecture Notes in Computer Science, vol 12917. Springer, Cham. https://doi.org/10.1007/978-3-030-86159-9_14

  • DOI: https://doi.org/10.1007/978-3-030-86159-9_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86158-2

  • Online ISBN: 978-3-030-86159-9

  • eBook Packages: Computer Science, Computer Science (R0)
