TACTFUL: A Framework for Targeted Active Learning for Document Analysis

  • Conference paper
Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)

Abstract

Document layout parsing is an important step in an OCR pipeline, and several supervised and semi-supervised deep learning methods have been proposed for accurately identifying the complex structure of a document. These deep models require a large amount of data to produce promising results, and creating such data involves considerable effort and annotation cost. Active learning (AL) approaches have been proposed to reduce both. We propose TACTFUL, a framework for Targeted Active Learning for Document Layout Analysis. Our contributions include: (i) a framework that makes effective use of the AL paradigm and Submodular Mutual Information (SMI) functions to tackle object-level class imbalance, given a very small set of labeled data; (ii) an approach that decouples object detection from feature selection for subset selection, which improves targeted selection by a considerable margin over the current state of the art and is computationally effective; and (iii) a new dataset of legacy Sanskrit books on which we demonstrate the effectiveness of our approach, in addition to reporting improvements over state-of-the-art approaches on other benchmark datasets.
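
To make the idea of SMI-driven targeted selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes one commonly used SMI instantiation, a facility-location-style query mutual information of the form I(A; Q) = sum over q in Q of max over a in A of s(q, a), plus eta times the sum over a in A of max over q in Q of s(a, q), and maximizes it greedily. The function names (`cosine`, `flqmi`, `greedy_select`), the cosine similarity, the toy feature vectors, and the `eta` parameter are all illustrative assumptions; the abstract does not state which SMI function or features TACTFUL uses.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def flqmi(selected, sim, eta=1.0):
    """Facility-location-style query mutual information (assumed form).

    sim[q][i] is the similarity between query exemplar q (e.g. a rare
    layout class) and unlabeled pool item i.
    """
    if not selected:
        return 0.0
    # How well the selected set covers each query exemplar.
    query_cover = sum(max(row[i] for i in selected) for row in sim)
    # How relevant each selected item is to its closest query exemplar.
    item_relevance = sum(max(row[i] for row in sim) for i in selected)
    return query_cover + eta * item_relevance


def greedy_select(query_feats, pool_feats, budget, eta=1.0):
    """Greedily pick up to `budget` pool indices with the largest marginal gain."""
    sim = [[cosine(q, p) for p in pool_feats] for q in query_feats]
    selected = []
    remaining = set(range(len(pool_feats)))
    for _ in range(min(budget, len(pool_feats))):
        base = flqmi(selected, sim, eta)
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(
            sorted(remaining),
            key=lambda i: flqmi(selected + [i], sim, eta) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy example: one query exemplar and four unlabeled items (hypothetical features).
query = [[1.0, 0.0]]
pool = [[0.0, 1.0], [1.0, 0.1], [0.9, 0.0], [-1.0, 0.0]]
picked = greedy_select(query, pool, budget=2)
print(picked)
```

In an active learning loop, the selected indices would be sent for annotation, added to the labeled set, and the model retrained before the next round. The greedy loop here recomputes the objective from scratch for clarity; a practical implementation would cache the running maxima.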

Acknowledgements

We acknowledge the support of a grant from IRCC, IIT Bombay, and MEITY, Government of India, through the National Language Translation Mission-Bhashini project.

Author information

Corresponding author

Correspondence to Venkatapathy Subramanian.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Subramanian, V., Poudel, S., Chaudhuri, P., Ramakrishnan, G. (2023). TACTFUL: A Framework for Targeted Active Learning for Document Analysis. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_16

  • DOI: https://doi.org/10.1007/978-3-031-41734-4_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41733-7

  • Online ISBN: 978-3-031-41734-4

  • eBook Packages: Computer Science, Computer Science (R0)
