TACTFUL: A Framework for Targeted Active Learning for Document Analysis

Subramanian, Venkatapathy; Poudel, Sagar; Chaudhuri, Parag; Ramakrishnan, Ganesh

doi:10.1007/978-3-031-41734-4_16

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14191)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

Document layout parsing is an important step in an OCR pipeline, and several supervised and semi-supervised deep learning methods have been proposed for accurately identifying the complex structure of a document. These deep models require a large amount of labeled data to achieve promising results, and creating such data involves considerable effort and annotation cost. Active learning (AL) approaches have been proposed to minimize both. We propose TACTFUL, a framework for Targeted Active Learning for Document Layout Analysis. Our contributions are: (i) a framework that makes effective use of the AL paradigm and Submodular Mutual Information (SMI) functions to tackle object-level class imbalance, given a very small set of labeled data; (ii) an approach that decouples object detection from feature selection for subset selection, which improves targeted selection by a considerable margin over the current state of the art and is computationally efficient; and (iii) a new dataset of legacy Sanskrit books on which we demonstrate the effectiveness of our approach, in addition to reporting improvements over state-of-the-art approaches on other benchmark datasets.
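
The abstract does not say which SMI function or which features TACTFUL uses, so the following is only a minimal sketch of what targeted selection with an SMI function can look like. It assumes one facility-location style SMI variant (often written FLQMI), cosine similarity over precomputed object embeddings, and naive greedy maximization. The function names, the `eta` trade-off parameter, and the embedding inputs are all hypothetical and are not taken from the paper.

```python
import numpy as np


def cosine_similarity(a, b):
    """Pairwise cosine similarity between rows of a and rows of b, clipped to [0, 1]."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    # Assumption: negative similarities are clipped so the score stays monotone.
    return np.clip(a @ b.T, 0.0, 1.0)


def flqmi_score(sim_uq, selected, eta=1.0):
    """Facility-location style mutual information between a selected set and a query set.

    sim_uq has shape (n_unlabeled, n_query); selected is a list of row indices.
    """
    if not selected:
        return 0.0
    sub = sim_uq[selected]                   # shape (k, n_query)
    query_coverage = sub.max(axis=0).sum()   # best selected match for each query item
    relevance = sub.max(axis=1).sum()        # best query match for each selected item
    return float(query_coverage + eta * relevance)


def greedy_targeted_selection(unlabeled, query, budget, eta=1.0):
    """Greedily pick `budget` unlabeled items with the largest marginal score gain."""
    sim = cosine_similarity(unlabeled, query)
    selected = []
    remaining = set(range(len(unlabeled)))
    for _ in range(min(budget, len(unlabeled))):
        base = flqmi_score(sim, selected, eta)
        # Ties are broken in favor of the lowest index.
        best = max(
            sorted(remaining),
            key=lambda i: flqmi_score(sim, selected + [i], eta) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

In an active learning loop, `query` would hold embeddings of a few labeled examples of the rare class, `unlabeled` would hold embeddings from the unlabeled pool, and the returned indices would be sent for annotation. This naive greedy recomputes the score for every candidate and is meant for illustration, not efficiency.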


Acknowledgements

We acknowledge the support of a grant from IRCC, IIT Bombay, and MEITY, Government of India, through the National Language Translation Mission-Bhashini project.

Author information

Authors and Affiliations

Indian Institute of Technology Bombay, Mumbai, 400076, Maharashtra, India
Venkatapathy Subramanian, Sagar Poudel, Parag Chaudhuri & Ganesh Ramakrishnan
Anaadi Rural AI Center, Dindigul, India
Venkatapathy Subramanian

Corresponding author

Correspondence to Venkatapathy Subramanian.

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
Adobe, College Park, MD, USA
Rajiv Jain
Osaka Metropolitan University, Osaka, Japan
Koichi Kise
Rochester Institute of Technology, Rochester, NY, USA
Richard Zanibbi


About this paper

Cite this paper

Subramanian, V., Poudel, S., Chaudhuri, P., Ramakrishnan, G. (2023). TACTFUL: A Framework for Targeted Active Learning for Document Analysis. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_16

DOI: https://doi.org/10.1007/978-3-031-41734-4_16
Published: 19 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41733-7
Online ISBN: 978-3-031-41734-4
eBook Packages: Computer Science, Computer Science (R0)
