Abstract
Document layout parsing is an important step in an OCR pipeline, and several supervised and semi-supervised deep learning methods have been proposed to accurately identify the complex structure of a document. These deep models require large amounts of annotated data to achieve promising results, and creating such data demands considerable effort and annotation cost. Active learning (AL) approaches have been proposed to reduce both. We propose TACTFUL, a framework for Targeted Active Learning for Document Layout Analysis. Our contributions include (i) a framework that makes effective use of the AL paradigm and Submodular Mutual Information (SMI) functions to tackle object-level class imbalance, given a very small set of labeled data; (ii) an approach that decouples object detection from feature selection for subset selection, which improves targeted selection by a considerable margin over the current state of the art and is computationally efficient; and (iii) a new dataset of legacy Sanskrit books, on which we demonstrate the effectiveness of our approach, in addition to reporting improvements over state-of-the-art approaches on other benchmark datasets.
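The abstract does not spell out how an SMI function drives targeted selection. As an illustration only, the sketch below greedily maximizes one facility-location-style SMI instance, I(A;Q) = Σ_{q∈Q} max_{a∈A} s(q,a) + η Σ_{a∈A} max_{q∈Q} s(a,q), where Q holds feature embeddings of a few labeled exemplars of an under-represented layout class and A is the batch of unlabeled items chosen for annotation. The function names, the similarity s = (1 + cosine)/2, and the weight η are our assumptions, not TACTFUL's published implementation. The embeddings are treated as given inputs from some feature extractor separate from the detector, which is one possible reading of contribution (ii).

```python
# Hypothetical illustration of SMI-based targeted subset selection; not the TACTFUL code.
import math
from typing import List, Sequence


def similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity rescaled to [0, 1] so facility-location terms stay non-negative."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
    return (1.0 + cos) / 2.0


def flqmi(selected: List[int], unlabeled: Sequence[Sequence[float]],
          queries: Sequence[Sequence[float]], eta: float) -> float:
    """Assumed SMI instance: sum_q max_a s(q, a) + eta * sum_a max_q s(a, q)."""
    if not selected or not queries:
        return 0.0  # max over an empty set of non-negative similarities is taken as 0
    # How well the selected items cover each rare-class query exemplar.
    query_coverage = sum(max(similarity(q, unlabeled[a]) for a in selected) for q in queries)
    # How relevant each selected item is to its closest query exemplar.
    query_relevance = sum(max(similarity(unlabeled[a], q) for q in queries) for a in selected)
    return query_coverage + eta * query_relevance


def greedy_smi_select(unlabeled: Sequence[Sequence[float]],
                      queries: Sequence[Sequence[float]],
                      budget: int, eta: float = 1.0) -> List[int]:
    """Greedily add the unlabeled item with the largest marginal SMI gain."""
    selected: List[int] = []
    current = 0.0
    for _ in range(min(budget, len(unlabeled))):
        best_idx, best_gain = -1, -math.inf
        for i in range(len(unlabeled)):
            if i in selected:
                continue
            gain = flqmi(selected + [i], unlabeled, queries, eta) - current
            if gain > best_gain:
                best_idx, best_gain = i, gain
        selected.append(best_idx)
        current += best_gain
    return selected
```

For example, with unlabeled embeddings [[1, 0], [0, 1], [0.9, 0.1], [-1, 0]] and a single query [1, 0], a budget of 2 selects indices [0, 2], the two items closest to the rare-class query. Because this objective is monotone submodular (a facility-location term plus a non-negative modular term), plain greedy carries the classic (1 - 1/e) approximation guarantee; a lazy-greedy priority queue would avoid recomputing every marginal gain at each step.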
Acknowledgements
We acknowledge the support of a grant from IRCC, IIT Bombay, and MEITY, Government of India, through the National Language Translation Mission-Bhashini project.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Subramanian, V., Poudel, S., Chaudhuri, P., Ramakrishnan, G. (2023). TACTFUL: A Framework for Targeted Active Learning for Document Analysis. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_16
DOI: https://doi.org/10.1007/978-3-031-41734-4_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41733-7
Online ISBN: 978-3-031-41734-4
eBook Packages: Computer Science, Computer Science (R0)