Segmentation for document layout analysis: not dead yet

Markewich, Logan; Zhang, Hao; Xing, Yubin; Lambert-Shirzad, Navid; Jiang, Zhexin; Lee, Roy Ka-Wei; Li, Zhi; Ko, Seok-Bum

doi:10.1007/s10032-021-00391-3

Original Paper
Published: 13 January 2022

Volume 25, pages 67–77, (2022)
International Journal on Document Analysis and Recognition (IJDAR)

Logan Markewich¹,
Hao Zhang¹,
Yubin Xing¹,
Navid Lambert-Shirzad²,
Zhexin Jiang²,
Roy Ka-Wei Lee¹,
Zhi Li¹ &
Seok-Bum Ko¹ (ORCID: orcid.org/0000-0002-9287-317X)

Abstract

Document layout analysis is often the first task in document understanding systems: a document is broken down into identifiable sections. One of the most common approaches to this task is image segmentation, in which each pixel of a document image is classified. The task is challenging, however, because as the number of classes increases, small and infrequent objects are often missed. In this paper, we propose a weighted bounding box regression loss methodology to improve the accuracy of document layout segmentation, and we demonstrate our results on our Dense Article Dataset (DAD) and on the existing PubLayNet dataset. First, we collect and annotate 43 document object classes across 450 open access research articles to construct DAD. After benchmarking several segmentation networks, we achieve an F1 score of 96.26% on DAD and 97.11% on PubLayNet with DeepLabV3+, and we show a bounding box regression method for segmentation results that improves the F1 score by 1.99 points on DAD. Finally, we demonstrate that networks trained on DAD can be used as a bootstrapped annotation tool for existing document layout datasets, decreasing annotation time by 38% with DeepLabV3+.
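The abstract does not describe how the bounding box step or the F1 score is computed, so the following Python sketch illustrates only one plausible reading, under stated assumptions. It treats a segmentation output as a per-pixel class map, replaces each 4-connected component of a class with its filled bounding box, and scores the result with a pixel-level F1 averaged over the non-background classes. This is not the authors' method (their code is in the Document_Layout_Segmentation repository listed under Code availability); the function names `masks_to_boxes`, `boxes_to_mask`, and `macro_f1`, the background label 0, and the averaging scheme are all hypothetical.

```python
from collections import deque

BACKGROUND = 0  # assumption: label 0 marks background pixels


def connected_components(mask, cls):
    """Return the 4-connected components of pixels labelled `cls`,
    each as a list of (row, col) coordinates, in row-major discovery order."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] != cls or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited seed pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            comp = []
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and mask[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(comp)
    return components


def masks_to_boxes(mask):
    """Hypothetical post-process: one (cls, top, left, bottom, right) box,
    with inclusive coordinates, per connected component of each class."""
    classes = {v for row in mask for v in row} - {BACKGROUND}
    boxes = []
    for cls in sorted(classes):
        for comp in connected_components(mask, cls):
            ys = [y for y, _ in comp]
            xs = [x for _, x in comp]
            boxes.append((cls, min(ys), min(xs), max(ys), max(xs)))
    return boxes


def boxes_to_mask(boxes, h, w):
    """Paint each box as a filled rectangle; later boxes overwrite earlier ones."""
    out = [[BACKGROUND] * w for _ in range(h)]
    for cls, top, left, bottom, right in boxes:
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                out[y][x] = cls
    return out


def macro_f1(pred, gt):
    """Pixel-level F1 per non-background class, averaged over classes present
    in either map (assumption: returns 1.0 when no such class exists)."""
    classes = ({v for row in gt for v in row}
               | {v for row in pred for v in row}) - {BACKGROUND}
    scores = []
    for cls in sorted(classes):
        tp = fp = fn = 0
        for prow, grow in zip(pred, gt):
            for p, g in zip(prow, grow):
                if p == cls and g == cls:
                    tp += 1
                elif p == cls:
                    fp += 1
                elif g == cls:
                    fn += 1
        # Denominator is positive because cls occurs in pred or gt.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores) if scores else 1.0


# Toy example: a 3x3 block of class 1 whose prediction misses the centre pixel.
gt = [[0, 0, 0, 0, 0],
      [0, 1, 1, 1, 0],
      [0, 1, 1, 1, 0],
      [0, 1, 1, 1, 0],
      [0, 0, 0, 0, 0]]
pred = [row[:] for row in gt]
pred[2][2] = 0
boxes = masks_to_boxes(pred)
refined = boxes_to_mask(boxes, len(pred), len(pred[0]))
print(boxes)                  # [(1, 1, 1, 3, 3)]
print(macro_f1(pred, gt))     # about 0.941 (16/17)
print(macro_f1(refined, gt))  # 1.0
```

In this toy case the box step fills the hole left by the missed pixel, raising the pixel F1 from 16/17 to 1.0. The design choice to fill whole rectangles suits document objects, which are mostly rectangular, but a real pipeline would also need a policy for overlapping boxes of different classes; this sketch simply lets later boxes overwrite earlier ones.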

Availability of Data and Materials

Our dataset is currently hosted in a public GitHub repository, located at https://github.com/LivingSkyTechnologies/Dense_Article_Dataset_DAD.

Code availability

The code to create and train all models detailed in this work is available in a public GitHub repository, located at https://github.com/LivingSkyTechnologies/Document_Layout_Segmentation.

Acknowledgements

This work was supported by the Mitacs Accelerate research program [IT17283, An Automated System to Identify and Extract Key Structural Components in Academic Written Texts or Genres] with Living Sky Technologies Inc., Canada.

Author information

Authors and Affiliations

University of Saskatchewan, Saskatoon, Canada
Logan Markewich, Hao Zhang, Yubin Xing, Roy Ka-Wei Lee, Zhi Li & Seok-Bum Ko
Living Sky Technologies Inc., Saskatoon, Canada
Navid Lambert-Shirzad & Zhexin Jiang

Corresponding author

Correspondence to Seok-Bum Ko.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Markewich, L., Zhang, H., Xing, Y. et al. Segmentation for document layout analysis: not dead yet. IJDAR 25, 67–77 (2022). https://doi.org/10.1007/s10032-021-00391-3

Received: 11 February 2021
Revised: 08 November 2021
Accepted: 16 November 2021
Published: 13 January 2022
Issue Date: June 2022
DOI: https://doi.org/10.1007/s10032-021-00391-3
