Segmentation for document layout analysis: not dead yet

  • Original Paper
  • International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Document layout analysis is often the first task in a document understanding system: it breaks a document down into identifiable sections. One of the most common approaches is image segmentation, in which every pixel of a document image is classified. The task is challenging because, as the number of classes increases, small and infrequent objects are often missed. In this paper, we propose a weighted bounding box regression loss methodology to improve segmentation accuracy for document layouts, and we demonstrate our results on our dense article dataset (DAD) and the existing PubLayNet dataset. First, we construct DAD by collecting and annotating 43 document object classes across 450 open-access research articles. After benchmarking several segmentation networks, we achieve an F1 score of 96.26% on DAD and 97.11% on PubLayNet with DeepLabV3+, and we show that a bounding box regression method applied to segmentation results improves F1 by +1.99 points on DAD. Finally, we demonstrate that networks trained on DAD can serve as a bootstrapped annotation tool for existing document layout datasets, decreasing annotation time by 38% with DeepLabV3+.
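To make the evaluation metric concrete, the following is a minimal sketch of a pixel-wise F1 computation over two label grids. It is not the authors' evaluation code: the notes point to scikit-learn's f1_score, whereas this version is plain Python so it runs without dependencies. The function names (per_class_f1, macro_f1), the toy grids, and the choice of macro averaging are assumptions made for illustration only; the abstract does not state which averaging mode was used.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {class_id: F1} for two equally sized 2D grids of class labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for row_true, row_pred in zip(y_true, y_pred):
        for t, p in zip(row_true, row_pred):
            if t == p:
                tp[t] += 1  # pixel labeled correctly
            else:
                fp[p] += 1  # predicted class p where it does not belong
                fn[t] += 1  # true class t was missed
    scores = {}
    for c in set(tp) | set(fp) | set(fn):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging mode)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy 3x4 "page": class 0 = background, 1 = a large region, 2 = a one-pixel object.
truth = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 0, 2]]
# Hypothetical prediction that misses the single-pixel object entirely.
pred = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 0, 0]]

if __name__ == "__main__":
    print(per_class_f1(truth, pred))
    print(macro_f1(truth, pred))
```

In this toy case only one of twelve pixels is wrong, yet the missed one-pixel class scores an F1 of zero and pulls the macro average down sharply. This is the effect the abstract describes for small and infrequent objects when many classes are present.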

Availability of Data and Materials

Our dataset is currently hosted in a public GitHub repository, located at https://github.com/LivingSkyTechnologies/Dense_Article_Dataset_DAD.

Code availability

The code to create and train all models detailed in this work is available in a public GitHub repository, located at https://github.com/LivingSkyTechnologies/Document_Layout_Segmentation.

Notes

  1. https://github.com/LivingSkyTechnologies/Dense_Article_Dataset_DAD.

  2. https://github.com/LivingSkyTechnologies/Document_Layout_Segmentation.

  3. https://github.com/microsoft/VoTT.

  4. https://github.com/kermitt2/grobid.

  5. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html.

Acknowledgements

This work was supported by the Mitacs Accelerate research program [IT17283, An Automated System to Identify and Extract Key Structural Components in Academic Written Texts or Genres] with Living Sky Technologies Inc., Canada.

Author information

Corresponding author

Correspondence to Seok-Bum Ko.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

About this article

Cite this article

Markewich, L., Zhang, H., Xing, Y. et al. Segmentation for document layout analysis: not dead yet. IJDAR 25, 67–77 (2022). https://doi.org/10.1007/s10032-021-00391-3

  • DOI: https://doi.org/10.1007/s10032-021-00391-3
