Abstract
Page segmentation of document images remains a challenge due to complex layouts and heterogeneous image content. Existing deep learning based methods usually follow general semantic segmentation or object detection frameworks, without fully exploiting the characteristics of document images. In this paper, we propose an effective method for page segmentation using a convolutional neural network (CNN) and a graphical model, where the CNN extracts visual features and the graphical model exploits the relationships (spatial context) between visual primitives and regions. A page image is represented as a graph whose nodes represent the primitives and whose edges represent the relationships between neighboring primitives. We consider two types of graphical models: the graph attention network (GAT) and the conditional random field (CRF). A convolutional feature pyramid network (FPN) is used for feature extraction, and its parameters can be estimated jointly with the GAT. The CRF can be used for joint prediction of primitive labels and can be combined with the CNN and GAT. Experimental results on the PubLayNet dataset show that our method can extract various page regions with precise boundaries. A comparison of different configurations shows that the GAT improves performance with a shallow backbone CNN, whereas the improvement with a deep backbone CNN is not evident; the CRF consistently improves performance, even when applied on top of the GAT.
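To make the graph representation concrete, the following is a minimal sketch of one attention step over a primitive graph, written in the style of a standard single-head graph attention layer. It is not the implementation used in the paper. The k-nearest-neighbor rule for edges, the random feature vectors standing in for FPN features, and the names `build_neighbor_graph` and `gat_layer` are all assumptions made for illustration.

```python
import numpy as np

def build_neighbor_graph(centers, k=2):
    """Link each primitive to its k nearest primitives (assumed edge rule)."""
    n = len(centers)
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    adj = np.eye(n, dtype=bool)  # self-loops so every node attends to itself
    for i in range(n):
        nearest = np.argsort(dist[i])[1:k + 1]  # index 0 is the node itself
        adj[i, nearest] = True
        adj[nearest, i] = True  # keep the graph symmetric
    return adj

def gat_layer(h, adj, W, a, slope=0.2):
    """One single-head graph attention step.

    h: (n, d_in) node features, adj: (n, n) boolean adjacency,
    W: (d_in, d_out) projection, a: (2 * d_out,) attention vector.
    """
    z = h @ W
    d = z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]), split into source and target terms
    e = (z @ a[:d])[:, None] + (z @ a[d:])[None, :]
    e = np.where(e > 0, e, slope * e)
    e = np.where(adj, e, -np.inf)  # mask out non-neighbors
    e = e - e.max(axis=1, keepdims=True)  # numerically stable softmax
    w = np.exp(e)
    alpha = w / w.sum(axis=1, keepdims=True)
    return alpha @ z, alpha

# Toy page: two primitives near the top, two near the bottom (made-up data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 5.0], [1.0, 5.0]])
features = rng.normal(size=(4, 8))  # stand-in for per-primitive CNN features
adj = build_neighbor_graph(centers, k=1)
W = rng.normal(size=(8, 4))
a = rng.normal(size=8)
out, alpha = gat_layer(features, adj, W, a)
```

In this toy setup each primitive aggregates features only from itself and its nearest neighbor, so the attention weights between the top pair and the bottom pair are exactly zero. A CRF stage for joint label prediction is not sketched here.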
Acknowledgments
This work has been supported by the National Natural Science Foundation of China (NSFC) under Grants 61733007, 61573355 and 61721004.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, XH., Yin, F., Liu, CL. (2020). Page Segmentation Using Convolutional Neural Network and Graphical Model. In: Bai, X., Karatzas, D., Lopresti, D. (eds) Document Analysis Systems. DAS 2020. Lecture Notes in Computer Science, vol 12116. Springer, Cham. https://doi.org/10.1007/978-3-030-57058-3_17
DOI: https://doi.org/10.1007/978-3-030-57058-3_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57057-6
Online ISBN: 978-3-030-57058-3
eBook Packages: Computer Science, Computer Science (R0)