Abstract
Business forms with dense text boxes have complicated layouts, diverse content, and low image quality, making it challenging for existing form-understanding methods to recognize form structure while meeting the demands of real-time application scenarios. In this paper, we propose BPFormNet, a novel multi-task lightweight block pyramid network for form segmentation and classification. Guided by the characteristics of form images, we exploit the multi-scale pyramidal feature hierarchy of a convolutional neural network (CNN) to construct a multi-level, multi-scale block pyramid, which consists of low-level, mid-level, and high-level convolutional blocks designed for the corresponding feature layers and builds effectively fused multi-scale semantic feature maps at every level. BPFormNet leverages the interdependence between the twin tasks of form-frame segmentation and form classification to improve classification performance under a small-sample training strategy. Furthermore, BPFormNet is made lightweight at three levels: the combination of multi-level, multi-scale convolutional blocks, the combination of multi-size kernels, and the disassembly of kernels. Experimental results on a collected dataset of Chinese insurance form (CIF) images show that BPFormNet's block pyramid has a strong capability for form feature representation. Compared with several state-of-the-art (SOTA) lightweight models and their combinations, BPFormNet outperforms single-block models on both the segmentation and the classification task, significantly reduces model complexity while maintaining accuracy, and delivers real-time, high-quality form structure recognition results for the downstream tasks of text recognition and information extraction.
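To make the architecture description concrete, the following minimal PyTorch sketch illustrates the ideas the abstract names: a three-level block pyramid built on a CNN feature hierarchy, multi-size kernel combination within each block, disassembly of k x k kernels into k x 1 and 1 x k factors, and shared pyramid features feeding both a frame-segmentation head and a classification head. This is a sketch under our own assumptions, not the authors' BPFormNet implementation; the module names (FactorizedConv, PyramidBlock, BlockPyramidNet), channel widths, and fusion rule are all hypothetical.

# Hypothetical sketch of a block-pyramid multi-task network, not the
# authors' exact BPFormNet; widths, depths, and fusion are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FactorizedConv(nn.Module):
    """A k x k convolution 'disassembled' into k x 1 and 1 x k convolutions,
    one of the three lightweighting levels described in the abstract."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (k, 1), padding=(k // 2, 0)),
            nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, k // 2)),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.conv(x)

class PyramidBlock(nn.Module):
    """One level of the block pyramid: parallel branches with different
    kernel sizes (multi-size kernel combination), fused by concatenation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        branch_ch = out_ch // 2
        self.branch3 = FactorizedConv(in_ch, branch_ch, k=3)
        self.branch5 = FactorizedConv(in_ch, branch_ch, k=5)
        self.down = nn.MaxPool2d(2)

    def forward(self, x):
        fused = torch.cat([self.branch3(x), self.branch5(x)], dim=1)
        return self.down(fused), fused  # downsampled features + skip map

class BlockPyramidNet(nn.Module):
    """Twin tasks sharing one pyramid: form-frame segmentation + form
    classification, as the abstract's multi-task setup describes."""
    def __init__(self, num_classes=4, widths=(32, 64, 128)):
        super().__init__()
        chans = [3] + list(widths)
        self.blocks = nn.ModuleList(
            PyramidBlock(chans[i], chans[i + 1]) for i in range(3)
        )
        # Segmentation head: 2 classes (frame vs. background), upsampled
        # back to input resolution from the highest-level skip map.
        self.seg_head = nn.Conv2d(widths[-1], 2, kernel_size=1)
        # Classification head pools the same shared pyramid features.
        self.cls_head = nn.Linear(widths[-1], num_classes)

    def forward(self, x):
        size = x.shape[-2:]
        skips = []
        for block in self.blocks:
            x, skip = block(x)
            skips.append(skip)
        seg = F.interpolate(self.seg_head(skips[-1]), size=size,
                            mode="bilinear", align_corners=False)
        cls = self.cls_head(F.adaptive_avg_pool2d(x, 1).flatten(1))
        return seg, cls

if __name__ == "__main__":
    model = BlockPyramidNet()
    seg, cls = model(torch.randn(1, 3, 256, 256))
    print(seg.shape, cls.shape)  # [1, 2, 256, 256] and [1, 4]

For a 3 x 3 kernel, the factorized pair stores 2 x 3 = 6 weights per input-output channel pair instead of 9, which is the kind of saving "disassembly of kernels" refers to; the actual model combines this with block-level and kernel-size-level design choices that this sketch only gestures at.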
Data availability
The datasets generated and analyzed during the current study are available in the GitHub repository at https://github.com/HansonLinn/Insurance-Forms-Understanding-Framework/tree/main/Form%20datasets.
Acknowledgements
This work was supported in part by the Jiangsu Provincial Department of Science and Technology of China (Grant No. BE2020099).
Author information
Contributions
HL contributed to conceptualization, methodology, software, investigation, formal analysis, funding acquisition, validation, and writing (original draft); YZ contributed to conceptualization, resources, supervision, and writing (review and editing); CW contributed to data curation, visualization, and writing (editing). All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lin, H., Zhan, Y. & Wu, C. BPFormNet: a lightweight block pyramid network for form segmentation and classification. IJDAR 27, 1–17 (2024). https://doi.org/10.1007/s10032-023-00440-z