
Layout-aware Single-image Document Flattening

Authors: Pu Li, Weize Quan, Jianwei Guo, Dong-Ming Yan

ACM Transactions on Graphics, Volume 43, Issue 1

Article No.: 9, Pages 1 - 17

https://doi.org/10.1145/3627818

Published: 02 November 2023


Abstract

Single image rectification of document deformation is a challenging task. Although some recent deep learning-based methods have attempted to solve this problem, they cannot achieve satisfactory results when dealing with document images with complex deformations. In this article, we propose a new efficient framework for document flattening. Our main insight is that most layout primitives in a document have rectangular outline shapes, making unwarping local layout primitives essentially homogeneous with unwarping the entire document. The former task is clearly more straightforward to solve than the latter due to the more consistent texture and relatively smooth deformation. On this basis, we propose a layout-aware deep model working in a divide-and-conquer manner. First, we employ a transformer-based segmentation module to obtain the layout information of the input document. Then a new regression module is applied to predict the global and local UV maps. Finally, we design an effective merging algorithm to correct the global prediction with local details. Both quantitative and qualitative experimental results demonstrate that our framework achieves favorable performance against state-of-the-art methods. In addition, the current publicly available document flattening datasets have limited 3D paper shapes without layout annotation and also lack a general geometric correction metric. Therefore, we build a new large-scale synthetic dataset by utilizing a fully automatic rendering method to generate deformed documents with diverse shapes and exact layout segmentation labels. We also propose a new geometric correction metric based on our paired document UV maps. Code and dataset will be released at https://github.com/BunnySoCrazy/LA-DocFlatten.
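The divide-and-conquer idea in the abstract (predict a global map, predict local maps for layout primitives, then merge) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `unwarp_nearest` and `merge_maps` are hypothetical, the maps are assumed to be backward maps that store, for every output pixel, normalized (row, col) coordinates in the deformed input, and the merge shown here is a plain mask-based overwrite standing in for the paper's merging algorithm, which the abstract does not specify.

```python
import numpy as np


def unwarp_nearest(image, backward_map):
    """Resample `image` with nearest-neighbour lookup.

    `backward_map` has shape (H_out, W_out, 2) and holds normalized
    (row, col) source coordinates in [0, 1] for every output pixel.
    This map convention is an assumption made for the sketch.
    """
    h, w = image.shape[:2]
    rows = np.clip(np.rint(backward_map[..., 0] * (h - 1)), 0, h - 1).astype(int)
    cols = np.clip(np.rint(backward_map[..., 1] * (w - 1)), 0, w - 1).astype(int)
    return image[rows, cols]


def merge_maps(global_map, local_maps, masks):
    """Overwrite the global map with each local map inside its layout mask.

    A hard overwrite is only a stand-in for the paper's merging step.
    Each mask is a boolean array of shape (H_out, W_out).
    """
    merged = global_map.copy()
    for local_map, mask in zip(local_maps, masks):
        merged[mask] = local_map[mask]
    return merged


def identity_map(height, width):
    """Backward map that leaves an image of the given size unchanged."""
    rows, cols = np.meshgrid(
        np.linspace(0.0, 1.0, height),
        np.linspace(0.0, 1.0, width),
        indexing="ij",
    )
    return np.stack([rows, cols], axis=-1)
```

In use, one would build a global map for the whole page and one map per segmented layout primitive, call `merge_maps` with the segmentation masks, and pass the merged map to `unwarp_nearest`. A practical system would more likely use bilinear sampling and some blending at mask boundaries; both are omitted here to keep the sketch short.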


