research-article

Layout-aware Single-image Document Flattening

Published: 02 November 2023

Abstract

Single-image rectification of document deformation is a challenging task. Although some recent deep-learning-based methods have attempted to solve this problem, they do not achieve satisfactory results on document images with complex deformations. In this article, we propose a new, efficient framework for document flattening. Our main insight is that most layout primitives in a document have rectangular outlines, which makes unwarping a local layout primitive essentially the same problem as unwarping the entire document. The former task is clearly more straightforward than the latter because a single primitive has more consistent texture and relatively smooth deformation. On this basis, we propose a layout-aware deep model that works in a divide-and-conquer manner. First, we employ a transformer-based segmentation module to obtain the layout information of the input document. Then, a new regression module predicts the global and local UV maps. Finally, we design an effective merging algorithm that corrects the global prediction with local details. Both quantitative and qualitative experimental results demonstrate that our framework performs favorably against state-of-the-art methods. In addition, the publicly available document flattening datasets cover only a limited range of 3D paper shapes, provide no layout annotation, and lack a general geometric correction metric. Therefore, we build a new large-scale synthetic dataset, using a fully automatic rendering method to generate deformed documents with diverse shapes and exact layout segmentation labels. We also propose a new geometric correction metric based on our paired document UV maps. Code and dataset will be released at https://github.com/BunnySoCrazy/LA-DocFlatten.
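To make the role of a predicted coordinate map concrete, the sketch below shows how a flattened image could be resampled once such a map is available. It is an illustration under stated assumptions, not the authors' implementation: it assumes a backward map that stores, for every pixel of the flattened output, a normalized (x, y) position in the warped photo, and it uses plain bilinear interpolation in NumPy. The abstract does not say which map representation the framework uses, and the names `unwarp_with_backward_map` and `identity_map` are hypothetical.

```python
import numpy as np


def identity_map(h, w):
    """Hypothetical helper: a backward map that leaves the image unchanged."""
    xs, ys = np.meshgrid(np.linspace(0.0, 1.0, w), np.linspace(0.0, 1.0, h))
    return np.stack([xs, ys], axis=-1)  # shape (h, w, 2), last axis is (x, y)


def unwarp_with_backward_map(image, bmap):
    """Resample `image` with bilinear interpolation.

    image: (H, W) or (H, W, C) float array holding the warped photo.
    bmap:  (h, w, 2) array; bmap[i, j] is the normalized (x, y) position in
           `image` that output pixel (i, j) copies from (assumed convention).
    """
    H, W = image.shape[:2]
    # Normalized [0, 1] coordinates -> pixel coordinates, clamped to the image.
    x = np.clip(bmap[..., 0] * (W - 1), 0, W - 1)
    y = np.clip(bmap[..., 1] * (H - 1), 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = x - x0
    wy = y - y0
    if image.ndim == 3:
        # Broadcast the interpolation weights over the channel axis.
        wx = wx[..., None]
        wy = wy[..., None]
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


if __name__ == "__main__":
    photo = np.arange(12, dtype=float).reshape(3, 4)
    mirrored = identity_map(3, 4)
    mirrored[..., 0] = 1.0 - mirrored[..., 0]  # sample columns right-to-left
    print(unwarp_with_backward_map(photo, mirrored))
```

With an identity map the output reproduces the input, and mirroring the x coordinates flips the image horizontally, which is a quick way to check the coordinate convention before plugging in a learned map.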




    • Published in

      ACM Transactions on Graphics, Volume 43, Issue 1
      February 2024
      211 pages
      ISSN: 0730-0301
      EISSN: 1557-7368
      DOI: 10.1145/3613512

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 2 November 2023
      • Online AM: 13 October 2023
      • Accepted: 25 September 2023
      • Revised: 10 July 2023
      • Received: 8 September 2022
