Geometric Representation Learning for Document Image Rectification

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13697)

Abstract

In document image rectification, rich geometric constraints exist between a distorted image and its ground truth. Existing advanced solutions largely ignore these constraints, which limits rectification performance. To this end, we present DocGeoNet, which rectifies document images by introducing an explicit geometric representation. Technically, the proposed geometric representation learning involves two typical attributes of the document image: 3D shape and textlines. Our motivation arises from the insight that 3D shape provides global unwarping cues for rectifying a distorted document image but overlooks local structure, whereas textlines complementarily provide explicit geometric constraints for local patterns. The learned geometric representation effectively bridges the distorted image and the ground truth. Extensive experiments show the effectiveness of our framework and demonstrate the superiority of DocGeoNet over state-of-the-art methods on both the DocUNet Benchmark dataset and our proposed DIR300 test set.
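
The abstract does not say how a rectified image is produced once the geometry has been estimated. One common formulation in document unwarping, assumed here for illustration and not taken from this paper, is a per-pixel backward map: for every pixel of the flat output, it gives the location in the distorted input to sample from. The sketch below shows only that resampling step with nearest-neighbour lookup. The function names `rectify_with_backward_map` and `identity_map` are hypothetical, and nothing here reflects DocGeoNet's actual architecture.

```python
import numpy as np


def identity_map(height, width):
    """Backward map that leaves the image unchanged: output (x, y) samples input (x, y)."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.stack([xs, ys], axis=-1).astype(np.float32)


def rectify_with_backward_map(distorted, backward_map):
    """Resample a distorted image using a backward map (illustrative sketch).

    distorted:    array of shape (H, W, C)
    backward_map: array of shape (H_out, W_out, 2); backward_map[y, x] = (src_x, src_y)
                  is the pixel location in `distorted` whose value is copied to
                  output pixel (x, y).
    Sampling is nearest-neighbour; out-of-range coordinates are clamped to the border.
    """
    h, w = distorted.shape[:2]
    src_x = np.clip(np.rint(backward_map[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(backward_map[..., 1]).astype(int), 0, h - 1)
    # Integer-array indexing gathers one source pixel per output pixel.
    return distorted[src_y, src_x]


if __name__ == "__main__":
    image = np.arange(4 * 5 * 3).reshape(4, 5, 3)
    # The identity map returns the input unchanged.
    same = rectify_with_backward_map(image, identity_map(4, 5))
    print(np.array_equal(same, image))
```

In practice a learned model would predict the map and sampling would be bilinear rather than nearest-neighbour. Those details are left out to keep the sketch short.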

Notes

  1. https://www.blender.org/

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Contracts 61836011 and 62021001. It was also supported by the GPU cluster built by the MCC Lab of Information Science and Technology Institution, USTC.

Author information

Corresponding authors

Correspondence to Wengang Zhou or Houqiang Li.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2923 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Feng, H., Zhou, W., Deng, J., Wang, Y., Li, H. (2022). Geometric Representation Learning for Document Image Rectification. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13697. Springer, Cham. https://doi.org/10.1007/978-3-031-19836-6_27

  • DOI: https://doi.org/10.1007/978-3-031-19836-6_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19835-9

  • Online ISBN: 978-3-031-19836-6

  • eBook Packages: Computer Science, Computer Science (R0)
