Abstract
In document image rectification, there exist rich geometric constraints between the distorted image and the ground truth one. However, such geometric constraints are largely ignored in existing advanced solutions, which limits the rectification performance. To this end, we present DocGeoNet for document image rectification by introducing explicit geometric representation. Technically, two typical attributes of the document image are involved in the proposed geometric representation learning, i.e., 3D shape and textlines. Our motivation raises from the insight that 3D shape provides global unwarping cues for rectifying a distorted document image, while overlooking the local structure. On the other hand, textlines complementarily provide explicit geometric constraints for local patterns. The learned geometric representation effectively bridges the distorted image and the ground truth one. Extensive experiments show the effectiveness of our framework and demonstrate the superiority of our DocGeoNet over state-of-the-art methods on both the DocUNet Benchmark dataset and our proposed DIR300 test set.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
References
Brown, M.S., Seales, W.B.: Document restoration using 3D shape: a general deskewing algorithm for arbitrarily warped documents. In: Proceedings of the IEEE International Conference on Computer Vision, vol. 2, pp. 367–374 (2001)
Brown, M.S., Seales, W.B.: Image restoration of arbitrarily warped documents. IEEE Trans. Pattern Anal. Mach. Intell. 26(10), 1295–1306 (2004)
Brown, M.S., Sun, M., Yang, R., Yun, L., Seales, W.B.: Restoring 2d content from distorted documents. IEEE Trans. Pattern Anal. Mach. Intell. 29(11), 1904–1916 (2007)
Tan, C.L., Zhang, L., Zhang, Z., Xia, T.: Restoring warped document images through 3D shape modeling. IEEE Trans. Pattern Anal. Mach. Intell. 28(2), 195–208 (2006)
Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., Vedaldi, A.: Describing textures in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3606–3613 (2014)
Das, S., Ma, K., Shu, Z., Samaras, D., Shilkrot, R.: DewarpNet: single-image document unwarping with stacked 3D and 2D regression networks. In: Proceedings of the International Conference on Computer Vision, pp. 131–140 (2019)
Das, S., et al.: End-to-end piece-wise unwarping of document images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4268–4277 (2021)
De Boer, P.T., Kroese, D.P., Mannor, S., Rubinstein, R.Y.: A tutorial on the cross-entropy method. Ann. Oper. Res. 134(1), 19–67 (2005)
Feng, H., Wang, Y., Zhou, W., Deng, J., Li, H.: DocTr: document image transformer for geometric unwarping and illumination correction. In: Proceedings of the ACM International Conference on Multimedia, pp. 273–281 (2021)
Feng, H., Zhou, W., Deng, J., Tian, Q., Li, H.: DocScanner: robust document image rectification with progressive learning. arXiv preprint arXiv:2110.14968 (2021)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
He, Y., Pan, P., Xie, S., Sun, J., Naoi, S.: A book dewarping system by boundary-based 3D surface reconstruction. In: Proceedings of the International Conference on Document Analysis and Recognition, pp. 403–407 (2013)
Cao, H., Ding, X., Liu, C.: Rectifying the bound document image captured by the camera: a model based approach. In: Proceedings of the International Conference on Document Analysis and Recognition, vol. 1, pp. 71–75 (2003)
Jiang, X., Long, R., Xue, N., Yang, Z., Yao, C., Xia, G.S.: Revisiting document image dewarping by grid regularization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4543–4552 (2022)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. CoRR abs/1412.6980 (2015)
Koo, H.I., Kim, J., Cho, N.I.: Composition of a dewarped and enhanced document image from two view images. IEEE Trans. Image Process. 18(7), 1551–1562 (2009)
Lavialle, O., Molines, X., Angella, F., Baylou, P.: Active contours network to straighten distorted text lines. In: Proceedings of the International Conference on Image Processing, vol. 3, pp. 748–751 (2001)
Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals, vol. 10, pp. 707–710 (1966)
Li, X., Zhang, B., Liao, J., Sander, P.V.: Document rectification and illumination correction using a patch-based CNN. ACM Trans. Graph. 38(6), 1–11 (2019)
Liang, J., DeMenthon, D., Doermann, D.: Geometric rectification of camera-captured document images. IEEE Trans. Pattern Anal. Mach. Intell. 30(4), 591–605 (2008)
Liu, C., Yuen, J., Torralba, A.: SIFT flow: dense correspondence across scenes and its applications. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 978–994 (2011)
Liu, X., Meng, G., Fan, B., Xiang, S., Pan, C.: Geometric rectification of document images using adversarial gated unwarping network. Pattern Recogn. 108, 107576 (2020)
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: Proceedings of the International Conference on Learning Representations (2019)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
Ma, K., Shu, Z., Bai, X., Wang, J., Samaras, D.: DocUNet: document image unwarping via a stacked u-net. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4700–4709 (2018)
Markovitz, A., Lavi, I., Perel, O., Mazor, S., Litman, R.: Can you read me now? Content aware rectification using angle supervision. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12357, pp. 208–223. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58610-2_13
Meng, G., Pan, C., Xiang, S., Duan, J., Zheng, N.: Metric rectification of curved document images. IEEE Trans. Pattern Anal. Mach. Intell. 34(4), 707–722 (2011)
Meng, G., Wang, Y., Qu, S., Xiang, S., Pan, C.: Active flattening of curved document images via two structured beams. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3890–3897 (2014)
Morris, A.C., Maier, V., Green, P.: From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition. In: Proceedings of the International Conference on Spoken Language Processing (2004)
Paszke, A., et al.: Automatic differentiation in pytorch (2017)
Qin, X., Zhang, Z., Huang, C., Dehghan, M., Zaiane, O.R., Jagersand, M.: U2-net: going deeper with nested u-structure for salient object detection. Pattern Recogn. 106, 107404 (2020)
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Smith, R.: An overview of the tesseract OCR engine. In: Proceedings of the International Conference on Document Analysis and Recognition, vol. 2, pp. 629–633 (2007)
Teed, Z., Deng, J.: RAFT: recurrent all-pairs field transforms for optical flow. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12347, pp. 402–419. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58536-5_24
Tsoi, Y.C., Brown, M.S.: Multi-view document rectification using boundary. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2007)
Vaswani, A., et al.: Attention is all you need. In: Proceedings of the Neural Information Processing Systems, pp. 6000–6010 (2017)
Wada, T., Ukida, H., Matsuyama, T.: Shape from shading with interreflections under a proximal light source: distortion-free copying of an unfolded book. Int. J. Comput. Vision 24(2), 125–135 (1997)
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Wang, Z., Simoncelli, E.P., Bovik, A.C.: Multiscale structural similarity for image quality assessment. In: Proceedings of the Asilomar Conference on Signals, Systems Computers, vol. 2, pp. 1398–1402 (2003)
Wu, C., Agam, G.: Document image de-warping for text/Graphics recognition. In: Caelli, T., Amin, A., Duin, R.P.W., de Ridder, D., Kamel, M. (eds.) SSPR /SPR 2002. LNCS, vol. 2396, pp. 348–357. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-70659-3_36
Xie, G.-W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Document dewarping with control points. In: Lladós, J., Lopresti, D., Uchida, S. (eds.) ICDAR 2021. LNCS, vol. 12821, pp. 466–480. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86549-8_30
Xie, G.-W., Yin, F., Zhang, X.-Y., Liu, C.-L.: Dewarping document image by displacement flow estimation with fully convolutional network. In: Bai, X., Karatzas, D., Lopresti, D. (eds.) DAS 2020. LNCS, vol. 12116, pp. 131–144. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-57058-3_10
Xue, C., Tian, Z., Zhan, F., Lu, S., Bai, S.: Fourier document restoration for robust document dewarping and recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4573–4582 (2022)
Yamashita, A., Kawarago, A., Kaneko, T., Miura, K.T.: Shape reconstruction and image restoration for non-flat surfaces of documents with a stereo vision system. In: Proceedings of the International Conference on Pattern Recognition, vol. 1, pp. 482–485 (2004)
You, S., Matsushita, Y., Sinha, S., Bou, Y., Ikeuchi, K.: Multiview rectification of folded documents. IEEE Trans. Pattern Anal. Mach. Intell. 40(2), 505–511 (2018)
Zhang, L., Zhang, Y., Tan, C.: An improved physically-based method for geometric restoration of distorted document images. IEEE Trans. Pattern Anal. Mach. Intell. 30(4), 728–734 (2008)
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Contract 61836011 and 62021001. It was also supported by the GPU cluster built by MCC Lab of Information Science and Technology Institution, USTC.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
1 Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Feng, H., Zhou, W., Deng, J., Wang, Y., Li, H. (2022). Geometric Representation Learning for Document Image Rectification. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13697. Springer, Cham. https://doi.org/10.1007/978-3-031-19836-6_27
Download citation
DOI: https://doi.org/10.1007/978-3-031-19836-6_27
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19835-9
Online ISBN: 978-3-031-19836-6
eBook Packages: Computer ScienceComputer Science (R0)