Abstract:
Document images captured with cameras often exhibit photometric and geometric distortions. Here, we propose a novel learning-based approach for efficient joint rectification of document images. Inspired by the strong correlation between visual shadows and physical deformations, we design a shared encoder architecture to fully leverage structured document features. A cross-attention module is introduced to facilitate information exchange between deformation and coordinate domains. Our method effectively addresses both geometric and photometric distortions in an end-to-end manner, making it highly valuable for applications involving camera-captured document images.
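As an illustration of the kind of information exchange a cross-attention module performs between two feature streams, the following is a minimal sketch in plain Python. It is not the authors' implementation: the names `deform_feats` and `coord_feats` are hypothetical, learned query/key/value projections and multi-head splitting are omitted, and the residual addition is an assumption made for the example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one stream
    and keys/values come from the other stream."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    width = len(values[0])
    out = []
    for q in queries:
        # Similarity of this query token to every token of the other stream.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted average of the other stream's value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(width)])
    return out


def exchange(deform_feats, coord_feats):
    """Bidirectional exchange: each stream attends to the other, and the
    attended result is added back as a residual (an assumption here)."""
    d_from_c = cross_attention(deform_feats, coord_feats, coord_feats)
    c_from_d = cross_attention(coord_feats, deform_feats, deform_feats)
    new_d = [[a + b for a, b in zip(r, s)] for r, s in zip(deform_feats, d_from_c)]
    new_c = [[a + b for a, b in zip(r, s)] for r, s in zip(coord_feats, c_from_d)]
    return new_d, new_c


# Hypothetical token features: 3 tokens in one stream, 2 in the other, width 2.
deform_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
coord_feats = [[0.5, 0.5], [1.0, -1.0]]
new_deform, new_coord = exchange(deform_feats, coord_feats)
```

Each stream keeps its own token count and feature width; only the content of each token is updated with information drawn from the other stream.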
Published in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024