DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degradations

Authors

  • Ruilu Wang, South China University of Technology
  • Yang Xue, South China University of Technology
  • Lianwen Jin, South China University of Technology

DOI:

https://doi.org/10.1609/aaai.v38i6.28366

Keywords:

CV: Representation Learning for Vision, CV: Applications, ML: Representation Learning

Abstract

Document Image Enhancement (DIE) remains challenging due to the prevalence of multiple degradations in document images captured by cameras. In this paper, we address an interesting question: can the performance of pre-trained models and downstream DIE models be improved if they are bootstrapped using different degradation types of the same semantic samples and their high-dimensional features with ambiguous inter-class distance? To this end, we propose an effective contrastive learning paradigm for DIE: a Document image enhancement framework with Normalization and Latent Contrast (DocNLC). While existing DIE methods focus on eliminating one type of degradation, DocNLC considers the relationship between different types of degradation while utilizing both direct and latent contrasts to constrain content consistency, thus achieving a unified treatment of multiple types of degradation. Specifically, we devise a latent contrastive learning module to enforce explicit decorrelation of the normalized representations of different degradation types and to minimize the redundancy between them. Comprehensive experiments show that our method outperforms state-of-the-art DIE models in both pre-training and fine-tuning stages on four publicly available independent datasets. In addition, we discuss the potential benefits of DocNLC for downstream tasks. Our code is released at https://github.com/RylonW/DocNLC
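
The idea of decorrelating normalized representations of two degradation types and reducing redundancy between them can be illustrated with a generic cross-correlation objective. The sketch below is not the authors' implementation: it uses a common redundancy-reduction loss (in the style of Barlow Twins) as a stand-in, and the function names, the `off_diag_weight` value, and the use of NumPy are all assumptions made for illustration. The actual DocNLC loss may differ; see the released code for the real definition.

```python
import numpy as np


def normalize_batch(z, eps=1e-6):
    """Standardize each feature dimension across the batch (zero mean, unit variance)."""
    return (z - z.mean(axis=0)) / (z.std(axis=0) + eps)


def decorrelation_loss(z_a, z_b, off_diag_weight=5e-3):
    """Hypothetical redundancy-reduction loss between two batches of features.

    z_a, z_b: arrays of shape (batch, dim), e.g. latent features of the same
    documents under two different degradation types (an assumption).
    """
    n = z_a.shape[0]
    # Cross-correlation matrix between the normalized feature dimensions.
    c = normalize_batch(z_a).T @ normalize_batch(z_b) / n
    diag = np.diag(c)
    # Pull matching dimensions toward correlation 1 (content consistency).
    on_diag = np.sum((diag - 1.0) ** 2)
    # Push non-matching dimensions toward correlation 0 (less redundancy).
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return float(on_diag + off_diag_weight * off_diag)
```

Under these assumptions, two feature batches that encode the same content yield a loss close to zero, while unrelated batches yield a much larger value, because their matching dimensions are uncorrelated.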

Published

2024-03-24

How to Cite

Wang, R., Xue, Y., & Jin, L. (2024). DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degradations. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 5563-5571. https://doi.org/10.1609/aaai.v38i6.28366

Issue

Section

AAAI Technical Track on Computer Vision V