Abstract
Various types of degradation in document images, such as blurring, shadows, and physical wear and tear, significantly impair downstream tasks in multimedia applications. Document image enhancement is needed to improve the legibility and quality of these images, which are integral to accurate Optical Character Recognition (OCR), information retrieval, document analysis, and related tasks. This paper introduces a novel and simple approach that employs Large Kernel Convolutional Networks (ConvNets) for document image enhancement, capitalizing on their ability to capture broad contextual information to improve image quality. Extensive experimental evaluations across multiple benchmarks demonstrate that our method achieves state-of-the-art (SOTA) performance while maintaining low computational cost. Code and pre-trained models are available at https://github.com/qijunshi/LKNet.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant No.62176091.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Shi, Q. et al. (2025). LK-Net: Efficient Large Kernel ConvNet for Document Enhancement. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15321. Springer, Cham. https://doi.org/10.1007/978-3-031-78305-0_18
DOI: https://doi.org/10.1007/978-3-031-78305-0_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78304-3
Online ISBN: 978-3-031-78305-0
eBook Packages: Computer Science, Computer Science (R0)