LK-Net: Efficient Large Kernel ConvNet for Document Enhancement

Shi, Qijun; Zhan, Hongjian; Li, Yangfu; Zou, Weijun; Li, Huasheng; Pal, Umapada; Lu, Yue

doi:10.1007/978-3-031-78305-0_18


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15321)

Included in the following conference series:

International Conference on Pattern Recognition


Abstract

Various types of degradation in document images, such as blurring, shadows, and physical wear and tear, significantly impair the effectiveness of downstream tasks in multimedia applications. Document image enhancement is therefore needed to improve the legibility and quality of these images, which is essential for accurate Optical Character Recognition (OCR), information retrieval, document analysis, and related tasks. This paper introduces a novel and simple approach that employs Large Kernel Convolutional Networks (ConvNets) for document image enhancement, capitalizing on their ability to capture broad contextual information to improve image quality. Extensive experiments on multiple benchmarks demonstrate that our method achieves state-of-the-art (SOTA) performance while maintaining a low computational cost. Code and pre-trained models are available at https://github.com/qijunshi/LKNet.
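The abstract credits large kernels with capturing wide context at low cost. The sketch below is not taken from the LK-Net code; it only illustrates that trade-off with standard convolution arithmetic. It assumes stride-1, square kernels, a hypothetical channel width of 64, and hypothetical helper names `receptive_field` and `conv_params`. It compares the receptive field of a stack of small kernels with that of one large kernel, and the weight count of dense versus depthwise (one filter per channel) convolutions.

```python
from typing import Optional, Sequence


def receptive_field(kernel_sizes: Sequence[int],
                    dilations: Optional[Sequence[int]] = None) -> int:
    """Receptive field (pixels per axis) of a stack of stride-1 convolutions.

    Each layer with kernel size k and dilation d widens the field by d * (k - 1).
    """
    if dilations is None:
        dilations = [1] * len(kernel_sizes)
    if len(dilations) != len(kernel_sizes):
        raise ValueError("kernel_sizes and dilations must have the same length")
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += d * (k - 1)
    return rf


def conv_params(channels: int, kernel_size: int, depthwise: bool,
                bias: bool = True) -> int:
    """Parameter count of a square conv mapping `channels` -> `channels`."""
    if depthwise:
        # One k x k filter per channel; no mixing across channels.
        weights = channels * kernel_size * kernel_size
    else:
        # Dense conv: every output channel sees every input channel.
        weights = channels * channels * kernel_size * kernel_size
    return weights + (channels if bias else 0)


if __name__ == "__main__":
    C = 64  # hypothetical channel width, chosen only for illustration
    # Fifteen stacked 3x3 layers and a single 31x31 layer both see 31 pixels.
    print(receptive_field([3] * 15), receptive_field([31]))
    # Weights: 15 dense 3x3 layers vs one depthwise 31x31 layer.
    print(15 * conv_params(C, 3, depthwise=False), conv_params(C, 31, depthwise=True))
```

With these assumptions, both configurations reach a 31-pixel receptive field, but the single depthwise 31x31 layer has 61,568 parameters versus 553,920 for fifteen dense 3x3 layers. The comparison is not like-for-like: a depthwise layer does not mix channels, so large-kernel designs typically pair it with 1x1 (pointwise) convolutions, which add about C² weights each.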




Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant No. 62176091.

Author information

Authors and Affiliations

Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai, 200241, China
Qijun Shi, Hongjian Zhan, Yangfu Li & Yue Lu
Shanghai Hex Information Technology Co., Ltd., Shanghai, China
Weijun Zou & Huasheng Li
Indian Statistical Institute Kolkata, Kolkata, India
Umapada Pal


Corresponding author

Correspondence to Hongjian Zhan.

Editor information

Editors and Affiliations

University of Salford, Salford, UK
Apostolos Antonacopoulos
Indian Institute of Technology Bombay, Mumbai, Maharashtra, India
Subhasis Chaudhuri
Johns Hopkins University, Baltimore, MD, USA
Rama Chellappa
Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu
IIT Kharagpur, Kharagpur, India
Saumik Bhattacharya
Indian Statistical Institute Kolkata, Kolkata, India
Umapada Pal


About this paper

Cite this paper

Shi, Q. et al. (2025). LK-Net: Efficient Large Kernel ConvNet for Document Enhancement. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15321. Springer, Cham. https://doi.org/10.1007/978-3-031-78305-0_18


DOI: https://doi.org/10.1007/978-3-031-78305-0_18
Published: 04 December 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78304-3
Online ISBN: 978-3-031-78305-0
eBook Packages: Computer Science, Computer Science (R0)


Societies and partnerships: The International Association for Pattern Recognition