Abstract
Handwritten text erasure on examination papers is an important new research topic with high practical value: it can restore examination papers and help collect incorrectly answered questions for review, thereby improving educational efficiency. However, to the best of our knowledge, there is no publicly available dataset for handwritten text erasure on examination papers. To facilitate the development of this field, we build a real-world dataset called SCUT-EnsExam (EnsExam for short). The dataset consists of 545 examination paper images, each of which has been carefully annotated to provide a visually reasonable erasure target. With EnsExam, we propose an end-to-end model that introduces a soft stroke mask to erase the handwritten text precisely. Furthermore, we propose a simple yet effective loss, called stroke normalization (SN) loss, to alleviate the imbalance between text and non-text regions. Extensive numerical experiments show that our proposed method outperforms previous state-of-the-art methods on EnsExam. In addition, quantitative experiments on the scene text removal benchmark SCUT-EnsText demonstrate the generalizability of our method. EnsExam will be made available at https://github.com/SCUT-DLVCLab/SCUT-EnsExam.
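The imbalance between text and non-text regions mentioned above can be illustrated with a toy example. The abstract does not define the SN loss, so the sketch below is not the paper's formulation; it is a hypothetical mask-weighted L1 loss, divided by the total weight of a soft stroke mask, shown only to make the imbalance concrete. The function names `mean_l1` and `stroke_normalized_l1`, the `eps` parameter, and the toy data are all assumptions introduced for illustration.

```python
def mean_l1(pred, target):
    """Plain L1 error averaged over every pixel, stroke or not."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def stroke_normalized_l1(pred, target, soft_mask, eps=1e-6):
    """L1 error weighted by a soft stroke mask, divided by the mask's total weight.

    Hypothetical illustration only; NOT the SN loss defined in the paper.
    """
    weighted = sum(m * abs(p - t) for p, t, m in zip(pred, target, soft_mask))
    return weighted / (sum(soft_mask) + eps)


# Toy flattened "image" of 100 pixels, only 2 of which belong to a stroke.
target = [1.0] * 100      # clean background expected after erasure
pred = [1.0] * 100
pred[10] = 0.0            # two stroke pixels that were left un-erased
pred[11] = 0.0

soft_mask = [0.0] * 100
soft_mask[10] = 1.0       # stroke centre
soft_mask[11] = 0.5       # softer weight at the stroke edge

plain = mean_l1(pred, target)
normalized = stroke_normalized_l1(pred, target, soft_mask)
print(plain, normalized)
```

In this toy case the plain per-pixel mean is small because 98 correct background pixels dilute the two un-erased stroke pixels, while the mask-normalized value stays close to 1. This is the kind of gap a normalization over stroke regions is meant to close.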
Acknowledgements
This research is supported in part by NSFC (Grant No. 61936003), the Zhuhai Industry Core and Key Technology Research Project (No. 2220004002350), the Science and Technology Foundation of Guangzhou Huangpu Development District (No. 2020GH17), and GD-NSF (No. 2021A1515011870).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Huang, L. et al. (2023). EnsExam: A Dataset for Handwritten Text Erasure on Examination Papers. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14189. Springer, Cham. https://doi.org/10.1007/978-3-031-41682-8_29
DOI: https://doi.org/10.1007/978-3-031-41682-8_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41681-1
Online ISBN: 978-3-031-41682-8
eBook Packages: Computer Science, Computer Science (R0)