Annotation-Free Character Detection in Historical Vietnamese Stele Images

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12821)

Abstract

Images of historical Vietnamese stone engravings provide historians with a unique opportunity to study the past of the country. However, because the thousands of images are highly heterogeneous in both the text foreground and the stone background, it is difficult to use automatic document analysis methods to support manual examination, especially given the labeling effort needed for training machine learning systems. In this paper, we present a method for finding the location of Chu Nom characters in the main text of the steles without the need for any human annotation. Using self-calibration, fully convolutional object detection methods trained on printed characters are successfully adapted to the handwritten image collection. The achieved detection results are promising for subsequent document analysis tasks, such as keyword spotting or transcription.

Notes

  1. https://vietnamica.hypotheses.org.

  2. Readers interested in the source code and the dataset are referred to our GitHub repository https://github.com/asciusb/annotationfree.

  3. http://www.nomfoundation.org.

  4. More specifically, three random selections have been performed. 11 samples have been selected among already transcribed steles, 22 samples from the dataset used in previous work [20], and 22 samples from the rest of the dataset.

References

  1. Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: Yolov4: Optimal speed and accuracy of object detection. arXiv:2004.10934 (2020)

  2. Borges Oliveira, D.A., Viana, M.P.: Fast CNN-based document layout analysis. In: Proceedings International Conference on Computer Vision Workshops (ICCVW), pp. 1173–1180 (2017)

  3. Clanuwat, T., Lamb, A., Kitamoto, A.: KuroNet: Pre-modern Japanese Kuzushiji character recognition with deep learning. In: Proceedings 15th International Conference on Document Analysis and Recognition (ICDAR), pp. 607–614 (2019)

  4. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings 2nd International Conference on Knowledge Discovery and Data Mining, pp. 226–231 (1996)

  5. Farhadi, A., Redmon, J.: Yolov3: An incremental improvement. arXiv:1804.02767 (2018)

  6. Fischer, A., Liwicki, M., Ingold, R. (eds.): Handwritten historical document analysis, recognition, and retrieval – State of the art and future trends. World Scientific (2020)

  7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)

  8. Jocher, G., et al.: ultralytics/yolov5: v4.0 - nn.SiLU() activations, Weights & Biases logging, PyTorch Hub integration (2021). https://doi.org/10.5281/ZENODO.4418161

  9. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2117–2125 (2017)

  10. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings International Conference on Computer Vision (ICCV), pp. 2980–2988 (2017)

  11. Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Proceedings 13th European Conference on Computer Vision (ECCV), pp. 740–755 (2014)

  12. Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation. In: Proceedings International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8759–8768 (2018)

  13. Nguyen, K.C., Nguyen, C.T., Nakagawa, M.: Nom document digitalization by deep convolution neural networks. Pattern Recogn. Lett. 133, 8–16 (2020)

  14. Papin, P.: Aperçu sur le programme “Publication de l’inventaire et du corpus complet des inscriptions sur stèles du Viêt-Nam’’. Bull. de l’École Française d’Extrême-Orient 90(1), 465–472 (2003)

  15. Papin, P., Manh, T.K., Nguyên, N.V.: Corpus des inscriptions anciennes du Vietnam. EPHE, EFEO, Institut Han-Nôm (2005–2013)

  16. Papin, P., Manh, T.K., Nguyên, N.V.: Catalogue des inscriptions du Viêt-Nam. EPHE, EFEO, Institut Han-Nôm (2007–2012)

  17. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788 (2016)

  18. Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S.: Generalized intersection over union: a metric and a loss for bounding box regression. In: Proceedings International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 658–666 (2019)

  19. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Proceedings International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 234–241 (2015)

  20. Scius-Bertrand, A., Voegtlin, L., Alberti, M., Fischer, A., Bui, M.: Layout analysis and text column segmentation for historical Vietnamese steles. In: Proceedings 5th International Workshop on Historical Document Imaging and Processing (HIP), pp. 84–89 (2019)

  21. Stewart, S., Barrett, B.: Document image page segmentation and character recognition as semantic segmentation. In: Proceedings 4th International Workshop on Historical Document Imaging and Processing (HIP), pp. 101–106 (2017)

  22. Sudholt, S., Fink, G.A.: PHOCNet: a deep convolutional neural network for word spotting in handwritten documents. In: Proceedings 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 277–282 (2016)

  23. Tan, M., Le, Q.: EfficientNet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML), pp. 6105–6114 (2019)

  24. Tian, Z., Shen, C., Chen, H., He, T.: FCOS: fully convolutional one-stage object detection. In: Proceedings International Conference on Computer Vision (ICCV), pp. 9627–9636 (2019)

  25. Wang, C.Y., Liao, H.Y.M., Wu, Y.H., Chen, P.Y., Hsieh, J.W., Yeh, I.H.: CSPNet: a new backbone that can enhance learning capability of CNN. In: Proceedings International Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 390–391 (2020)

  26. Wu, Y., He, K.: Group normalization. In: Proceedings European Conference on Computer Vision (ECCV), pp. 3–19 (2018)

  27. Yang, H., Jin, L., Huang, W., Yang, Z., Lai, S., Sun, J.: Dense and tight detection of Chinese characters in historical documents: datasets and a recognition guided detector. IEEE Access 6, 30174–30183 (2018)

  28. Zhang, S., Chi, C., Yao, Y., Lei, Z., Li, S.Z.: Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In: Proceedings International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9759–9768 (2020)

Acknowledgements

This work has been supported by the Swiss Hasler Foundation (project 20008). It has also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 833933 - VIETNAMICA).

We would like to thank Bélinda Hakkar, Marine Scius-Bertrand, Jean-Michel Nafziger, René Boutin, Morgane Vannier, Delphine Mamie and Tobias Widmer for spending more than a hundred hours annotating bounding boxes to create the ground truth of the test set.

Author information

Corresponding author

Correspondence to Anna Scius-Bertrand.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Scius-Bertrand, A., Jungo, M., Wolf, B., Fischer, A., Bui, M. (2021). Annotation-Free Character Detection in Historical Vietnamese Stele Images. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12821. Springer, Cham. https://doi.org/10.1007/978-3-030-86549-8_28

  • DOI: https://doi.org/10.1007/978-3-030-86549-8_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86548-1

  • Online ISBN: 978-3-030-86549-8

  • eBook Packages: Computer Science, Computer Science (R0)
