Skip to main content

Is ImageNet Always the Best Option? An Overview on Transfer Learning Strategies for Document Layout Analysis

  • Conference paper
  • First Online:
Image Analysis and Processing - ICIAP 2023 Workshops (ICIAP 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14366))

Included in the following conference series:

  • 176 Accesses

Abstract

Semantic segmentation models have shown impressive performance in the context of historical document layout analysis, but their effectiveness is reliant on having access to a large number of high-quality annotated images for training. A popular approach to address the lack of training data in other domains is to rely on transfer learning to transfer the knowledge learned from a large-scale, general-purpose dataset (e.g. ImageNet) to a domain-specific task. However, this approach has been shown to lead to unsatisfactory results when the target task is completely unrelated to the data employed for the pre-training process, which is the case when working on document layout analysis. For this reason, in the present paper, we provide an overview of domain-specific transfer learning for document layout segmentation. In particular, we show how relying on document-related images for the pre-training process leads to consistently improved performance and faster convergence compared to training from scratch or even relying on a large, general purpose, dataset such as ImageNet.

A. De Nardin and S. Zottin—Equally contributed.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 89.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 119.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Andrist, P.: Toward a definition of paratexts and paratextuality: the case of ancient Greek manuscripts, pp. 130–150. De Gruyter, Berlin, Boston (2018). https://doi.org/10.1515/9783110603477-010

  2. Brodzicki, A., Piekarski, M., Kucharski, D., Jaworek-Korjakowska, J., Gorgon, M.: Transfer learning methods as a new approach in computer vision tasks with small datasets. Found. Comput. Decision Sci. 45(3), 179–193 (2020). https://doi.org/10.2478/fcds-2020-0010

    Article  Google Scholar 

  3. Bukhari, S.S., Breuel, T.M., Asi, A., El-Sana, J.: Layout analysis for Arabic historical document images using machine learning. In: 2012 International Conference on Frontiers in Handwriting Recognition, pp. 639–644 (2012). https://doi.org/10.1109/ICFHR.2012.227

  4. Chen, L., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation. CoRR abs/1706.05587 (2017)

    Google Scholar 

  5. Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision - ECCV 2018, pp. 833–851. Springer International Publishing, Cham (2018)

    Chapter  Google Scholar 

  6. De Nardin, A., Zottin, S., Piciarelli, C., Colombi, E., Foresti, G.L.: Few-shot pixel-precise document layout segmentation via dynamic instance generation and local thresholding. Int. J. Neural Syst. 33(10), 2350052 (2023). https://doi.org/10.1142/S0129065723500521, PMID: 37567858

  7. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009). https://doi.org/10.1109/CVPR.2009.5206848

  8. Droby, A., Barakat, B.K., Madi, B., Alaasam, R., El-Sana, J.: Unsupervised deep learning for handwritten page segmentation. In: 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 240–245. Dortmund, Germany (2020). https://doi.org/10.1109/ICFHR2020.2020.00052

  9. Everingham, M., Gool, L.V., Williams, C.K.I., Winn, J.M., Zisserman, A.: The pascal visual object classes (voc) challenge. Int. J. Comput. Vision 88, 303–338 (2010)

    Article  Google Scholar 

  10. Iakubovskii, P.: Segmentation models pytorch (2019). https://github.com/qubvel/segmentation_models.pytorch

  11. Kornblith, S., Shlens, J., Le, Q.V.: Do better imagenet models transfer better? In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2019)

    Google Scholar 

  12. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)

    Google Scholar 

  13. Lin, T.Y., et al.: Microsoft coco: Common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision - ECCV 2014, pp. 740–755. Springer International Publishing, Cham (2014)

    Chapter  Google Scholar 

  14. Simistira, F., Seuret, M., Eichenberger, N., Garz, A., Liwicki, M., Ingold, R.: Diva-hisdb: a precisely annotated large dataset of challenging medieval manuscripts. In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 471–476. Shenzen, China (2016). https://doi.org/10.1109/ICFHR.2016.0093

  15. Studer, L., et al.: A comprehensive study of imagenet pre-training for historical document image analysis. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 720–725 (2019). https://doi.org/10.1109/ICDAR.2019.00120

  16. Tarride, S., Lemaitre, A., Coüasnon, B., Tardivel, S.: Combination of deep neural networks and logical rules for record segmentation in historical handwritten registers using few examples. Int. J. Doc. Anal. Recogn. (IJDAR) 24(1), 77–96 (2021). https://doi.org/10.1007/s10032-021-00362-8

    Article  Google Scholar 

  17. Zhuang, F., et al.: A comprehensive survey on transfer learning. Proc. IEEE 109(1), 43–76 (2021). https://doi.org/10.1109/JPROC.2020.3004555

    Article  Google Scholar 

Download references

Acknowledgements

Partial financial support was received from Piano Nazionale di Ripresa e Resilienza (PNRR) DD 3277 del 30 dicembre 2021 (PNRR Missione 4, Componente 2, Investimento 1.5) - Interconnected Nord-Est Innovation Ecosystem (iNEST).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Axel De Nardin .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

De Nardin, A., Zottin, S., Colombi, E., Piciarelli, C., Foresti, G.L. (2024). Is ImageNet Always the Best Option? An Overview on Transfer Learning Strategies for Document Layout Analysis. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing - ICIAP 2023 Workshops. ICIAP 2023. Lecture Notes in Computer Science, vol 14366. Springer, Cham. https://doi.org/10.1007/978-3-031-51026-7_41

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-51026-7_41

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-51025-0

  • Online ISBN: 978-3-031-51026-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics