Is ImageNet Always the Best Option? An Overview on Transfer Learning Strategies for Document Layout Analysis

De Nardin, Axel; Zottin, Silvia; Colombi, Emanuela; Piciarelli, Claudio; Foresti, Gian Luca

doi:10.1007/978-3-031-51026-7_41

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14366)

Included in the following conference series:

International Conference on Image Analysis and Processing

Abstract

Semantic segmentation models have shown impressive performance in historical document layout analysis, but their effectiveness relies on access to a large number of high-quality annotated training images. A popular way to address the lack of training data in other domains is transfer learning: the knowledge learned from a large-scale, general-purpose dataset (e.g., ImageNet) is transferred to a domain-specific task. However, this approach has been shown to yield unsatisfactory results when the target task is completely unrelated to the data used for pre-training, as is the case for document layout analysis. For this reason, in the present paper we provide an overview of domain-specific transfer learning for document layout segmentation. In particular, we show that relying on document-related images for pre-training leads to consistently improved performance and faster convergence compared to training from scratch or even to pre-training on a large, general-purpose dataset such as ImageNet.
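The abstract describes transfer learning as reusing weights learned on one dataset to initialize a model for another task. The chapter's exact procedure is not given in this summary, so the following is only a minimal, framework-free sketch under stated assumptions: a model is represented as a dictionary mapping parameter names to `(shape, values)` pairs, and only the encoder (parameters whose names start with the hypothetical prefix `encoder.`) is transferred, while the segmentation head keeps its fresh initialization because the layout classes generally differ from the pretraining labels. The function `transfer_encoder_weights`, the prefix, and the toy checkpoints are all hypothetical illustrations. Training from scratch corresponds to skipping this step entirely.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for framework tensors: (shape, flat list of values).
Tensor = Tuple[Tuple[int, ...], List[float]]
StateDict = Dict[str, Tensor]


def transfer_encoder_weights(
    source: StateDict, target: StateDict, prefix: str = "encoder."
) -> Tuple[StateDict, List[str], List[str]]:
    """Initialize `target` with the parameters of `source` that live under `prefix`.

    A parameter is copied only if the target has a parameter with the same
    name and shape; everything else (e.g. a head trained for a different
    label set) keeps the target's own initialization.
    """
    result: StateDict = dict(target)  # shallow copy; `target` itself is left untouched
    loaded: List[str] = []
    skipped: List[str] = []
    for name, (shape, values) in source.items():
        same_prefix = name.startswith(prefix)
        compatible = name in target and target[name][0] == shape
        if same_prefix and compatible:
            result[name] = (shape, list(values))  # copy the pretrained values
            loaded.append(name)
        else:
            skipped.append(name)
    return result, loaded, skipped


# Toy checkpoint from a hypothetical pretraining run (2 output classes)
# and a freshly initialized layout segmentation model (4 classes).
pretrained: StateDict = {
    "encoder.conv1.weight": ((2, 2), [0.5, -0.1, 0.3, 0.7]),
    "encoder.conv1.bias": ((2,), [0.01, 0.02]),
    "head.weight": ((2, 2), [1.0, 1.0, 1.0, 1.0]),
}
fresh: StateDict = {
    "encoder.conv1.weight": ((2, 2), [0.0, 0.0, 0.0, 0.0]),
    "encoder.conv1.bias": ((2,), [0.0, 0.0]),
    "head.weight": ((4, 2), [0.0] * 8),
}

model, loaded, skipped = transfer_encoder_weights(pretrained, fresh)
print(loaded)   # ['encoder.conv1.weight', 'encoder.conv1.bias']
print(skipped)  # ['head.weight']
```

Whether `pretrained` comes from ImageNet or from a collection of document images, the transfer step itself is the same; what the chapter compares is the source of those weights. In PyTorch, a similar effect is usually obtained by filtering the checkpoint and calling `model.load_state_dict(filtered, strict=False)`. The shape filter matters there because `strict=False` tolerates missing or unexpected keys but still raises an error on size mismatches.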

A. De Nardin and S. Zottin contributed equally.

Acknowledgements

Partial financial support was received from the Piano Nazionale di Ripresa e Resilienza (PNRR, Italy's National Recovery and Resilience Plan), DD 3277 of 30 December 2021 (PNRR Mission 4, Component 2, Investment 1.5): Interconnected Nord-Est Innovation Ecosystem (iNEST).

Author information

Authors and Affiliations

Department of Mathematics, Informatics and Physics (DMIF), University of Udine, Udine, Italy
Axel De Nardin, Silvia Zottin, Claudio Piciarelli & Gian Luca Foresti
Department of Humanities and Cultural Heritage (DIUM), University of Udine, 33100, Udine, Italy
Emanuela Colombi

Corresponding author

Correspondence to Axel De Nardin.

Editor information

Editors and Affiliations

University of Udine, Udine, Italy
Gian Luca Foresti
University of Udine, Udine, Italy
Andrea Fusiello
University of York, York, UK
Edwin Hancock

About this paper

Cite this paper

De Nardin, A., Zottin, S., Colombi, E., Piciarelli, C., Foresti, G.L. (2024). Is ImageNet Always the Best Option? An Overview on Transfer Learning Strategies for Document Layout Analysis. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing - ICIAP 2023 Workshops. ICIAP 2023. Lecture Notes in Computer Science, vol 14366. Springer, Cham. https://doi.org/10.1007/978-3-031-51026-7_41

DOI: https://doi.org/10.1007/978-3-031-51026-7_41
Published: 21 January 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-51025-0
Online ISBN: 978-3-031-51026-7
eBook Packages: Computer Science, Computer Science (R0)