Abstract
Semantic segmentation models have shown impressive performance in historical document layout analysis, but their effectiveness depends on access to a large number of high-quality annotated images for training. In other domains, a popular way to address the lack of training data is transfer learning, in which the knowledge learned from a large-scale, general-purpose dataset (e.g. ImageNet) is carried over to a domain-specific task. However, this approach has been shown to yield unsatisfactory results when the target task is completely unrelated to the data used for pre-training, as is the case for document layout analysis. For this reason, in this paper we provide an overview of domain-specific transfer learning for document layout segmentation. In particular, we show that pre-training on document-related images leads to consistently improved performance and faster convergence compared with training from scratch or even pre-training on a large, general-purpose dataset such as ImageNet.
A. De Nardin and S. Zottin contributed equally.
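The three initialisation strategies compared in the abstract (training from scratch, general-purpose pre-training, and document-specific pre-training) differ only in where the encoder weights come from. The following is a minimal, framework-agnostic sketch of that weight-transfer step, not the authors' implementation. The layer names, the toy "state dicts" of random numbers, and the helpers `init_weights` and `transfer_encoder` are all hypothetical stand-ins chosen for illustration.

```python
import random


def init_weights(layer_names, seed):
    """Return a toy 'state dict': layer name -> list of random weights."""
    rng = random.Random(seed)
    return {name: [rng.uniform(-0.1, 0.1) for _ in range(4)] for name in layer_names}


def transfer_encoder(target, source, prefix="encoder."):
    """Copy weights whose name starts with `prefix` from `source` into a copy of `target`.

    Layers outside the prefix (decoder, segmentation head) keep the values
    they had in `target`, since the pre-training task usually has a
    different output than the segmentation task.
    """
    merged = dict(target)
    copied = []
    for name, weights in source.items():
        same_shape = name in merged and len(weights) == len(merged[name])
        if name.startswith(prefix) and same_shape:
            merged[name] = list(weights)
            copied.append(name)
    return merged, copied


# Hypothetical layer names for a small encoder-decoder segmentation model.
LAYERS = ["encoder.conv1", "encoder.conv2", "decoder.up1", "head.classifier"]
PRETRAIN_LAYERS = ["encoder.conv1", "encoder.conv2", "head.classifier"]

# Strategy 1: training from scratch (every layer randomly initialised).
scratch = init_weights(LAYERS, seed=0)

# Random stand-ins for checkpoints pre-trained on a general-purpose dataset
# and on document-related images; real checkpoints would be loaded from disk.
general_ckpt = init_weights(PRETRAIN_LAYERS, seed=1)
document_ckpt = init_weights(PRETRAIN_LAYERS, seed=2)

# Strategies 2 and 3: same architecture, encoder initialised from a checkpoint.
from_general, copied_general = transfer_encoder(scratch, general_ckpt)
from_document, copied_document = transfer_encoder(scratch, document_ckpt)

print("layers taken from the document checkpoint:", copied_document)
```

After this step, all three models would be fine-tuned on the same annotated document pages, so any difference in accuracy or convergence speed can be attributed to the initialisation alone.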
Acknowledgements
Partial financial support was received from Piano Nazionale di Ripresa e Resilienza (PNRR) DD 3277 del 30 dicembre 2021 (PNRR Missione 4, Componente 2, Investimento 1.5) - Interconnected Nord-Est Innovation Ecosystem (iNEST).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
De Nardin, A., Zottin, S., Colombi, E., Piciarelli, C., Foresti, G.L. (2024). Is ImageNet Always the Best Option? An Overview on Transfer Learning Strategies for Document Layout Analysis. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing - ICIAP 2023 Workshops. ICIAP 2023. Lecture Notes in Computer Science, vol 14366. Springer, Cham. https://doi.org/10.1007/978-3-031-51026-7_41
DOI: https://doi.org/10.1007/978-3-031-51026-7_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-51025-0
Online ISBN: 978-3-031-51026-7
eBook Packages: Computer Science (R0)