Authors: Adeline Granet; Emmanuel Morin; Harold Mouchère; Solen Quiniou and Christian Viard-Gaudin
Affiliation: Université de Nantes, France
Keyword(s): Handwriting Recognition, Historical Document, Transfer Learning, Deep Neural Network, Unlabeled Data.
Related Ontology Subjects/Areas/Topics: Applications; Cardiovascular Imaging and Cardiography; Cardiovascular Technologies; Computer Vision, Visualization and Computer Graphics; Health Engineering and Technology Applications; Image Understanding; Pattern Recognition; Signal Processing; Software Engineering
Abstract:
In this work, we investigate handwriting recognition on new historical handwritten documents using transfer
learning. Establishing a manual ground truth for a new collection of handwritten documents is time-consuming
but necessary to train and test recognition systems. We want to implement a recognition system without
performing this annotation step. Our research deals with transfer learning from heterogeneous datasets that have a
ground truth and share common properties with a new dataset that has none. The main difficulties
of transfer learning lie in changes in the writing style, the vocabulary, and the named entities over centuries
and datasets. In our experiment, we show how a CNN-BLSTM-CTC neural network behaves on the task
of transcribing handwritten titles of plays of the Italian Comedy when trained on combinations of various
datasets such as RIMES, George Washington, and Los Esposalles. We show that the choice of training
datasets and of merging methods is decisive for the results of the transfer learning task.