Authors:
Heba Hassan (1); Marwan Torki (2) and Mohamed E. Hussein (2, 3)
Affiliations:
(1) Dept. of Computer Science and Engineering, Egypt-Japan University of Science and Technology, Egypt
(2) Dept. of Computer and Systems Engineering, Alexandria University, Egypt
(3) Information Sciences Institute, University of Southern California, U.S.A.
Keyword(s):
Text Recognition, Multi-task Learning.
Abstract:
Text recognition remains a challenging problem in the context of text reading in natural scenes. Given the sequential nature of text, the problem is usually posed as sequence prediction from a whole-word image. Alternatively, it can be posed as a character prediction problem, an approach that is typically more robust to challenging word shapes. To attain the best of the two approaches, we propose the Sequence-Character Aware Network (SCAN). SCAN starts by locating and recognizing the characters, and then generates the word using a sequence-based approach. It comprises two modules: a semantic-segmentation-based character prediction module, and an encoder-decoder network for word generation. Training is done in two stages. In the first stage, we adopt a multi-task training technique with both character-level and word-level losses and trainable loss weighting. In the second stage, the character-level loss is removed, enabling the use of data with only word-level annotations. Experiments are conducted on several datasets for both regular and irregular text, showing state-of-the-art performance of the proposed approach. The experiments also show that the proposed approach is robust against noisy word detection.
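As a rough illustration of the two-stage objective described in the abstract, the following sketch combines a character-level and a word-level loss with trainable weights in stage one and keeps only the word-level loss in stage two. The abstract does not state which weighting scheme is used; the log-variance form below, and the names `combined_loss`, `weight_gradients`, `s_char` and `s_word`, are assumptions made for illustration only and are not taken from the paper.

```python
import math


def combined_loss(char_loss, word_loss, s_char, s_word, stage=1):
    """Combine the two task losses (hypothetical helper, not from the paper).

    s_char and s_word are trainable scalars; each task loss is scaled by
    exp(-s) and the scalar itself is added as a regularizer. This particular
    weighting form is an assumption.
    """
    if stage == 2:
        # Stage two: the character-level loss is removed, so only
        # word-level annotations are needed.
        return word_loss
    char_term = math.exp(-s_char) * char_loss + s_char
    word_term = math.exp(-s_word) * word_loss + s_word
    return char_term + word_term


def weight_gradients(char_loss, word_loss, s_char, s_word):
    """Gradients of the stage-one loss with respect to the two weights.

    d/ds [exp(-s) * L + s] = 1 - exp(-s) * L
    """
    return (1.0 - math.exp(-s_char) * char_loss,
            1.0 - math.exp(-s_word) * word_loss)
```

With both weights at zero, the stage-one objective reduces to the plain sum of the two losses; a gradient step on `s_char` or `s_word` then shifts the balance between the tasks.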