Abstract:
Scene text detection methods have achieved significant progress. However, the stack-omnidirectional text dilemma, under-segmentation of very close text words, and over-segmentation of arbitrary-shape long text lines remain major challenges. Motivated by these problems, we propose a two-stage method called the omnidirectional pyramid mask proposal text detector (OPMP). OPMP removes the anchor mechanism, which requires heuristic non-maximum suppression processing. Instead, it uses an effective pyramid lengthwise and sidewise residual sequence modeling method to produce arbitrary-shape proposals. To accurately extract the features of text shape, OPMP enhances the backbone layers with a multiple arbitrary-shape fitting mechanism. Finally, a multi-grain text classification module is proposed, which robustly reclassifies each text region. Comprehensive ablation studies demonstrate the effectiveness of each proposed component. In addition, experiments on various benchmarks, including ICDAR2015, MLT, MSRA-TD500, CTW1500, and Total-text, show that our method outperforms previous state-of-the-art methods.
Published in: IEEE Transactions on Multimedia (Volume: 23)