Impact Statement:Impact Statement—Street view text spotting techniques will be increasingly important in intelligent transportation systems (ITS), because they are very beneficial for the...Show More
Abstract:
Urban scenes are full of street entities with sign boards. Therefore, in autonomous driving, street view text spotting techniques will play a significant role in the prec...Show MoreMetadata
Impact Statement:
Impact Statement—Street view text spotting techniques will be increasingly important in intelligent transportation systems (ITS), because they are very beneficial for the understanding of surrounding scenes during driving. However, the performance of existing scene text spotting algorithms still has a large room for improvement. To advance this research, in this work, we propose an MSTD neural network structure for character-level scene text localization and recognition. With a significant recognition performance increase of more than 25% on Chinese scene text datasets, the MSTD provides a better solution for street view scene text spotting. Our proposed technique can be used by digital mapping industries to automatically collect the points of interests along the streets; it can also be used in ITS for the understanding of surrounding scenes during driving.
Abstract:
Urban scenes are full of street entities with sign boards. Therefore, in autonomous driving, street view text spotting techniques will play a significant role in the precise understanding of surrounding scenes during driving, because texts contained in the images usually provide important clues for accurate image understanding, while it is often ambiguous for existing computer vision algorithms to understand scene images without texts. In this work, we propose a Multi-Segmentation network for character-level scene Text Detection (MSTD). The MSTD introduces a densely connected atrous spatial pyramid pooling module to enlarge the receptive field of the feature extraction layer, so as to localize long as well as large-sized text instances. Moreover, it devises a double segmentation subnetwork to utilize two independent but inherently complementary losses to co-optimize the network and increase the reliability of the confidence scores in predicting the text/nontext areas. With the characte...
Published in: IEEE Transactions on Artificial Intelligence ( Volume: 3, Issue: 2, April 2022)