Abstract
Based on the Bag of Visual Words (BoVW) model, this paper proposes a novel method that uses an integrated feature to detect sign text in street view images. BRISK features are first extracted from the street view images for dictionary learning. The Self-Growing and Self-Organized Neural Gas (SGONG) network is then used to adaptively cluster the extracted BRISK descriptors into visual words. The histogram of visual words is computed to form the appearance feature of the sign text. To eliminate color differences among signs and highlight the similarity of their histograms across colors, a color-invariant histogram, called the CIHS histogram, is presented to represent the color information of the sign text. By integrating the visual-word histogram and the CIHS histogram, an integrated descriptor, called the Appearance and Color (A&C) descriptor, is designed as the input feature for a cascade-AdaBoost classifier. In multi-scale sliding-window sign text detection, an integral image is applied to the spatial distribution map of each visual word to avoid repeated feature extraction. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods and detectors based on traditional descriptors.
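To make the pipeline concrete, the following is a minimal sketch of how an A&C-style descriptor could be assembled for one candidate window. It is not the authors' implementation: SGONG is replaced by plain k-means as a generic stand-in for visual-word learning, the CIHS histogram is approximated by a normalized RGB histogram, and the dictionary size and bin counts are illustrative assumptions.

```python
# Hypothetical sketch of the A&C descriptor pipeline described in the abstract.
# Assumptions (not from the paper): k-means stands in for SGONG, a plain
# normalized RGB histogram stands in for CIHS, and K / bin counts are arbitrary.
import cv2
import numpy as np
from sklearn.cluster import KMeans

K = 64  # illustrative dictionary size


def learn_dictionary(train_images):
    """Cluster BRISK descriptors from training images into K visual words."""
    brisk = cv2.BRISK_create()
    descs = []
    for img in train_images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        _, d = brisk.detectAndCompute(gray, None)
        if d is not None:
            descs.append(d.astype(np.float32))
    all_descs = np.vstack(descs)
    # Stand-in for SGONG: k-means over float-cast binary descriptors.
    return KMeans(n_clusters=K, n_init=10, random_state=0).fit(all_descs)


def ac_descriptor(window, dictionary):
    """Appearance (visual-word histogram) + color histogram for one window."""
    brisk = cv2.BRISK_create()
    gray = cv2.cvtColor(window, cv2.COLOR_BGR2GRAY)
    _, d = brisk.detectAndCompute(gray, None)
    appearance = np.zeros(K, dtype=np.float32)
    if d is not None:
        words = dictionary.predict(d.astype(np.float32))
        appearance = np.bincount(words, minlength=K).astype(np.float32)
        appearance /= max(appearance.sum(), 1.0)
    # Crude color histogram used here in place of the paper's CIHS histogram.
    color = cv2.calcHist([window], [0, 1, 2], None, [4, 4, 4],
                         [0, 256, 0, 256, 0, 256]).flatten()
    color /= max(color.sum(), 1.0)
    # Concatenated vector is what would be fed to the cascade classifier.
    return np.concatenate([appearance, color])
```

In a sliding-window setting, the per-image occurrence map of each visual word could be converted to an integral image (e.g. with cv2.integral), so that the appearance histogram of any window is obtained from four corner lookups per word instead of re-extracting and re-assigning keypoints for every window, which is the role the abstract assigns to the integral image.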
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61671376 and 61671374.
Cite this article
Zhao, F., Yang, Y., Zhang, Hy. et al. Sign text detection in street view images using an integrated feature. Multimed Tools Appl 77, 28049–28076 (2018). https://doi.org/10.1007/s11042-018-5975-8