
SignParser: An End-to-End Framework for Traffic Sign Understanding

International Journal of Computer Vision

Abstract

In intelligent transportation systems, parsing traffic signs and conveying traffic information to humans is a pressing need. However, despite the success achieved in detecting and recognizing low-level circular or triangular traffic signs, parsing the more complex and informative rectangular traffic signs remains unexplored and challenging. Our work is devoted to the topic of "Traffic Sign Understanding (TSU)", which aims to parse various traffic signs and generate semantic descriptions for them. To achieve this goal, we propose an end-to-end framework that integrates component detection, content reasoning, and semantic description generation. The component detection module first detects initial components in the sign image. The content reasoning module then derives the detailed content of the sign, including the final components, their relations, and the layout category, which together provide local and global information for the subsequent module. Finally, the semantic description generation module mines relational attributes and text semantic attributes from the preceding results, embeds them together with the layout category, and transforms them into semantic descriptions through a dynamic prediction transformer. The three modules are trained jointly in an end-to-end manner to optimize overall performance. The method achieves state-of-the-art performance not only on the final semantic description generation stage but also on multiple subtasks of the CASIA-Tencent CTSU Dataset. Extensive ablation experiments demonstrate the effectiveness of the method.
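For readers who want a concrete picture of the pipeline, the following is a minimal PyTorch-style sketch of how the three modules could be composed and trained jointly. All class names, interfaces, and loss weights here are hypothetical illustrations based only on the abstract, not the authors' implementation.

    # Hypothetical sketch of the three-module pipeline described in the abstract.
    # Module interfaces and loss weights are placeholders, not the paper's design.
    import torch.nn as nn

    class SignParserSketch(nn.Module):
        def __init__(self, detector, reasoner, generator):
            super().__init__()
            self.detector = detector    # component detection module
            self.reasoner = reasoner    # content reasoning module
            self.generator = generator  # semantic description generation module

        def forward(self, sign_image):
            # 1. Detect initial components (e.g., texts, symbols, arrowheads).
            components = self.detector(sign_image)
            # 2. Reason over the detections to obtain final components, their
            #    relations, and the sign's layout category (local + global info).
            components, relations, layout = self.reasoner(sign_image, components)
            # 3. Mine relational and text semantic attributes, embed them with
            #    the layout category, and decode a semantic description.
            description = self.generator(components, relations, layout)
            return components, relations, layout, description

    def joint_loss(det_loss, reason_loss, gen_loss, weights=(1.0, 1.0, 1.0)):
        # End-to-end training optimizes a weighted sum of the three modules'
        # losses; the weights here are placeholders.
        return sum(w * l for w, l in zip(weights, (det_loss, reason_loss, gen_loss)))

Because all three losses flow through shared parameters in one computation graph, a single backward pass updates the detector, reasoner, and generator together, which is what "trained jointly in an end-to-end manner" implies.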


Data Availability

The CASIA-Tencent CTSU Dataset analysed during the current study is available at http://www.nlpr.ia.ac.cn/databases/CASIA-TencentCTSU/index.html. The Chinese Traffic Sign Database (CTSDB) analysed during the current study is available at http://www.nlpr.ia.ac.cn/pal/trafficdata/recognition.html.

Notes

  1. The dataset is available at http://www.nlpr.ia.ac.cn/databases/CASIA-TencentCTSU/index.html.


Author information

Corresponding author

Correspondence to Yunfei Guo.

Additional information

Communicated by Dimosthenis Karatzas.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Figs. 12 and 13.

Fig. 12

Reference translations of the Chinese text on the traffic signs in this paper. The signs are arranged in the order in which they appear in the paper. The Chinese text is highlighted by green boxes, below or beside which are the English translations (Color figure online)

Fig. 13

Two example English traffic signs with their relation annotations and semantic descriptions. Some of the auxiliary annotations are shown on the right, including components (text in green boxes, symbols in yellow boxes, and arrowheads in red boxes) and their relations (association relations shown as pink lines and pointing relations as light blue lines) (Color figure online)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Guo, Y., Feng, W., Yin, F. et al. SignParser: An End-to-End Framework for Traffic Sign Understanding. Int J Comput Vis 132, 805–821 (2024). https://doi.org/10.1007/s11263-023-01912-9

