research-article

Learning to Understand Traffic Signs

Authors:

Cheng-Lin Liu

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 2076 - 2084

https://doi.org/10.1145/3474085.3475362

Published: 17 October 2021

Abstract

A critical task of intelligent transportation systems is to understand traffic signs and convey traffic information to humans. However, most related work focuses on detecting and recognizing the texts or symbols on traffic signs, which is not sufficient for understanding. Moreover, no public dataset has been available for traffic sign understanding research. Our work takes the first step towards addressing this problem. First, we present the "CASIA-Tencent Chinese Traffic Sign Understanding Dataset" (CTSU Dataset), which contains 5000 images of traffic signs with rich semantic descriptions. Second, we introduce a novel multi-task learning architecture that extracts text and symbol information from traffic signs, reasons about the relationships between texts and symbols, classifies signs into different categories, and finally composes descriptions of the signs. Experiments show that the task of traffic sign understanding is achievable and that our architecture achieves state-of-the-art performance. The CTSU Dataset is available at http://www.nlpr.ia.ac.cn/databases/CASIA-Tencent%20CTSU/index.html.

Index Terms

Learning to Understand Traffic Signs
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations
      2. Computer vision tasks
        Scene understanding
    2. Knowledge representation and reasoning
      1. Spatial and physical reasoning

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN:9781450386517

DOI:10.1145/3474085

General Chairs:
Heng Tao Shen (University of Electronic Science and Technology of China, China)
Yueting Zhuang (Zhejiang University, China)
John R. Smith (IBM, USA)

Program Chairs:
Yang Yang (University of Electronic Science and Technology of China, China)
Pablo Cesar (CWI & TU Delft, The Netherlands)
Florian Metze (FACEBOOK, Inc., USA)
Balakrishnan Prabhakaran (University of Texas at Dallas, USA)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Key Research and Development Program Grant
NSFC

Conference

MM '21: ACM Multimedia Conference
Sponsor: SIGMM
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

Total Citations: 7
Total Downloads: 360
Downloads (last 12 months): 46
Downloads (last 6 weeks): 5

Reflects downloads up to 14 Feb 2025.

Cited By

Zhu A, Xiao K, Zhou B, Wang R, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). Trust Prophet or Not? Taking a Further Verification Step toward Accurate Scene Text Recognition. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 1741-1750. Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3664647.3681551
Yang C, Zhuang K, Chen M, Ma H, Han X, Han T, Guo C, Han H, Zhao B, Wang Q (2024). Traffic Sign Interpretation via Natural Language Description. IEEE Transactions on Intelligent Transportation Systems 25:11, pp. 18939-18953. Online publication date: Nov-2024.
https://doi.org/10.1109/TITS.2024.3428619
Lv X, Wang Q, Yu C, Jin H (2023). A Feedback-Driven DNN Inference Acceleration System for Edge-Assisted Video Analytics. IEEE Transactions on Computers 72:10, pp. 2902-2912. Online publication date: 1-Oct-2023.
https://dl.acm.org/doi/10.1109/TC.2023.3275094
Strelnikoff S, Xu J, Ashari A (2023). Towards the Semantic Interpretation of Arbitrary Traffic Signs: Semantic Parsing for Action-oriented Signs. 2023 International Conference on Machine Learning and Applications (ICMLA), pp. 1771-1777. Online publication date: 15-Dec-2023.
https://doi.org/10.1109/ICMLA58977.2023.00269
Guo Y, Yin F, Li X, Yan X, Xue T, Mei S, Liu C (2023). Visual Traffic Knowledge Graph Generation from Scene Images. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 21547-21556. Online publication date: 1-Oct-2023.
https://doi.org/10.1109/ICCV51070.2023.01975
Guo Y, Feng W, Yin F, Liu C (2023). SignParser: An End-to-End Framework for Traffic Sign Understanding. International Journal of Computer Vision 132:3, pp. 805-821. Online publication date: 17-Oct-2023.
https://doi.org/10.1007/s11263-023-01912-9
Greer R, Isa J, Deo N, Rangesh A, Trivedi M (2022). On Salience-Sensitive Sign Classification in Autonomous Vehicle Path Planning: Experimental Explorations with a Novel Dataset. 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), pp. 636-644. Online publication date: Jan-2022.
https://doi.org/10.1109/WACVW54805.2022.00070
