Abstract
Real-time, robust crosswalk (zebra crossing) detection in complex scenarios with limited computing power is one of the major difficulties facing current intelligent traffic management systems (ITMS). Limited edge computing capability and real-world conditions such as cloudy, sunny, rainy, foggy and night-time scenes both make the task challenging. In this study, a crosswalk detection network (CDNet) based on YOLOv5 is proposed to achieve fast and accurate crosswalk detection from the viewpoint of a vehicle-mounted camera, and real-time detection is implemented on a Jetson nano device. A powerful convolutional neural network feature extractor is used to handle complex environments; the squeeze-and-excitation (SE) attention mechanism module is embedded into the network; the negative samples training (NST) method is used to improve accuracy; the region of interest (ROI) algorithm is used to further improve detection speed; a novel slide receptive field short-term vector memory (SSVM) algorithm is proposed to improve the accuracy of vehicle-crossing behavior detection; and a synthetic fog augmentation algorithm is used to make the model adaptable to foggy scenarios. With a detection speed of 33.1 FPS on Jetson nano, we obtained an average F1 score of 94.83% across the above complex scenarios. In better weather conditions such as sunny and cloudy days, the F1 score exceeds 98%. This work provides a reference for applying artificial neural network optimization methods on edge computing devices. The datasets, tutorials and source code are available on GitHub.
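The reported figures are F1 scores, so a small illustration of the metric may help. The sketch below computes F1 from detection counts and takes an unweighted mean over scenarios. The function name `f1_score`, the scenario names and all counts are hypothetical, chosen only for illustration; they are not the paper's data, and the unweighted mean is an assumption about how the per-scenario scores are averaged.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    tp, fp, fn are counts of true positives, false positives and
    false negatives. Returns 0.0 when there are no true positives.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (tp, fp, fn) counts per scenario; NOT taken from the paper.
counts = {
    "sunny": (98, 1, 2),
    "night": (90, 5, 10),
}

# Per-scenario F1, then an unweighted mean across scenarios
# (assumed averaging scheme, for illustration only).
per_scenario = {name: f1_score(*c) for name, c in counts.items()}
average_f1 = sum(per_scenario.values()) / len(per_scenario)

for name, score in per_scenario.items():
    print(f"{name}: F1 = {score:.4f}")
print(f"average F1 = {average_f1:.4f}")
```

Because F1 is a harmonic mean, it falls sharply when either precision or recall is low, which suits a detector where both missed crosswalks and false alarms matter.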
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61873163). We also acknowledge the Center for High Performance Computing at Shanghai Jiao Tong University for providing computing resources.
Author information
Contributions
Zhengde Zhang contributed to conceptualization, methodology, software, formal analysis, data curation, writing—original draft and visualization. Menglu Tan contributed to methodology, data curation, writing—original draft and writing—review and editing. Zhicai Lan contributed to conceptualization, investigation and writing—review and editing. Haichun Liu contributed to validation and investigation. Ling Pei contributed to resources, funding acquisition and methodology. Wenxian Yu contributed to resources, supervision, funding acquisition and project administration.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhang, ZD., Tan, ML., Lan, ZC. et al. CDNet: a real-time and robust crosswalk detection network on Jetson nano based on YOLOv5. Neural Comput & Applic 34, 10719–10730 (2022). https://doi.org/10.1007/s00521-022-07007-9
DOI: https://doi.org/10.1007/s00521-022-07007-9