Abstract
Railroad inspections that identify missing track components are crucial to railroad operational safety. This paper presents a new lightweight computer vision model for accurate, real-time rail track inspection on edge devices. It modifies the teacher–student guidance mechanism in NanoDet (https://github.com/RangiLyu/nanodet) by introducing a new adaptively weighted loss (AWL) into the training process. The AWL evaluates the qualities of the teacher and student models, determines the weight of the student loss, and balances their loss contributions on the fly, steering the training process toward proper knowledge distillation and guidance. Compared to state-of-the-art models, our AWL-NanoDet features a tiny model size of less than 10 MB and a computation cost of 1.52 GFLOPs, and it achieves a processing time of less than 14 ms per frame when tested on Nvidia's AGX Orin. Relative to native NanoDet, it also improves detection performance by nearly 10%, enabling highly accurate, real-time detection of track components.
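The abstract's description of the AWL can be sketched as follows. This is a minimal illustration, not the paper's actual formulation: it assumes the teacher's and student's losses serve as quality proxies (a lower loss suggesting a better model), and the function names `adaptive_weight` and `combined_loss` are hypothetical.

```python
def adaptive_weight(teacher_loss, student_loss, eps=1e-8):
    # Hypothetical quality proxy: a lower loss suggests a better model.
    # The weight grows toward 1 as the teacher's loss dominates,
    # i.e., when the teacher looks unreliable relative to the student.
    return teacher_loss / (teacher_loss + student_loss + eps)

def combined_loss(student_label_loss, distill_loss, teacher_loss):
    w = adaptive_weight(teacher_loss, student_label_loss)
    # w -> 1 when the teacher is poor: rely on ground-truth supervision.
    # w -> 0 when the teacher is strong: rely on the distilled signal.
    return w * student_label_loss + (1.0 - w) * distill_loss

# Example: a high teacher loss (2.0) shifts weight toward the
# student's own label loss.
total = combined_loss(student_label_loss=0.5, distill_loss=0.2,
                      teacher_loss=2.0)
```

The key design point the abstract describes is that the balance between the student's supervised loss and the distillation loss is recomputed on the fly during training rather than fixed in advance.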
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
RangiLyu, NanoDet. https://github.com/RangiLyu/nanodet
FRA, Train accidents by cause (Form FRA F 6180.54). https://safetydata.fra.dot.gov/OfficeofSafety/publicsite/Query/inccaus.aspx
LeCun Y et al (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1:541–551
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86:2278–2324
Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60:84–90
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE conference on computer vision and pattern recognition, pp 770–778
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE conference on computer vision and pattern recognition, pp 580–587
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: IEEE conference on computer vision and pattern recognition, pp 779–788
Law H, Deng J (2018) Cornernet: detecting objects as paired keypoints. In: European conference on computer vision, pp 734–750
Tian Z, Shen C, Chen H, He T (2019) FCOS: fully convolutional one-stage object detection. In: IEEE/CVF international conference on computer vision, pp 9627–9636
Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Adv Neural Inf Process Syst 28
Howard AG et al (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861
Szegedy C et al (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K (2016) SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360
Zhang X, Zhou X, Lin M, Sun J (2018) ShuffleNet: an extremely efficient convolutional neural network for mobile devices. In: IEEE conference on computer vision and pattern recognition, pp 6848–6856
Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531
Nguyen CH, Nguyen TC, Tang TN, Phan NL (2022) Improving object detection by label assignment distillation. In: IEEE/CVF winter conference on applications of computer vision, pp 1005–1014
Vijayalakshmi G, Gayathri J, Senthilkumar K, Kalanandhini G, Aravind A (2022) A smart rail track inspection system. In: AIP conference proceedings, p 1
Hashmi MSA et al (2022) Railway track inspection using deep learning based on audio to spectrogram conversion: an on-the-fly approach. Sensors 22:1983
Zhou W, Hong J (2023) FHENet: lightweight feature hierarchical exploration network for real-time rail surface defect inspection in RGB-D images. IEEE Trans Instrum Meas 72:1–8
Yang T, Liu Y, Huang Y, Liu J, Wang S (2023) Symmetry-driven unsupervised abnormal object detection for railway inspection. IEEE Trans Ind Inform. https://doi.org/10.1109/TII.2023.3246995
Guo F, Qian Y, Shi Y (2021) Real-time railroad track components inspection based on the improved YOLOv4 framework. Autom Constr 125:103596
Cha Y-J, Choi W, Suh G, Mahmoudkhani S, Büyüköztürk O (2018) Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types. Comput-Aided Civ Infrastruct Eng 33:731–747
Guo F, Qian Y, Wu Y, Leng Z, Yu H (2021) Automatic railroad track components inspection using real-time instance segmentation. Comput-Aided Civ Infrastruct Eng 36:362–377
Zhang C, Chang C, Jamshidi M (2020) Concrete bridge surface damage detection using a single-stage detector. Comput-Aided Civ Infrastruct Eng 35:389–409
Li S, Zhao X, Zhou G (2019) Automatic pixel-level multiple damage detection of concrete structure using fully convolutional network. Comput-Aided Civ Infrastruct Eng 34:616–634
Zheng J et al (2023) An inspection method of rail head surface defect via bimodal structured light sensors. Int J Mach Learn Cybern 14:1903–1920
Liang X (2019) Image-based post-disaster inspection of reinforced concrete bridge systems using deep learning with Bayesian optimization. Comput-Aided Civ Infrastruct Eng 34:415–430
Guo F, Qian Y, Yu H (2023) Automatic rail surface defect inspection using the pixel-wise semantic segmentation model. IEEE Sens J. https://doi.org/10.1109/JSEN.2023.3280117
Wei D, Wei X, Tang Q, Jia L, Yin X, Ji Y (2023) RTLSeg: A novel multi-component inspection network for railway track line based on instance segmentation. Eng Appl Artif Intell 119:105822
Feng H, Jiang Z, Xie F, Yang P, Shi J, Chen L (2013) Automatic fastener classification and defect detection in vision-based railway inspection systems. IEEE Trans Instrum Meas 63:877–888
COCO [Online]. Available: https://cocodataset.org/#home
ImageNet [Online]. Available: https://www.image-net.org
TCIS [Online]. Available: https://www.ensco.com/rail/track-component-imaging-system-tcis
Bochkovskiy A, Wang C-Y, Liao H-YM (2020) YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934
Tan M, Pang R, Le QV (2020) EfficientDet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10781–10790
Acknowledgements
This research is partially funded by the Federal Railroad Administration (FRA), Contract No. 693JJ621C000011. The images used in this study are from FRA’s Track Component Imaging System. Mr. Cameron Stuart from FRA has provided essential guidance and insight during the system development. The opinions expressed in this article are solely those of the authors and do not represent the opinions of the funding agency.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
We use the images in Fig. 10 as an example to illustrate how the histogram is calculated. Each image is first divided into a 4 × 4 grid, and each cell in the grid is assigned an index number (the cell index) indicating its relative position in the image. In Fig. 10, the cell indices range from 0 to 15, corresponding to 16 cells in total. In Image A, the object (a “spike”) appears in the cells with indices 2, 10, and 14; in Image B, the “spike” is present in the cells with indices 5, 6, 9, and 10. Combining the statistics from both images, the “spike” appears once in cells 2, 5, 6, 9, and 14 and twice in cell 10. These occurrence counts translate to the histogram in Fig. 10c, in which the x-axis represents the cell index in the range [0, 15] and the y-axis denotes the number of occurrences. Cell index 10 appears twice (once in each image), so its occurrence count in the histogram is 2; indices 2, 5, 6, 9, and 14 each appear once, so their count is 1; all other indices have an occurrence count of 0.
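The counting procedure above can be sketched in a few lines of Python. This is an illustrative sketch only; it assumes each image is represented as the set of cell indices the object occupies, and the function name `occurrence_histogram` is hypothetical.

```python
from collections import Counter

GRID = 4  # 4 x 4 grid, so cell indices run from 0 to 15

def occurrence_histogram(images):
    """images: a list of sets of cell indices where the object appears."""
    counts = Counter()
    for cells in images:
        counts.update(cells)  # each occupied cell counts once per image
    # Return a dense histogram over all 16 cell indices.
    return [counts.get(i, 0) for i in range(GRID * GRID)]

# Values taken from the Fig. 10 example: cells occupied by the "spike".
image_a = {2, 10, 14}
image_b = {5, 6, 9, 10}
hist = occurrence_histogram([image_a, image_b])
# hist[10] == 2; hist[2], hist[5], hist[6], hist[9], hist[14] == 1
```

Cell 10 is the only index shared by both images, which is why it is the only bin with a count of 2 in the resulting histogram.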
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Guo, J., Zhang, S., Qian, Y. et al. An adaptively weighted loss-enabled lightweight teacher–student model for real-time railroad inspection on edge devices. Neural Comput & Applic 35, 24455–24472 (2023). https://doi.org/10.1007/s00521-023-09038-2