Deep learning-based visual ensemble method for high-speed railway catenary clevis fracture detection
Introduction
The maintenance of catenary systems is crucial for the safe operation of electrified railways. Traditionally, this task is carried out by railway workers who inspect the line for damaged catenary fittings that need to be replaced. As the high-speed railway network has expanded, manual inspection can no longer meet the required efficiency and reliability. In recent years, computer vision-based detection methods have drawn considerable attention from railway companies and research institutions. These methods interfere little with railway operation, require low investment and detect efficiently. Computer vision-based methods have already been applied successfully to dynamic stagger measurement [1], [2], rail maintenance [3], [4], [5], [6] and active pantograph control [7]. However, the recognition of catenary fittings and the diagnosis of their faults still depend heavily on visual observation by workers.
In this paper, an automatic visual inspection method is proposed to detect fractures of the cross link clevises in the high-speed railway catenary. The cross link clevises connect the registration arms and the cantilevers in catenary systems. Fig. 1 shows the physical locations of the clevises, the registration arms and the cantilevers. Cantilevers are marked in red, registration arms in blue and clevises in green, and the clevises are also circled in red in the images. The pictures in Fig. 1(a) and (b) are taken from opposite directions. For convenience, we call the clevises in Fig. 1(a) and (b) left clevises and right clevises, respectively. The two types are detected and analyzed separately in this paper. Fig. 2 shows the details of the clevises. Clevis fractures are caused by the constant vibration of the catenary system induced by passing high-speed trains. A fracture weakens the mechanical strength of the catenary system and thus increases the risk of pantograph-catenary accidents. Fig. 3 shows examples of clevis fractures, highlighted by red rectangles.
The process of clevis fracture detection can be divided into two steps: clevis extraction and fracture detection. In the field of object detection, a breakthrough came in 2001, when Viola and Jones proposed the boosted cascade framework based on Haar-like features [8]. After that, the combination of a machine learning-based classifier and hand-crafted local features dominated the field for many years. A classifier trained on local features was applied to a sliding window over the image to determine whether the object was present. Widely used hand-crafted local feature descriptors included SIFT features [9], SURF features [10], Haar-like features [8], Histograms of Oriented Gradients (HOG) features [11] and Local Binary Pattern (LBP) features [12]. In [11], a linear Support Vector Machine (SVM) classifier trained on HOG features was used to detect pedestrians. In [13], Zhu et al. used integral histograms to compute HOG features efficiently and adopted a cascade of rejecters to speed up detection. In [14], deformable part models were proposed to detect objects whose shape and appearance may vary greatly. The object was divided into multiple components, and both the relative positions of the components and the object label were learned discriminatively. In [15], five different features were combined using a weighted score-level fusion approach to improve detection accuracy. In [16], sparse representation features were generated from HOG and LBP features using K-singular value decomposition (K-SVD), and the dimensionality of the HOG and LBP features was reduced by principal component analysis (PCA).
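The integral-histogram idea attributed to Zhu et al. [13] above can be illustrated with a short NumPy sketch. It assumes unsigned gradient orientations quantized into 9 bins and weighted by gradient magnitude, a common HOG setting that is not necessarily the one used in [13]. The function names are hypothetical. After one cumulative-sum pass over the image, the orientation histogram of any rectangular window costs four look-ups per bin, independent of window size. This is what makes dense sliding-window evaluation cheap.

```python
import numpy as np


def orientation_bins(img, n_bins=9):
    """Quantize unsigned gradient orientation (0-180 deg); return bin indices and magnitudes."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    # Guard against ang rounding up to exactly 180.0
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    return bins, mag


def integral_histogram(bins, mag, n_bins=9):
    """ih[b, y, x] = sum of gradient magnitudes falling in bin b over img[:y, :x]."""
    h, w = bins.shape
    ih = np.zeros((n_bins, h + 1, w + 1))
    for b in range(n_bins):
        ih[b, 1:, 1:] = np.where(bins == b, mag, 0.0)
    return ih.cumsum(axis=1).cumsum(axis=2)


def window_histogram(ih, y0, x0, y1, x1):
    """Histogram of the half-open window [y0, y1) x [x0, x1) from four corner look-ups."""
    return ih[:, y1, x1] - ih[:, y0, x1] - ih[:, y1, x0] + ih[:, y0, x0]
```

A detector would call `integral_histogram` once per image and then `window_histogram` for every cell of every sliding window. Building the table is O(bins x pixels), and each later query takes constant time.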
Although machine learning-based object detection methods have been widely used, it is difficult to design feature descriptors that are both discriminative and generalizable. Doing so requires careful engineering and considerable domain expertise [17]. Research in neuroscience and biology revealed the hierarchical structure of the visual cortex. As neural excitation propagates from lower to higher layers, the optical signal perceived by the retina is transformed into increasingly abstract feature representations. Inspired by this hierarchy, LeCun et al. proposed convolutional neural networks (CNNs) in 1989 and applied them to zip code recognition [18]. In 2012, Krizhevsky et al. created a “large, deep convolutional neural network” [19] named AlexNet. It won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [20] with a substantial improvement in image classification accuracy. Every ILSVRC champion in the following years was a convolutional neural network, which demonstrated the superior ability of CNNs in object recognition and natural image classification. Compared with traditional image processing methods, deep neural networks can adaptively extract features that effectively represent the key information in an image. They can also learn the complex mapping between the raw data and the image label.
Considerable progress has been made in bridging the gap between natural image classification and object detection. In object detection, both the category and the position of the object must be determined. Deep learning-based object detection methods can be broadly divided into region proposal-based methods and regression-based methods. Region proposal-based methods first generate a series of region proposals that may contain objects. They then feed these proposals to a sub-network for classification. Examples include R-CNN (region-based convolutional network) [21], fast R-CNN (fast region-based convolutional network) [22] and faster R-CNN (towards real-time object detection with region proposal networks) [23]. Regression-based methods do not rely on region proposals. Instead, they treat object localization as a regression problem from the start and detect objects from the responses of a multi-task network at a set of default boxes on its output feature maps. Examples include YOLO (you only look once) [24,25] and SSD (single shot multi-box detector) [26]. In this paper, clevis extraction is based on the faster R-CNN method.
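Faster R-CNN [23], on which the clevis extractor builds, scores a fixed set of reference boxes (anchors) at every position of the convolutional feature map. The sketch below generates such anchors using the faster R-CNN defaults of 3 scales and 3 aspect ratios, which give k = 9 anchors per position. The feature-map stride of 16 and the half-pixel centering are assumptions for illustration. They are not the configuration of the modified network used in this paper, and `generate_anchors` is a hypothetical name.

```python
import numpy as np


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * k, 4) anchors as [x1, y1, x2, y2] in image pixels."""
    # Base anchors centred at the origin: area s^2, aspect ratio r = h / w
    base = []
    for r in ratios:
        for s in scales:
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)                                   # (k, 4)

    # Centre of each feature-map cell mapped back to image coordinates
    ys, xs = np.meshgrid(np.arange(feat_h), np.arange(feat_w), indexing="ij")
    cx = (xs.ravel() + 0.5) * stride
    cy = (ys.ravel() + 0.5) * stride
    shifts = np.stack([cx, cy, cx, cy], axis=1)             # (P, 4)

    # Place every base anchor at every feature-map position
    return (shifts[:, None, :] + base[None, :, :]).reshape(-1, 4)
```

For a 38 x 50 feature map, this yields 38 x 50 x 9 = 17100 anchors. The region proposal network then predicts an objectness score and box offsets for each of them. A regression-based detector such as SSD uses the same idea under the name "default boxes".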
Fracture detection is based on the detection of cracks, which is achieved by analyzing the edge information of the clevis sub-image. Compared with traditional edge detection methods, active contour models adapt dynamically to object contours with more flexibility and accuracy. They can also detect weak edges that other gradient-based methods may miss. The introduction of the level set method has broadened the range of applications of active contour models. In the level set method, a curve is represented implicitly as the zero level set of a function defined in a higher dimension, called the level set function. This function is deformed according to a partial differential equation (PDE) [27]. Existing implicit active contour models can be roughly divided into two classes: edge-based models [28] and region-based models [29]. This paper focuses on region-based models, which generally perform better when boundaries are weak or discontinuous. Early popular region-based models tend to rely on intensity homogeneity [29]. However, light intensity is unevenly distributed on the surface of the clevises because of the spotlights used for illumination and the cylinder-like shape of the clevis, and this can introduce grayscale inhomogeneity. The piecewise constant model [30] can handle intensity inhomogeneity but has a high computational cost. The region-scalable fitting (RSF) model proposed by Li et al. [31] uses intensity information in spatially varying local regions whose size is controlled by a scale parameter. This method performs well in magnetic resonance image processing [32] and retinal blood vessel segmentation [33], and it can be incorporated into other models [34], [35], [36]. In this paper, the edge information of the clevises is extracted with the RSF model.
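A signed distance function makes the implicit representation described above concrete. In this minimal sketch, the circle and the sign convention (φ > 0 inside the contour) are assumptions chosen for illustration, and the function names are hypothetical. The contour is never stored as a list of points. It is simply the set where φ changes sign, so adding a constant to φ moves the whole contour without any explicit point tracking.

```python
import numpy as np


def circle_level_set(shape, center, radius):
    """Signed distance to a circle: phi > 0 inside, phi = 0 on the contour, phi < 0 outside."""
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    return radius - np.hypot(yy - center[0], xx - center[1])


def inside_region(phi):
    """Region enclosed by the zero level set of phi."""
    return phi > 0


phi = circle_level_set((64, 64), center=(32, 32), radius=10)
grown = phi + 5.0   # uniform change of phi: the zero level set expands by 5 pixels
print(inside_region(phi).sum(), inside_region(grown).sum())
```

Active contour models such as RSF work the same way. The PDE updates the values of φ on the pixel grid, and topology changes such as splitting or merging contours are handled automatically.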
Cracks are then detected in the crucial areas, where cracks are most likely to occur. These areas are obtained by image registration between the clevis sub-image and a standard clevis image.
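Once point correspondences between the standard clevis image and the clevis sub-image are available, the affine transformation can be estimated by least squares and used to project the manually delineated crucial areas. The paper obtains these correspondences by shape context matching, which is not reproduced here. The sketch below assumes the correspondences are already given as N x 2 arrays of at least 3 non-collinear points. The function names and the example polygon are hypothetical.

```python
import numpy as np


def estimate_affine(src, dst):
    """Least-squares 2x3 affine matrix A such that dst ~= A @ [x, y, 1]^T."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    X = np.hstack([src, np.ones((len(src), 1))])      # (N, 3) homogeneous coordinates
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)       # (3, 2) solution of X @ M = dst
    return M.T                                        # (2, 3)


def project_points(points, A):
    """Apply the affine matrix A to an (N, 2) array of (x, y) points."""
    pts = np.asarray(points, float)
    return pts @ A[:, :2].T + A[:, 2]


# Hypothetical crucial area: polygon vertices delineated on the standard clevis image
crucial_area_std = np.array([[40, 60], [80, 60], [80, 90], [40, 90]])
# Usage: A = estimate_affine(std_matches, sub_matches)
#        crucial_area_sub = project_points(crucial_area_std, A)
```

Because shape context matching can produce some wrong correspondences, a robust estimator such as RANSAC is often wrapped around the least-squares fit. This excerpt does not say whether the paper does so.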
The fracture detection process is shown in Fig. 4. First, the RSF model is used to extract the edge information of the clevis sub-images. Next, the crucial areas are manually delineated in a standard clevis image and projected onto the clevis sub-image through shape context matching and the computation of an affine transformation matrix. Finally, clevis fractures are detected by searching for cracks in the crucial areas. This is done by calculating the wavelet entropy inside the crucial areas and applying morphological filtering.
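The paper computes wavelet entropy inside the crucial areas, but this excerpt does not give its exact definition. This sketch assumes a common definition: the Shannon entropy of the relative sub-band energies, p_j = E_j / Σ E_j and W = −Σ p_j ln p_j. It uses a hand-written orthonormal 2-D Haar transform so that only NumPy is needed. The wavelet, the number of levels and the function names are assumptions. A smooth patch concentrates its energy in the approximation band and has low entropy. An edge or crack spreads energy into the detail bands and raises the entropy.

```python
import numpy as np


def haar2d(a):
    """One level of the orthonormal 2-D Haar transform (odd trailing row/column dropped)."""
    a = a[: a.shape[0] // 2 * 2, : a.shape[1] // 2 * 2]
    tl, tr = a[0::2, 0::2], a[0::2, 1::2]
    bl, br = a[1::2, 0::2], a[1::2, 1::2]
    ll = (tl + tr + bl + br) / 2          # approximation
    lh = (tl + tr - bl - br) / 2          # horizontal detail
    hl = (tl - tr + bl - br) / 2          # vertical detail
    hh = (tl - tr - bl + br) / 2          # diagonal detail
    return ll, (lh, hl, hh)


def wavelet_entropy(patch, levels=2):
    """Shannon entropy of the relative energies of the detail bands (per level) and approximation."""
    a = np.asarray(patch, float)
    energies = []
    for _ in range(levels):
        a, details = haar2d(a)
        energies.append(sum(float(np.sum(d ** 2)) for d in details))
    energies.append(float(np.sum(a ** 2)))
    e = np.array(energies)
    total = e.sum()
    if total == 0:
        return 0.0
    p = e[e > 0] / total                  # skip empty bands (0 * log 0 = 0)
    return float(-(p * np.log(p)).sum())
```

Patches whose entropy exceeds a threshold could then be passed to morphological filtering to suppress isolated responses. The threshold and the structuring element are not specified in this excerpt.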
Catenary suspension image acquisition
Images of catenary systems are taken by CCD cameras mounted on the roof of an inspection vehicle. As the vehicle runs along the railway, the cameras are triggered automatically whenever a catenary pillar is detected. A sketch of the inspection vehicle is shown in Fig. 5. To eliminate interference from the image background, the catenary images are taken at night with LED spotlights for illumination. The captured catenary images are stored with the IDs of the corresponding
Clevis extraction
Because illumination conditions vary, the grayscale distribution and texture on the surface of the clevises are not constant. In addition, the scale of the clevises may change with the shooting angle and distance. Moreover, cantilevers, overhead lines and insulators greatly increase the complexity of catenary images, which makes clevis extraction difficult. Considering the outstanding performance of CNN-based methods in object detection and image
Edge information extraction
The edge information of the clevis sub-image is extracted using the RSF model, which was first proposed by Li et al. [31]. Unlike other popular region-based active contour models, the RSF model can segment images with intensity inhomogeneity.
The basic idea of the RSF model is to define a region-scalable fitting (RSF) energy function. For a given point x in the image, the local intensity fitting energy can be defined as
Experimental results and performance analysis
In this section, the performance of the proposed fracture detection method is evaluated both in the clevis extraction stage and the fracture detection stage.
Conclusions
This paper proposes a visual inspection method to detect clevis fractures in the high-speed railway catenary system based on multiple local features and the RSF model. The clevis extractor, built on a modified Faster R-CNN network, accurately extracts clevises from images acquired under different shooting and illumination conditions. The proposed fracture detection method, based on the RSF model and crucial area projection, is reliable in most cases. Although false alarms
Conflict of interest
The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
Acknowledgment
This study was partially supported by the National Natural Science Foundation of China (U1734202, U1434203), the China Railway Science and Technology Major Research Project (2015J008-A) and the Sichuan Province Youth Science and Technology Innovation Team (2016TD0012). Dataset collection was assisted by Guangzhou Railway Company and the China Academy of Railway Sciences.
References (39)
- et al., Speeded-up robust features (SURF), Comput. Vis. Image Underst. (2008)
- et al., Active contours driven by weighted region-scalable fitting energy based on local entropy, Signal Process. (2012)
- et al., Detecting soft shadows in a single outdoor image: from local edge-based models to global constraints, Comput. Graph. (2014)
- et al., Split Bregman method for minimization of improved active contour model combining local and global information dynamically, J. Math. Anal. Appl. (2012)
- et al., Video-based dynamic stagger measurement of railway overhead power lines using rotation-invariant feature matching, IEEE Trans. Intell. Transp. Syst. (2015)
- et al., A high-precision detection approach for catenary geometry parameters of electrical railway, IEEE Trans. Instrum. Meas. (2017)
- et al., Automatic fastener classification and defect detection in vision-based railway inspection systems, IEEE Trans. Instrum. Meas. (2014)
- et al., Automated visual inspection of railroad tracks, IEEE Trans. Intell. Transp. Syst. (2013)
- et al., A real-time visual inspection system for discrete surface defects of rail heads, IEEE Trans. Instrum. Meas. (2012)
- et al., Deep multitask learning for railway track inspection, IEEE Trans. Intell. Transp. Syst. (2017)
- A new computer vision approach for active pantograph control
- Rapid object detection using a boosted cascade of simple features
- Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis.
- Histograms of oriented gradients for human detection
- Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Trans. Pattern Anal. Mach. Intell.
- Fast human detection using a cascade of histograms of oriented gradients
- Object detection with discriminatively trained part based models, IEEE Trans. Pattern Anal. Mach. Intell.
- Multi-class fruit detection based on image region selection and improved object proposals, Neurocomputing
- Pedestrian detection based on gradient and texture feature integration, Neurocomputing
Ye Han received his B.Sc. degree in 2011 from Southwest Jiaotong University, Chengdu, China. He is now a Ph.D. candidate in the School of Electrical Engineering, Southwest Jiaotong University. His main research field is intelligent detection of traction power supply systems.
Zhigang Liu received his B.Sc. degree in 1997, M.Sc. degree in 2000 and Ph.D. in 2003, all from Southwest Jiaotong University, Chengdu, China. He is now a professor in the School of Electrical Engineering, Southwest Jiaotong University. His current research interests include electrical relationships of vehicle grids in high-speed railways, power quality considering grid connections of new energies, pantograph-catenary dynamics, fault detection, status assessment, and active control.
Yang Lv received his B.Sc. degree in 2017 from Southwest Jiaotong University, Chengdu, China, where he is currently pursuing the M.Sc. degree, focusing on the detection and diagnosis of the railway pantograph-catenary system.
Kai Liu received his B.Sc. degree in 2017 from Southwest Jiaotong University, Chengdu, China, where he is currently pursuing the M.Sc. degree, focusing on the detection and diagnosis of the railway pantograph-catenary system.
Changjiang Li received his B.Sc. degree in 2017 from Southwest Jiaotong University, Chengdu, China, where he is currently pursuing the M.Sc. degree, focusing on the detection and diagnosis of the railway pantograph-catenary system.
Wenxuan Zhang is currently an Assistant Research Fellow with the Infrastructure Inspection Research Institute, China Academy of Railway Sciences. His research interests include data analysis for pantographs and the development of inspection equipment for high-speed railway catenary systems.