BY-NC-ND 3.0 license. Open Access. Published by De Gruyter, October 19, 2016.

Extreme Learning Machine-Based Traffic Incidents Detection with Domain Adaptation Transfer Learning

  • Chaimae Elhatri, Mohammed Tahifa and Jaouad Boumhidi

Abstract

Traffic incidents in big cities are increasing alongside economic growth, causing traffic delays and deteriorating road safety conditions. Developing a universal freeway automatic incident detection (AID) algorithm is therefore a task that has attracted the interest of researchers. This paper presents a novel automatic traffic incident detection method based on the extreme learning machine (ELM) algorithm. Transfer learning has recently gained popularity, as it can successfully generalise information across multiple tasks; this paper aims to develop a new domain adaptation-based approach for the traffic domain. The ELM is used as a classifier for detection, and the target domain adaptation transfer ELM (TELM-TDA) is used as a tool to transfer knowledge between environments so as to benefit from past experiences. The detection performance is evaluated by common criteria, including the detection rate, the false alarm rate, and others. To prove the efficiency of the proposed method, a comparison is first made between the back-propagation neural network and ELM, and then between ELM and TELM-TDA.

1 Introduction

Traffic incidents are defined as unusual events that cause a reduction of roadway capacity or an abnormal increase in demand, such as accidents, disabled vehicles, etc. Traffic incidents are not only the main cause of traffic congestion, which subsequently increases travel delay and fuel consumption, but are also a significant threat to traffic safety [11]. The increase in traffic incidents has thus led researchers to focus their work on developing effective detection techniques. The ability to detect an incident and its location is important in traffic management systems. Early detection of incidents reduces the delay experienced by road users, fuel consumption, gas emissions, and the probability of secondary collisions, and improves road safety and real-time traffic control. The study of automatic incident detection (AID) techniques has become an important aspect of current and future transportation systems. The objective of AID is to minimise human intervention in the detection of traffic incidents. The detection algorithm determines the presence of an incident by using real-time data received from traffic detector systems at fixed intervals. Various AID techniques have been proposed to address this problem, such as the back-propagation neural network (BPNN) [13], fuzzy logic (FL) [16], the support vector machine (SVM) [22], particle swarm optimisation [17], partial least squares [13, 18], and inductive logic programmes like nFOIL [12]. The detection issue can therefore be formulated as a classification task: determining whether or not an incident has happened according to data gathered from the traffic flow. Artificial neural networks have been widely used to detect freeway incidents, and some studies have shown that neural network models can provide reliable incident detection; however, all of the works cited above suffer from drawbacks. For example, the limitation of SVM comes from the choice of the kernel function and the determination and tuning of several parameters. FL is based on human knowledge in the form of fuzzy rules and fuzzy membership functions, which are often set manually by experts; the detection performance is thus affected by subjective decisions. Similarly, BPNN suffers from slow convergence: it has a number of parameters to determine and may get caught in local minima, so it is often necessary to run more than one experiment, which consumes time, to obtain an optimal configuration for the network.

The extreme learning machine (ELM) learning algorithm was proposed by Huang et al. [3] and has been used in many other works [4, 28]. Compared with other classification methods, ELMs have been confirmed to be efficient and effective learning techniques. Previous studies have proved that ELM can significantly increase the generalisation capability of neural networks at a small computational cost. The use of the ELM algorithm in the traffic domain was discussed in Ref. [7], where the authors used the algorithm as a classifier for the recognition of traffic signs. The classifier is a single-hidden-layer feed-forward network (SLFN). On the basis of the ELM algorithm, the connection between the input and hidden layers realises random feature mapping, while only the weights between the hidden and output layers are trained. They concluded that the ELM-based classifier can achieve an optimal and generalised solution for multiclass traffic sign recognition. In Ref. [10], a semi-supervised ELM algorithm was used in combination with the Laplacian SVM to construct a real-time driver distraction detection system based on eye and head movements, classifying two driver states: attentive and cognitively distracted. The paper explored semi-supervised methods for driver distraction detection in real driving conditions to alleviate the cost of labelling training data.

This paper focuses on the application of the ELM algorithm to traffic incident detection. The basic idea is that when a traffic incident occurs, relevant information is collected by loop detectors; the input signal is pre-processed and then classified by the ELM to determine the event type. On the other hand, existing works assume that a sufficient amount of training data is available for modelling the incident decision patterns, so their performance is not guaranteed when data are scarce. In practice, it is expensive and time consuming to collect a sufficiently large amount of data from every area of the traffic network. Hence, discovering typical patterns from insufficient data is a critical problem. As typical patterns may be shared by several traffic networks, transfer learning technology, which transfers knowledge from one domain to another, is considered an effective solution.

Transfer learning is a method whose objective is to reuse knowledge learned in various environments to enhance the learning performance in new environments [14, 15]. When two distributions do not match, two different transfer learning sub-problems can be defined, depending on whether or not the training and testing data refer to the same domain [1]. In the framework of domain adaptation, most learning methods are inspired by the idea that the two considered domains, although different, are highly correlated [20]. Domain adaptation has inspired work in several areas. In Ref. [27], the authors proposed a novel reconstruction-based transfer learning method called latent sparse domain transfer for domain adaptation and visual categorisation of heterogeneous data. To handle a cross-domain distribution mismatch, the authors of Ref. [26] used the concept of domain adaptation on image/video data represented with multiple visual features, to overcome the facts that training on a small amount of labelled data is prone to over-fitting on the one hand, and that manually labelling a large number of unlabelled data is tedious and time consuming on the other. The use of domain adaptation-based transfer learning combined with ELM was described in Ref. [25], where the authors addressed the problem of visual knowledge adaptation by leveraging labelled patterns from a source domain and a very limited number of labelled instances in a target domain to learn a robust classifier for visual categorisation. They proposed a new ELM-based cross-domain network learning framework, called ELM-based domain adaptation, in which the unlabelled target data, as useful knowledge, are also integrated as a fidelity term to guarantee stability during cross-domain learning. Hence, the objective of this paper is to develop a novel traffic incident detection approach based on ELM that uses the domain adaptation transfer learning method to adapt the learning task from one environment to another. When the data in the target domain to be learned are time consuming to gather, prior knowledge obtained from other domains, referred to as source domains, is used.

2 State of the Art of Algorithms

2.1 Extreme Learning Machine

The ELM algorithm, originally proposed by Huang et al. [3], is a very simple and efficient training method for SLFNs [5, 6]. Note that conventional ELM assumes that the training and test data are drawn from the same distribution. Whereas all parameters of a conventional feed-forward network need to be tuned, ELM randomly chooses the hidden nodes and determines the output weights of the SLFN analytically. The network is thus obtained in very few steps and at low computational cost. For regression and classification, ELM tends to provide similar or better generalisation performance, at a much faster learning speed, than traditional SVM and back-propagation [6].

2.1.1 Universal approximation capability

According to the ELM learning theory, a wide range of feature mappings h(x) can be used in ELM, enabling ELM to approximate any continuous target function f(x). Obviously, a learning machine with a feature mapping that does not satisfy the universal approximation condition cannot approximate all continuous target functions. Thus, the universal approximation condition is not only a sufficient but also a necessary condition for a feature mapping to be widely used. This is also true for classification applications.

2.1.2 Classification capability

Similar to the classification capability theorem of SLFNs, it is proved in Ref. [4] that generalised SLFNs whose hidden-layer mapping h(x) satisfies the universal approximation condition also have classification capability. It is a necessary and sufficient condition that the feature mapping h(x) be chosen so that h(x)β is capable of approximating any target continuous function; if not, there may exist some regions whose shapes cannot be separated by a classifier with such a feature mapping h(x).

In the binary classification case, ELM only uses a single output node, and the class label closer to the output value of ELM is chosen as the predicted class label of the input data.

The output function of ELM for generalised SLFNs is represented by Eq. (1):

(1) $f_L(x)=\sum_{i=1}^{L}\beta_i h_i(x)=h(x)\beta,$

where $\beta=[\beta_1,\ldots,\beta_L]^{T}$ is the vector of output weights between the hidden layer of L nodes and the output node, and $h(x)=[h_1(x),\ldots,h_L(x)]$ is the output (row) vector of the hidden layer with respect to the input x. h(x) actually maps the data from the d-dimensional input space to the L-dimensional hidden-layer feature space (ELM feature space) H; thus, h(x) is indeed a feature mapping. For binary classification applications, the decision function of ELM is

(2) $f_L(x)=\operatorname{sign}(h(x)\beta).$

Different from traditional learning algorithms [19], ELM tends to reach not only the smallest training error but also the smallest norm of the output weights. For feed-forward neural networks that reach a small training error, the smaller the norm of the weights, the better the generalisation performance tends to be. ELM therefore minimises both the training error and the norm of the output weights:

(3) Minimize: $\|H\beta-T\|^{2}$ and $\|\beta\|,$

where H is the hidden-layer output matrix:

$H=\begin{bmatrix}h(x_1)\\ \vdots\\ h(x_N)\end{bmatrix}=\begin{bmatrix}h_1(x_1)&\cdots&h_L(x_1)\\ \vdots&\ddots&\vdots\\ h_1(x_N)&\cdots&h_L(x_N)\end{bmatrix}.$

Minimising the norm of the output weights, $\|\beta\|$, actually maximises the distance between the separating margins of the two classes in the ELM feature space, $2/\|\beta\|$.

The minimal-norm least squares method, rather than the standard optimisation method, was used in the original implementation of ELM:

(4) $\beta=H^{\dagger}T=H^{T}(HH^{T})^{-1}T,$

where $H^{T}$ is the transpose of matrix H and $H^{\dagger}$ denotes the Moore-Penrose generalised inverse of the hidden-layer output matrix H [21]. If the number N of training patterns is smaller than L, an underdetermined least squares problem has to be handled. In this case, the solution can be obtained as

(5) $\beta=H^{T}\left(HH^{T}+\frac{I_N}{C}\right)^{-1}T,$

where $I_N$ is the identity matrix of size N and C is a penalty constant on the training errors.

Given a training set $\{(x_i,t_i)\mid x_i\in\mathbb{R}^{d},\,t_i\in\mathbb{R}^{m},\,i=1,\ldots,N\}$, an activation function g, and the number of hidden nodes L, the procedure of ELM is summarised in Algorithm 1, adapted from Ref. [2]:

Algorithm 1:

ELM Training Algorithm

input: speed X1, occupancy X2, flow rate X3, the entries of the SLFN
1: Randomly generate the hidden-node parameters $(a_i,b_i)$, i = 1, …, L.
2: Calculate the hidden-layer output matrix H.
3: if N < L then
4:   compute the output weights β using Eq. (5)
5: else
6:   compute the output weights β using Eq. (4)
output: the predicted y is a Boolean value (0 or 1) indicating the absence or presence of an incident, respectively.

After randomly assigning the weights between the input layer and the hidden layer and the hidden-layer biases (line 1), ELM initialises the hidden layer to map the input data into a feature space by some non-linear mapping function. The non-linear mapping function in ELM can be any non-linear piecewise continuous function. We used the sigmoid function defined by Eq. (6):

(6) $G(a_i,b_i,x)=\frac{1}{1+\exp(-(a_i\cdot x+b_i))}.$

The hidden layer output matrix H is then calculated (line 2). In the second stage of ELM training (line 3), ELM aims at achieving not only the minimum training error but also the smallest norm of the output weights.
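To make the two solution branches of Algorithm 1 concrete, the following is a minimal NumPy sketch; the hidden-layer width L, the penalty C, the uniform initialisation range, and the 0.5 decision threshold for the 0/1 labels are illustrative assumptions rather than values prescribed by the paper.

```python
import numpy as np

def elm_train(X, T, L=100, C=1e3, seed=0):
    """Sketch of Algorithm 1: random hidden layer, analytic output weights."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # Line 1: randomly generate the hidden-node parameters (a_i, b_i).
    a = rng.uniform(-1.0, 1.0, size=(d, L))
    b = rng.uniform(-1.0, 1.0, size=L)
    # Line 2: hidden-layer output matrix H using the sigmoid of Eq. (6).
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))
    # Lines 3-6: choose the solution for beta according to N versus L.
    if N < L:
        # Underdetermined case, Eq. (5): beta = H^T (H H^T + I_N / C)^{-1} T.
        beta = H.T @ np.linalg.solve(H @ H.T + np.eye(N) / C, T)
    else:
        # Eq. (4): minimal-norm least squares via the Moore-Penrose inverse.
        beta = np.linalg.pinv(H) @ T
    return a, b, beta

def elm_predict(X, a, b, beta):
    """Boolean incident label from h(x)beta, assuming 0/1 target coding."""
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))
    return (H @ beta >= 0.5).astype(int)
```

Each row of X holds one (speed, occupancy, flow rate) sample, matching the inputs listed in Algorithm 1.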

2.2 Transfer Learning

In reality, when the task comes from a new domain, relabelling new samples is costly, and it would be a waste to discard all the old-domain data. Thus, it is often desirable for the algorithm to learn an accurate model using only a tiny amount of new data and a large amount of old data. Transfer learning aims to solve related but different target-domain problems by using plenty of labelled source-domain data [9]; that is, it learns from other related tasks and applies the learned model to the current task. The most general form of transfer learning is to learn similar tasks from one domain to another, transferring the knowledge from one or more source tasks to a target task. Today, transfer learning methods appear in several fields, most notably data mining and machine learning [15]. The objective of this work is to adapt transfer learning to the traffic incident detection task to achieve high-accuracy classification while avoiding the time needed to collect sufficient data in the target domain.

3 Target Domain Adaptation Transfer ELM (TELM-TDA)

In a domain like traffic incident detection, the data distribution may change across domains obtained under different conditions. It is also well known that it is expensive and time consuming to collect a sufficiently large amount of data from every domain, while a classifier trained with a small number of labelled data is not robust and therefore generalises weakly. Domain adaptation methods have been proposed for learning classifiers with a few labelled samples from the target domain by using a number of labelled samples from source domains. In this paper, we extend ELM to handle the domain adaptation problem, improving the transferring capability of ELM between multiple domains when very few labelled data are available in the target domain. The proposed target domain adaptation transfer ELM can learn from the small labelled dataset of the target domain and exploit the remaining unlabelled data by approximating the prediction of the base classifier.

In the proposed TELM-TDA, we assume that all the samples in the source domain are labelled. The output weights $\beta_T$ of the target domain T are learned from the output weights $\beta_S$ of the source domain S and the very few labelled data in the target domain. The structure of the proposed TELM-TDA is described in Figure 1, from which we can see that the unlabelled data in the target domain are also exploited. If the number of training samples $N_T>L$, then $\beta_T$ can be obtained by Eq. (7) [23, 24]:

Figure 1: Structure of the TELM-TDA Algorithm with M Target Domains.

(7) $\beta_T=\left(I+C_T H_T^{T}H_T+C_{Tu}H_{Tu}^{T}H_{Tu}\right)^{-1}\left(C_T H_T^{T}t_T+C_{Tu}H_{Tu}^{T}H_{Tu}\beta_S\right).$

Otherwise, the output weights can be obtained by Eq. (8):

(8) $\begin{aligned}\beta_T&=H_T^{T}\alpha_T+H_{Tu}^{T}\alpha_{Tu}\\&=H_T^{T}\left(QP^{-1}O-R\right)^{-1}\left(QP^{-1}T_{Tu}-T_T\right)+H_{Tu}^{T}\left[P^{-1}T_{Tu}-P^{-1}O\left(QP^{-1}O-R\right)^{-1}\left(QP^{-1}T_{Tu}-T_T\right)\right],\end{aligned}$

where

$T_{Tu}=H_{Tu}\beta_S,$

$O=H_{Tu}H_T^{T},$

$P=H_{Tu}H_{Tu}^{T}+\frac{I}{C_{Tu}},$

$Q=H_T H_{Tu}^{T},$

$R=H_T H_T^{T}+\frac{I}{C_T},$

and I is an identity matrix of the appropriate size ($N_{Tu}$ in P and $N_T$ in R).

For the recognition of the numerous unlabelled data in the target domain, we calculate the final output using Eq. (9):

(9) $y_{Tu}^{k}=H_{Tu}^{k}\beta_T,\quad k=1,\ldots,N_{Tu},$

where $H_{Tu}^{k}$ denotes the hidden-layer output with respect to the kth unlabelled vector in the target domain and $N_{Tu}$ is the number of unlabelled vectors in the target domain.

Given the training samples $\{X_S,t_S\}=\{x_S^{i},t_S^{i}\}_{i=1}^{N_S}$ of the source domain S, the labelled guide samples $\{X_T,t_T\}=\{x_T^{j},t_T^{j}\}_{j=1}^{N_T}$ of the target domain T, the unlabelled samples $\{X_{Tu}\}=\{x_{Tu}^{k}\}_{k=1}^{N_{Tu}}$ of T, and the trade-off parameters $C_S$, $C_T$, and $C_{Tu}$, the procedure of TELM-TDA, which produces the output weights $\beta_T$ and the predicted output $y_{Tu}$ of the unlabelled data in the target domain, is summarised in Algorithm 2 [24].

Algorithm 2:

TELM-TDA Algorithm

inputs: speed X_S1, occupancy X_S2, flow rate X_S3, entries of the SLFN, as data of the source domain;
speed X_T1, occupancy X_T2, flow rate X_T3, entries of the SLFN, as labelled data of the target domain;
speed X_Tu1, occupancy X_Tu2, flow rate X_Tu3, entries of the SLFN, as unlabelled data of the target domain
1: Initialise the ELM network of L hidden neurons with random input weights $W_i$ and hidden biases $B_i$.
2: Calculate the hidden-layer output matrix $H_S$ as $H_S=h(W_i X_S+B_i)$.
3: if $N_S<L$ then
4:   compute the output weights $\beta_S$ of the base classifier using Eq. (5)
5: else
6:   compute the output weights $\beta_S$ of the base classifier using Eq. (4)
7: Initialise the ELM network of L hidden neurons with random input weights $W_j$ and hidden biases $B_j$.
8: Calculate the hidden-layer output matrices $H_T$ and $H_{Tu}$ of the labelled and unlabelled data in the target domain as $H_T=h(W_j X_T+B_j)$ and $H_{Tu}=h(W_j X_{Tu}+B_j)$.
9: if $N_T<L$ then
10:   compute the output weights $\beta_T$ using Eq. (8)
11: else
12:   compute the output weights $\beta_T$ using Eq. (7)
13: Calculate the predicted output $y_{Tu}$ using Eq. (9).
output: the predicted $y_{Tu}$ is a Boolean value (0 or 1) indicating the absence or presence of an incident, respectively.
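To illustrate steps 12 and 13 of Algorithm 2, here is a hedged NumPy sketch of the $N_T>L$ branch (Eq. (7)) and the prediction of Eq. (9); the trade-off values C_T and C_Tu and the 0.5 threshold are placeholders, and the $N_T<L$ branch of Eq. (8) is omitted for brevity. The source weights beta_S would come from a base ELM such as elm_train above.

```python
import numpy as np

def telm_tda_weights(beta_S, H_T, t_T, H_Tu, C_T=1.0, C_Tu=0.1):
    """Eq. (7): target output weights from the source weights beta_S, the
    few labelled target samples (H_T, t_T), and the unlabelled target
    hidden outputs H_Tu, for the N_T > L case."""
    L = H_T.shape[1]
    A = np.eye(L) + C_T * H_T.T @ H_T + C_Tu * H_Tu.T @ H_Tu
    rhs = C_T * H_T.T @ t_T + C_Tu * H_Tu.T @ (H_Tu @ beta_S)
    return np.linalg.solve(A, rhs)

def telm_tda_predict(H_Tu, beta_T):
    """Eq. (9): label the unlabelled target data (0/1 coding assumed)."""
    return (H_Tu @ beta_T >= 0.5).astype(int)
```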

4 Performance Measures and Network Configuration

4.1 Measures for AID Performance

To evaluate the detection performance, there are numerous measures, such as the detection rate (DR), the false alarm rate (FAR), and the mean time to detection (MTTD). DR and FAR quantify the effectiveness of an algorithm, while MTTD reflects its efficiency. DR is defined as the percentage of detected incidents out of the total number of incidents known to have occurred during the observation period. FAR is one of the main parameters for evaluating AID systems; it is the proportion of objects incorrectly classified as incident objects out of the total number of objects in the testing set. MTTD is computed as the average length of time between the start of an incident and the time the alarm is initiated. The first correct alarm declared for a single incident is used for computing the DR and MTTD.

Another measure is the classification rate (CR), which we add as a fourth index for testing AID algorithms. CR is computed as the percentage of correctly classified objects (both incident and non-incident) out of the total number of testing objects.

The classification of incident detection can be viewed as the problem of deciding whether a traffic incident occurs or not. Suppose that the testing data contain n incident occurrences, of which m incidents were detected successfully by the proposed AID, and comprise P positive objects indicating an incident state and N negative objects indicating an incident-free state. TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. The performance measures mentioned above are then defined as follows:

(10) $DR=\frac{m}{n}\times 100,$
(11) $FAR=\frac{FP}{P+N}\times 100,$
(12) $MTTD=\frac{t_1+\cdots+t_m}{m},$
(13) $CR=\frac{TP+TN}{P+N}\times 100,$

where ti is the time delay between the occurrence of the incident and its detection and m is the number of incidents detected.

The parameters cited previously are highly interdependent. A sensitive algorithm has a higher DR, but sensitive algorithms also tend to produce a large number of false alarms. In the same way, less sensitive algorithms produce fewer false alarms but also detect fewer incidents.

To evaluate the effectiveness of the proposed model, we use the mean absolute error (MAE), defined as

(14) $MAE=\frac{1}{n}\sum_{i=1}^{n}|f_i-g_i|,$

where fi is the observed traffic state, gi is the predicted traffic state, and n is the number of training data.
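For concreteness, a compact sketch of Eqs. (10)-(14) follows; labels are coded 0/1 as in this paper, and the incident counts n and m and the per-incident delays t_i are assumed to be tallied separately by the evaluation loop.

```python
import numpy as np

def aid_metrics(y_true, y_pred, n_incidents, m_detected, delays):
    """Compute DR, FAR, MTTD, CR, and MAE following Eqs. (10)-(14)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))     # true positives
    tn = np.sum((y_pred == 0) & (y_true == 0))     # true negatives
    fp = np.sum((y_pred == 1) & (y_true == 0))     # false positives
    total = y_true.size                            # P + N testing objects
    dr = 100.0 * m_detected / n_incidents          # Eq. (10)
    far = 100.0 * fp / total                       # Eq. (11)
    mttd = float(np.mean(delays))                  # Eq. (12), delays = [t_1, ..., t_m]
    cr = 100.0 * (tp + tn) / total                 # Eq. (13)
    mae = float(np.mean(np.abs(y_true - y_pred)))  # Eq. (14)
    return dr, far, mttd, cr, mae
```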

4.2 Network Configuration

To prove the efficiency of the proposed method in detecting traffic incidents, we used the simulation tool SUMO (Simulation of Urban MObility). SUMO is an open-source, highly portable, microscopic, multi-modal traffic simulation package that has been available since 2001. It is designed to handle large road networks and to establish a common test bed for algorithms and models from traffic research, can be enhanced with custom models, and provides various APIs to remotely control the simulation [8]. In our work, we employed TraCI, a library providing extensive commands to dynamically control the behaviour of the simulation, including vehicle states and the road configuration.

The traffic network used in this work is shown in Figure 2. It contains 25 traditional four-way intersections equipped with traffic signals and more than 300 links. Each edge contains three lanes in each direction. Each lane has two detectors, one at the entry and the other at the exit, as shown in Figure 3. These detectors are used to collect the upstream and downstream traffic data for each lane. During the simulation, new vehicles are generated by a uniform distribution over 20 input sections. In this configuration, the number of signal phases is four, each serving the vehicles circulating from one of the four directions, and the cycle time is set to 100 s.

Figure 2: Network Configuration Composed of 25 Intersections.

Figure 3: Intersection Configuration with Detectors in Each Lane.

An incident refers to anything that interrupts the traffic flow. Examples in the real world could be a collision, road works, parking troubles, or bad weather conditions causing very low speeds, long delays, and queues. There are a few options in SUMO for simulating such events: for instance, stopping a car for a while by defining a point along its road where it should halt and for how long, or setting the speed limit on an edge to a low value. We employed the first option, since it is the easiest to implement and therefore easy to transfer from one scenario to another. Getting a car to stop initially for a desired realistic time of 800 s raises various issues: the car, in turn, stops all other cars behind it or those trying to get onto the road, causing a traffic block, a long vehicle queue, and delay.
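As a sketch of this stop-a-car option, the TraCI snippet below halts one vehicle for the 800-s blockage described above; the configuration file name, trigger time, stop position, and choice of vehicle are placeholder assumptions, and setStop details (e.g. the units of duration) can vary across SUMO releases.

```python
import traci

# Start SUMO headless with an assumed scenario configuration file.
traci.start(["sumo", "-c", "network.sumocfg"])

INCIDENT_AT = 1000       # simulation second at which the incident begins
INCIDENT_DURATION = 800  # blockage duration (s), as used in this work

for step in range(3600):
    traci.simulationStep()
    vehicles = traci.vehicle.getIDList()
    if step == INCIDENT_AT and vehicles:
        # Halt an arbitrary vehicle on its current edge; the cars behind
        # it queue up, reproducing the incident pattern described above.
        veh = vehicles[0]
        edge = traci.vehicle.getRoadID(veh)
        traci.vehicle.setStop(veh, edge, pos=50.0, laneIndex=0,
                              duration=INCIDENT_DURATION)
traci.close()
```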

To train an algorithm to detect traffic incidents, sufficient detector data covering different incident patterns under a variety of flow conditions have to be acquired. We introduced 20 incidents into the road traffic pattern: 10 incidents with one lane blocked and 10 incidents with two or more lanes blocked. Our dataset contained traffic information (speed, occupancy, flow rate) recorded over 63 h and captured by 360 pairs of inductive loop detectors, which measure the presence of a vehicle passing over them. For example, vehicle speed was computed using the time elapsed as a vehicle passed from one detector of a pair to the other. The final dataset, containing 2300 labelled samples, was split into training and testing subsets: 80% of the data were randomly selected as training data, and the remaining 20% were used for testing. The class to be predicted is represented in binary form by the values 0 and 1, indicating the absence or the presence of an incident, respectively.

The main idea of our AID system is to check the variation between upstream and downstream detectors. Traffic measurements, such as vehicle flow rate, speed, and occupancy, tend to change after the occurrence of an incident. The simulation in this paper gathered these data from SUMO at 100-s intervals and used them as inputs for the ELM. The data used are the speed, traffic flow, and occupancy differences between downstream and upstream detectors, which have been well studied in traffic engineering.
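A minimal sketch of this feature extraction through TraCI is given below for one upstream/downstream induction-loop pair; the detector ids are placeholders, and the aggregation of per-step readings over each 100-s interval is omitted for brevity.

```python
import traci

def read_pair_features(up_id, down_id):
    """One (speed, flow, occupancy) variation sample from a detector pair;
    these upstream-downstream differences feed the ELM input layer."""
    d_speed = (traci.inductionloop.getLastStepMeanSpeed(up_id)
               - traci.inductionloop.getLastStepMeanSpeed(down_id))
    d_flow = (traci.inductionloop.getLastStepVehicleNumber(up_id)
              - traci.inductionloop.getLastStepVehicleNumber(down_id))
    d_occ = (traci.inductionloop.getLastStepOccupancy(up_id)
             - traci.inductionloop.getLastStepOccupancy(down_id))
    return d_speed, d_flow, d_occ
```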

5 Performance Evaluation

Our simulation was divided into two parts. In the first part, we compared the effectiveness of ELM in detecting traffic incidents against that of BPNN. In the second part, the performance of the proposed TELM-TDA, trained with a small amount of labelled data from the target domain, was measured by comparing it against ELM.

We used two different traffic networks. For the first comparison, we used the network shown in Figure 2. For the second comparison, we used a different traffic network in which all the edges contain one lane in each direction.

Table 1 summarises the testing performance of ELM and BPNN. It is evident that ELM, showing a higher DR, lower FAR, and higher CR, performs better than BPNN, although the two systems are roughly comparable in terms of MTTD. Another basis of comparison is training time, the time a method consumes to reach an optimal classifier. Here, ELM is far better than BPNN: ELM converges in 0.14 s, whereas BPNN takes 311.75 s.

Table 1:

Comparison between BPNN and ELM in Terms of Indices of Performance and Training Time.

Methods     DR (%)    FAR (%)   MTTD (s)   CR (%)   TT (s)
BPNN        92.30     0.032     41.66      85.34    311.75
ELM         100.00    0.022     61.53      89.62    0.14

DR, detection rate; FAR, false alarm rate; MTTD, mean time to detection; CR, classification rate; TT, training time.

The collection of sufficient incident data was impractical because of the prohibitively long time period required to form the dataset and the effort needed to obtain historical records. It was, therefore, decided to base the training and testing on TELM-TDA. We compared TELM-TDA with ELM to prove its efficiency in transferring knowledge between different environments and benefiting from past experiences. We generated another network in which each edge contains one lane in each direction, and considered this new network as the target domain to be learned. Rather than learning from the beginning in the target domain, TELM-TDA transfers knowledge from the first environment, referred to as the source domain, to the target domain so as to benefit from past knowledge. To help TELM-TDA adapt the past experience to the new domain, we extracted 300 labelled data from the new traffic network to use for domain adaptation.

The source domain is the network whose edges have three lanes in each direction, and the target domain is the network with one lane in each direction. Samples of the training data for the source and target domains are presented in Tables 2 and 3.

Table 2:

Training Data Extracted from Source Domain Network with Three Lanes.

Binary output   Flow rate (%)   Average speed (m/s)   Occupancy rate (%)
1               83.0            2.50                  5.0
1               23.0            0.37                  10.0
0               79.0            9.50                  5.0
0               72.0            10.0                  4.0
0               62.0            10.0                  4.0
1               93.0            1.41                  26.0
1               93.0            1.71                  31.0
0               89.0            6.00                  3.0
0               72.0            10.0                  5.0
0               75.0            10.0                  3.0
Table 3:

Training Data Extracted from Target Domain Network with One Lane.

Binary output   Flow rate (%)   Average speed (m/s)   Occupancy rate (%)
0               97.0            8.37                  12
1               93.0            0.23                  35
0               100.0           0.00                  0
0               96.0            10.00                 2
0               97.0            7.67                  13
1               95.0            0.31                  85
1               68.0            0.00                  100
0               95.0            10.00                 4
0               100.0           0.00                  0
1               49.0            1.56                  37

Table 4 shows the comparison between learning with ELM and with TELM-TDA. As we can see, TELM-TDA, which was trained using the prior knowledge obtained from the source domain, largely surpasses ELM in almost all the performance indices. ELM has drawbacks when the learning data are insufficient, which is reflected in its lower DR and CR and its higher FAR. TELM-TDA handles this issue and gives better performance than ELM by benefiting from past experiences in other environments and adapting them to the new environment, using the small amount of data collected from SUMO. These results show that the proposed method can easily be applied to increase detection accuracy without requiring large training data that are time consuming to collect.

Table 4:

Comparison between ELM and TELM-TDA in Terms of Indices of Performance and MAE.

Methods     DR (%)    FAR (%)   MTTD (s)   CR (%)   MAE
ELM         90.90     0.046     130.00     86.21    0.2291
TELM-TDA    98.38     0.035     236.67     90.53    0.0882

DR, detection rate; FAR, false alarm rate; MTTD, mean time to detection; CR, classification rate; MAE, mean absolute error.

6 Conclusion

Automatic traffic incident detection algorithms are one of the main components of an effective freeway traffic management system. We have presented an application of ELM to solve the problem of traffic incident detection. The task addressed in this paper was to learn a classifier that decides whether an incident has occurred or not. In addition, domain adaptation-based transfer learning was used to transfer knowledge of past experiences to new environments, to avoid learning from scratch. The comparison was made in two steps: first between ELM and BPNN, and then between ELM and TELM-TDA. Their performances were compared in terms of DR, FAR, MTTD, CR, training time, and MAE. The simulation results showed that ELM surpasses BPNN in terms of speed of convergence and indices of performance, whereas the proposed TELM-TDA outperforms ELM. To conclude, TELM-TDA achieves satisfactory performance with a small number of training samples by applying the knowledge learned from source domains to the target domain in order to benefit from past experiences.

Much work still needs to be done. Automatic traffic incident detection is just a first step towards a successful traffic management system. Future works will focus on coupling this AID with a good policy for evacuating vehicles when an incident happens.

Bibliography

[1] L. Bruzzone and M. Marconcini, Domain adaptation problems: a DASVM classification technique and a circular validation strategy, IEEE Trans. Pattern Anal. Mach. Intell. 32 (2010), 770–787. doi:10.1109/TPAMI.2009.57.

[2] H. Hardy and Y. N. Cheah, Question classification using extreme learning machine on semantic features, J. ICT Res. Appl. 7 (2013), 36–58. doi:10.5614/itbj.ict.res.appl.2013.7.1.3.

[3] G. B. Huang, L. Chen and C. K. Siew, Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans. Neural Netw. 17 (2006), 879–892. doi:10.1109/TNN.2006.875977.

[4] G. B. Huang, H. Zhou, X. Ding and R. Zhang, Extreme learning machine for regression and multiclass classification, IEEE Trans. Syst. Man Cybernet. Pt. B Cybernet. 42 (2012), 513–529. doi:10.1109/TSMCB.2011.2168604.

[5] G. B. Huang, Q. Y. Zhu and C. K. Siew, Extreme learning machine: a new learning scheme of feedforward neural networks, in: Proceedings of the 2004 IEEE International Joint Conference on Neural Networks, vol. 2, pp. 985–990, IEEE, 2004.

[6] G. B. Huang, Q. Y. Zhu and C. K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (2006), 489–501. doi:10.1016/j.neucom.2005.12.126.

[7] Z. Huang, Y. Yu, J. Gu and H. Liu, An efficient method for traffic sign recognition based on extreme learning machine, IEEE Trans. Cybernet. (2016), 1–14. doi:10.1109/TCYB.2016.2533424. Available online at http://www.ntu.edu.sg/home/egbhuang/pdf/ELM-Traffic-Sign-Recognition.pdf.

[8] D. Krajzewicz, G. Hertkorn, C. Rössel and P. Wagner, SUMO (Simulation of Urban MObility) – an open-source traffic simulation, in: Proceedings of the 4th Middle East Symposium on Simulation and Modelling (MESM2002), pp. 183–187, 2002.

[9] X. Li, W. Mao and W. Jiang, Extreme learning machine based transfer learning for data classification, Neurocomputing 174 (2016), 203–210. doi:10.1016/j.neucom.2015.01.096.

[10] T. Liu, Y. Yang, G. B. Huang, Y. K. Yeo and Z. Lin, Driver distraction detection using semi-supervised machine learning, IEEE Trans. Intell. Transport. Syst. 17 (2016), 1108–1120. doi:10.1109/TITS.2015.2496157.

[11] Y. U. Liu, Y. U. Lei, Q. I. Yi, J. Wang and H. Wen, Traffic incident detection algorithm for urban expressways based on probe vehicle data, J. Transport. Syst. Eng. Inform. Technol. 8 (2008), 36–41. doi:10.1016/S1570-6672(08)60031-8.

[12] J. Lu, S. Chen, W. Wang and B. Ran, Automatic traffic incident detection based on nFOIL, Expert Syst. Appl. 39 (2012), 6547–6556. doi:10.1016/j.eswa.2011.12.050.

[13] J. Lu, S. Chen, W. Wang and H. van Zuylen, A hybrid model of partial least squares and neural network for traffic incident detection, Expert Syst. Appl. 39 (2012), 4775–4784. doi:10.1016/j.eswa.2011.09.158.

[14] S. J. Pan, J. T. Kwok and Q. Yang, Transfer learning via dimensionality reduction, in: AAAI, vol. 8, pp. 677–682, 2008.

[15] S. J. Pan and Q. Yang, A survey on transfer learning, IEEE Trans. Knowl. Data Eng. 22 (2010), 1345–1359. doi:10.1109/TKDE.2009.191.

[16] R. Rossi, M. Gastaldi, G. Gecchele and V. Barbaro, Fuzzy logic-based incident detection system using loop detectors data, Transport. Res. Proc. 10 (2015), 266–275. doi:10.1016/j.trpro.2015.09.076.

[17] D. Srinivasan, W. H. Loo and R. L. Cheu, Traffic incident detection using particle swarm optimization, in: Proceedings of the 2003 IEEE Swarm Intelligence Symposium, SIS'03, pp. 144–151, IEEE, 2003.

[18] W. Wang, S. Chen and G. Qu, Incident detection algorithm based on partial least squares regression, Transport. Res. Pt. C Emerg. Technol. 16 (2008), 54–70. doi:10.1016/j.trc.2007.06.005.

[19] D. E. Rumelhart, G. E. Hinton and R. J. Williams, Learning representations by back-propagating errors, Nature 323 (1986), 533–536. doi:10.1038/323533a0.

[20] E. W. Xiang, B. Cao, D. H. Hu and Q. Yang, Bridging domains using world wide knowledge for transfer learning, IEEE Trans. Knowl. Data Eng. 22 (2010), 770–783. doi:10.1109/TKDE.2010.31.

[21] J. Xu, H. Zhou and G. B. Huang, Extreme learning machine based fast object recognition, in: 15th International Conference on Information Fusion (FUSION), pp. 1490–1496, IEEE, 2012.

[22] F. Yuan and R. L. Cheu, Incident detection using support vector machines, Transport. Res. Pt. C Emerg. Technol. 11 (2003), 309–328. doi:10.1016/S0968-090X(03)00020-2.

[23] L. Zhang and D. Zhang, Domain adaptation extreme learning machines for drift compensation in E-nose systems, IEEE Trans. Instrum. Measure. 64 (2015), 1790–1801. doi:10.1109/TIM.2014.2367775.

[24] L. Zhang and D. Zhang, Domain adaptation transfer extreme learning machines, in: Proceedings of ELM-2014, vol. 1, pp. 103–119, Springer, 2015. doi:10.1007/978-3-319-14063-6_10.

[25] L. Zhang and D. Zhang, Robust visual knowledge transfer via extreme learning machine based domain adaptation, IEEE Trans. Image Process. 25 (2016), 4959–4973. doi:10.1109/TIP.2016.2598679.

[26] L. Zhang and D. Zhang, Visual understanding via multi-feature shared learning with global consistency, IEEE Trans. Multimed. 18 (2016), 247–259. doi:10.1109/TMM.2015.2510509.

[27] L. Zhang, W. Zuo and D. Zhang, LSDT: latent sparse domain transfer learning for visual adaptation, IEEE Trans. Image Process. 25 (2016), 1177–1191. doi:10.1109/TIP.2016.2516952.

[28] W. Zong and G. B. Huang, Face recognition based on extreme learning machine, Neurocomputing 74 (2011), 2541–2551. doi:10.1016/j.neucom.2010.12.041.

Received: 2016-4-6
Published Online: 2016-10-19
Published in Print: 2017-9-26

©2017 Walter de Gruyter GmbH, Berlin/Boston

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
