1 Introduction

Far-field activities can be described by trajectories [4, 10], their spatial and angular features [1, 9], or by generative models [2, 6, 11, 12]. A high-level description of the spatial distribution of frequent activity patterns can support anomaly detection [8] and accessibility planning [5], and encode semantic regions of the scene [13].

In the literature, both diffusion and probabilistic models have been used to describe frequent activity patterns and to label trajectories according to activity patterns. On the one hand, diffusion models such as optical flow models have been used to describe coherent motions and semantic regions [14], and heat maps based on thermal diffusion processes have been used to capture the temporal motion information of activities [7]. On the other hand, probabilistic models such as Hidden Markov Models have been used to detect the points of interest in a scene [11], and Dirichlet Process Mixture Models have been used to robustly label new trajectories according to their activity patterns [6]. These studies focused either on attributing a single semantic meaning to specific spatial regions [11, 14], or on labelling trajectories according to their activity patterns [6, 7]. However, neither group provides direct information about the velocity and direction of the activity patterns, and only the latter can describe different global activity patterns in the same region of the scene. Typically, this type of motion information is associated with vector fields, which embed both the physical meaning of motion (i.e., velocity and direction) and the semantic interpretation of different regions of the scene [5, 12].

We propose to (i) describe multiple, complex activity patterns using layered vector fields and (ii) label test trajectories according to activity patterns using the estimated vector fields. Our approach extends previous studies in that: (a) it imposes data-driven sparsity on the vector field abstractions to prevent erroneous extrapolations in regions with no target data (unlike [5, 12]); (b) it is sensitive to concurrent activity patterns (unlike [11, 14]); and (c) it provides information on the velocity and direction of activity patterns (unlike [6, 7]).

The proposed vector field estimation uses a cost function specifically designed to yield sparse estimates. Unlike other studies, which induce sparsity of the vector field estimates through the \(l_1\)-norm [3], this work does so through statistical conditioning on the available data: the estimated spatial Probability Density Function (PDF) of the target positions restricts the vector field estimates to the regions where targets are observed. The proposed cost function further benefits from automatic parameter tuning using target positions and trajectory features. Layered vector field abstractions can be obtained if the proposed approach is applied to pre-clustered trajectories with similar activity patterns. We assess the accuracy of the vector field abstractions of synthetic trajectories by comparing the estimated and generating vector fields, and the correct sparsity by comparing the estimated vector fields with the null vector field in regions with no target data. Vector field comparisons focus on the root mean square length (RMSL) and the vector similarity coefficient (R) [16].

Moreover, we propose an algorithm that labels trajectories according to activity patterns. The classification measure is the displacement error between a test trajectory and the trajectories generated using the estimated vector fields. In this way, test trajectories are sorted according to activity patterns or detected as outliers. We assess the accuracy of the trajectory labeling algorithm by comparing the attributed and the observed activity pattern labels of test trajectories.

2 Estimation of Multiple Vector Fields

We first aim to estimate the vector field, \(\mathbf {T}\), that describes the activity patterns of a set of S trajectories, \(\mathcal {X}= \{ \mathbf {x}_{1}, \dots , \mathbf {x}_{S} \}\). Let \(t=1, \dots , L_s\) and the target position, \(\mathbf {x}_s(t)\), in the image plane of a camera be driven by \(\mathbf {T}\) according to

$$\begin{aligned} \mathbf {x}_s(t)= \mathbf {x}_s(t-1) + \mathbf {T}(\mathbf {x}_s(t-1)) + \mathbf {w}_s(t), \end{aligned}$$
(1)

where \(\mathbf {w}_s(t) \sim \mathcal {N}(\mathbf {0}, \sigma ^2 \, \mathbf {I}), \forall \,t\), is a white random perturbation. Let the image plane be normalized, thus \(\mathbf {x}_s(t) \in [0, 1]^2\, \forall \,t\).
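The motion model in Eq. (1) can be illustrated with a short rollout. This is a minimal sketch, not the authors' implementation: the function name `simulate_trajectory`, the callable `field` standing in for \(\mathbf {T}\), and the constant example field are all hypothetical, and the sketch does not clip positions to \([0, 1]^2\).

```python
import numpy as np

def simulate_trajectory(field, x0, length, sigma=0.0, rng=None):
    """Roll out Eq. (1): x(t) = x(t-1) + T(x(t-1)) + w(t)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty((length, 2))
    x[0] = x0
    for t in range(1, length):
        # w(t) ~ N(0, sigma^2 I): isotropic white perturbation
        noise = sigma * rng.standard_normal(2)
        x[t] = x[t - 1] + field(x[t - 1]) + noise
    return x

def constant_field(x):
    """Hypothetical field: every position moves 0.05 to the right."""
    return np.array([0.05, 0.0])

# Noise-free rollout of 5 positions starting at (0.1, 0.5)
traj = simulate_trajectory(constant_field, x0=[0.1, 0.5], length=5)
```

With `sigma=0` the rollout is deterministic, which is convenient for checking a field; a positive `sigma` adds the perturbation term of Eq. (1).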

The vector field, \(\mathbf {T}:[0, 1]^2 \rightarrow \mathbb {R}^2\), is defined only at the nodes of a regular, uniform grid, \(\mathcal {G}= \{ \mathbf {g}_n \in [0, 1]^2, n= 1, 2, \dots , N \}\), superimposed on the image plane. As a target position can fall on any image coordinate, including coordinates that do not correspond to a grid node (\(\mathbf {x}_s(t) \notin \mathcal {G}\)), we use bilinear interpolation to represent the vector field that drives the target at any coordinate of the image plane:

$$\begin{aligned} \mathbf {T}(\mathbf {x}_s(t)) = \sum _{n=1}^{N} \mathbf {\phi }_n(\mathbf {x}_s(t))\, \mathbf {t}_n , \end{aligned}$$
(2)

where \(\mathbf {\phi }_n(\mathbf {x}_s(t))\) are the interpolation coefficients of the velocity vectors, \(\mathbf {t}_n\), at the grid nodes. The matrix of interpolation coefficients for set \(\mathcal {X}\) is \(\mathbf {\varPhi }\).
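The interpolation in Eq. (2) can be sketched as follows. The grid layout is an assumption made for illustration: a square grid with `grid_side` nodes per axis (at least 2), nodes stored row by row so that node `row * grid_side + col` sits at `(col, row) / (grid_side - 1)`. The function names are hypothetical.

```python
import numpy as np

def bilinear_coefficients(x, grid_side):
    """Return the N = grid_side**2 coefficients phi_n(x) for x in [0, 1]^2."""
    x = np.asarray(x, dtype=float)
    scaled = np.clip(x, 0.0, 1.0) * (grid_side - 1)
    # Lower-left node of the cell containing x (kept inside the grid)
    lo = np.minimum(np.floor(scaled).astype(int), grid_side - 2)
    frac = scaled - lo
    phi = np.zeros(grid_side * grid_side)
    for dx in (0, 1):
        for dy in (0, 1):
            wx = frac[0] if dx else 1.0 - frac[0]
            wy = frac[1] if dy else 1.0 - frac[1]
            node = (lo[1] + dy) * grid_side + (lo[0] + dx)
            phi[node] += wx * wy
    return phi

def interpolate_field(T, x, grid_side):
    """Eq. (2): T(x) = sum_n phi_n(x) t_n, with T of shape (2, N)."""
    return T @ bilinear_coefficients(x, grid_side)

# Hypothetical field whose node vectors equal the node coordinates
side = 4
cols, rows = np.meshgrid(np.arange(side), np.arange(side))
T_lin = np.vstack([cols.ravel(), rows.ravel()]) / (side - 1)
value = interpolate_field(T_lin, [0.3, 0.7], side)
```

At most four coefficients are non-zero for any position, so each column of \(\mathbf {\varPhi }\) is sparse; stacking `bilinear_coefficients` over all trajectory positions would give the columns of \(\mathbf {\varPhi }\) under this layout.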

Vector field estimation corresponds to an optimization problem where \(\mathbf {T}\) is the minimizer of a given cost function that has to induce data-driven sparsity. To impose sparsity of the vector field estimates in the regions where target data does not exist, the velocity vectors in \(\mathbf {T}\) are weighted by 1 minus the spatial probability density function (PDF) of the target positions, i.e., \(\mathcal {D}=\mathbbm {1}-\varGamma _p\), \(\varGamma _p \in \mathbb {R}^{N}\). \(\varGamma _p\) is the PDF of the target positions, estimated with the Parzen window algorithm over set \(\mathcal {X}\) and then discretized at the image coordinates of the grid nodes (Fig. 1).
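A possible way to compute the weights \(\mathcal {D}\) is sketched below. Several choices here are assumptions, since the text does not specify them: a Gaussian kernel, the bandwidth value, and a rescaling of \(\varGamma _p\) to \([0, 1]\) so that the weights stay non-negative. The function names are hypothetical.

```python
import numpy as np

def parzen_density_at_nodes(points, nodes, bandwidth=0.05):
    """Gaussian Parzen-window density of `points` (M, 2) evaluated at `nodes` (N, 2)."""
    diff = nodes[:, None, :] - points[None, :, :]   # (N, M, 2)
    sq_dist = np.sum(diff ** 2, axis=-1)            # (N, M)
    kernel = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    kernel /= 2.0 * np.pi * bandwidth ** 2          # 2-D Gaussian normalisation
    return kernel.mean(axis=1)

def sparsity_weights(gamma_p):
    """D = 1 - Gamma_p, after rescaling Gamma_p to [0, 1] (an assumption)."""
    peak = gamma_p.max()
    scaled = gamma_p / peak if peak > 0 else gamma_p
    return 1.0 - scaled

# Hypothetical 5 x 5 grid and three target positions near the image centre
side = 5
axis = np.linspace(0.0, 1.0, side)
gx, gy = np.meshgrid(axis, axis)
nodes = np.column_stack([gx.ravel(), gy.ravel()])
points = np.array([[0.5, 0.5], [0.5, 0.55], [0.45, 0.5]])
gamma_p = parzen_density_at_nodes(points, nodes)
D = sparsity_weights(gamma_p)
```

Nodes far from any observed position receive a weight close to 1 (strong penalty on the velocity vector), while the most visited node receives a weight of 0.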

Fig. 1. Probability density functions of the target positions for 3 different activity patterns from the synthetic data set (\(D_0\)). Gradient colours of ascending density of data points from dark blue to yellow. (Color figure online)

The cost function is therefore defined as

$$\begin{aligned} f(\mathbf {T}) = \Vert \mathbf {V} - \mathbf {T}\,\mathbf {\varPhi } \Vert _2^2 + \alpha \, \Vert \mathbf {T}\circ \mathbbm {1}\mathcal {D}^{\top } \Vert _2^2, \end{aligned}$$
(3)

where \(\mathbbm {1}\) is of size \([2 \times 1]\), \(\Vert .\, \Vert _2\) defines the \(l_2\)-norm of a vector, “\(\circ \)” represents the Hadamard product, and \(\mathbf {T} \in \mathbb {R}^{2\times N}\), \(\mathbf {V} \in \mathbb {R}^{2\times M}\), \(\mathbf {\varPhi } \in \mathbb {R}^{N\times M}\), \(M= \sum _{s=1}^{S}(L_s - 1)\), are given by

$$\begin{aligned} \mathbf {T}&= \begin{bmatrix} \mathbf {t}_1&\;\dots&\;\mathbf {t}_N \end{bmatrix}, \end{aligned}$$
(4)
$$\begin{aligned} \mathbf {V}&= \begin{bmatrix} \mathbf {v}_1(2) \dots \mathbf {v}_1(L_1)\;&|&\dots&|&\mathbf {v}_S(2) \dots \mathbf {v}_S(L_S) \end{bmatrix}, \end{aligned}$$
(5)
$$\begin{aligned} \mathbf {\varPhi }&= \begin{bmatrix} \begin{array} {c} \mathbf {\phi }_1(\mathbf {x}_1(1)) \dots \mathbf {\phi }_1(\mathbf {x}_1(L_1 -1)) \\ \vdots \\ \mathbf {\phi }_N(\mathbf {x}_1(1)) \dots \mathbf {\phi }_N(\mathbf {x}_1(L_1 -1)) \end{array} &{} \Bigg |&{} \dots &{} \Bigg |&{} \begin{array}{c} \mathbf {\phi }_1(\mathbf {x}_S(1)) \dots \mathbf {\phi }_1(\mathbf {x}_S(L_S -1)) \\ \vdots \\ \mathbf {\phi }_N(\mathbf {x}_S(1)) \dots \mathbf {\phi }_N(\mathbf {x}_S(L_S -1)) \end{array} \end{bmatrix}, \end{aligned}$$
(6)

with matrix \(\mathbf {V}\) composed of the velocity vectors between consecutive target positions, i.e., \(\mathbf {v}_s(t)= \mathbf {x}_s(t) - \mathbf {x}_s(t-1)\).
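Because the cost in Eq. (3) is quadratic in \(\mathbf {T}\), one way to minimize it is in closed form. The sketch below assumes the norms are taken over all matrix entries (Frobenius norm) and writes the weights \(\mathcal {D}\) as a vector `d`; setting the gradient to zero then gives \(\mathbf {T}\,(\mathbf {\varPhi }\mathbf {\varPhi }^{\top } + \alpha \,\text {diag}(\mathbf {d})^2) = \mathbf {V}\mathbf {\varPhi }^{\top }\). The solver choice and the toy data are assumptions; the text does not state how the minimizer is computed.

```python
import numpy as np

def estimate_field(V, Phi, d, alpha):
    """Minimise ||V - T Phi||^2 + alpha ||T o (1 d^T)||^2 over T of shape (2, N)."""
    A = Phi @ Phi.T + alpha * np.diag(d ** 2)   # (N, N), symmetric
    B = Phi @ V.T                               # (N, 2)
    # Least squares tolerates a singular A (a node with no data and zero weight)
    T_transposed, *_ = np.linalg.lstsq(A, B, rcond=None)
    return T_transposed.T

# Toy problem: 3 grid nodes, 4 velocity samples, node 2 never observed
T_true = np.array([[1.0, 2.0, 5.0],
                   [0.0, 1.0, 5.0]])
Phi = np.array([[1.0, 0.0, 0.5, 1.0],
                [0.0, 1.0, 0.5, 0.0],
                [0.0, 0.0, 0.0, 0.0]])
V = T_true @ Phi                      # noise-free velocities
d = np.array([0.0, 0.0, 1.0])         # penalty only where no data is observed
T_est = estimate_field(V, Phi, d, alpha=0.5)
```

In this toy case the observed nodes recover their generating vectors, and the unobserved node is driven to the null vector by the weighted penalty, which is the sparsity behaviour the cost function is designed to produce.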

In Eq. (3), \(\alpha \) is correlated with the grid resolution N. To avoid manual parameter input in every estimation procedure, we propose an automatic tuning based on the cardinality of the non-zero elements (i.e., \(|\;\cdot \; |_{\ne 0}\)), expected value (i.e., \(\mathbb {E}[\;\cdot \;]\)), and standard deviation [i.e., \(\sigma (\,\cdot \,)\)] of the estimated PDFs of target and trajectory features,

$$\begin{aligned} \alpha&= 1- \frac{|\varGamma _p |_{\ne 0}}{N}, \end{aligned}$$
(7)
$$\begin{aligned} N&= \max \big \{N_{\text {min}}, |\varGamma _c |_{\ne 0} > \mathbb {E}[\varGamma _c]+ 1.5\, \sigma (\varGamma _c)\big \}, \end{aligned}$$
(8)

where \(\varGamma _p\) is the spatial PDF of the target positions as before; \(\varGamma _c \in \mathbb {R}^N\) is the average curvature of the trajectories at the grid nodes, which is estimated using the velocity angles \(\theta (t)= \tan ^{-1}\Big (\frac{y(t)-y(t-1)}{x(t)-x(t-1)}\Big )\) [1, 8]; and \(N_{\text {min}}\) is the minimum grid resolution selected by the user. In (8), highly curved trajectories (i.e., extreme values of the distribution of \(\varGamma _c\)) define the grid resolution.
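Eq. (7) and the velocity angles can be sketched as below. The tolerance `tol` used to decide which density values count as non-zero is an assumption (a kernel with unbounded support never returns an exact zero), and `arctan2` is used instead of a plain arctangent so that the quadrant is preserved and vertical steps do not divide by zero. The function names are hypothetical.

```python
import numpy as np

def tune_alpha(gamma_p, tol=0.0):
    """Eq. (7): alpha = 1 - |Gamma_p|_{!=0} / N, counting entries above `tol`."""
    n_nodes = gamma_p.size
    n_nonzero = np.count_nonzero(gamma_p > tol)
    return 1.0 - n_nonzero / n_nodes

def velocity_angles(traj):
    """theta(t) between consecutive positions of a trajectory of shape (L, 2)."""
    steps = np.diff(traj, axis=0)
    return np.arctan2(steps[:, 1], steps[:, 0])

# Half of the nodes carry density, so alpha = 0.5
alpha = tune_alpha(np.array([0.0, 0.2, 0.0, 0.8]))

# One step to the right, then one step up
theta = velocity_angles(np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]))
```

Under Eq. (7), the regularization weight grows as the data cover a smaller fraction of the grid.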

Multiple vector fields can be estimated using (3) if it is applied to each set of pre-clustered trajectories (\(\mathcal {X}_k\)) with similar activity patterns, e.g. using multiple features [1]. In the following, we assume that the pre-clustering step has taken place and that we have access to the sets \(\mathcal {X}_k\).

3 Activity Pattern Labeling

Our second aim is to label trajectories according to their activity patterns. To achieve this aim, we first estimate the \(\mathbf {T}_k\) following the above approach and using only trajectories from training sets \(\mathcal {X}_k\). Then, we propose the following labeling algorithm:

  1. Trajectory labeling:

    (a) Generate trajectories from the starting point of a given test trajectory using the estimated \(\mathbf {T}_k\) and (1);

    (b) Compute the displacement error as the Euclidean distance between the generated and the test trajectories;

    (c) Label each test trajectory with the activity pattern (vector field abstraction) that yields the smallest displacement error.

  2. Outlier detection using threshold:

    (a) Compute the cutoff threshold as the sum of the median and the median absolute deviation (MAD) of the displacement errors obtained from steps 1.(a) and 1.(b) applied on a set of validation trajectories (Fig. 2);

    (b) Label test trajectories as outliers of the labeled activity pattern from step 1.(c) if their displacement error is above the threshold.

In the outlier detection step, we use the median and the MAD instead of the mean and standard deviation because the distribution of displacement errors is right-skewed (Fig. 2), and the median and MAD are less affected by the long tail.
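The labeling and outlier detection steps above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: each vector field is passed as a hypothetical callable, trajectories are generated without the noise term of Eq. (1), and the displacement error is taken as the mean point-wise Euclidean distance (the text does not say how distances are aggregated over time).

```python
import numpy as np

def rollout(field, x0, length):
    """Step 1(a): noise-free trajectory from Eq. (1), starting at x0."""
    x = np.empty((length, 2))
    x[0] = x0
    for t in range(1, length):
        x[t] = x[t - 1] + field(x[t - 1])
    return x

def displacement_error(generated, observed):
    """Step 1(b): mean point-wise Euclidean distance (aggregation is assumed)."""
    return float(np.mean(np.linalg.norm(generated - observed, axis=1)))

def label_trajectory(traj, fields):
    """Step 1(c): index of the field with the smallest displacement error."""
    errors = [displacement_error(rollout(f, traj[0], len(traj)), traj)
              for f in fields]
    k = int(np.argmin(errors))
    return k, errors[k]

def cutoff_threshold(validation_errors):
    """Step 2(a): median + median absolute deviation of validation errors."""
    errors = np.asarray(validation_errors, dtype=float)
    median = np.median(errors)
    mad = np.median(np.abs(errors - median))
    return float(median + mad)

def classify(traj, fields, threshold):
    """Step 2(b): label plus outlier flag."""
    k, err = label_trajectory(traj, fields)
    return k, err > threshold

# Two hypothetical activity patterns: move right, move up
fields = [lambda x: np.array([0.1, 0.0]), lambda x: np.array([0.0, 0.1])]
threshold = cutoff_threshold([0.01, 0.02, 0.03, 0.02, 0.10])

right = np.array([[0.1 + 0.1 * t, 0.5] for t in range(5)])
diagonal = np.array([[0.1 + 0.1 * t, 0.1 + 0.1 * t] for t in range(5)])
label_right, outlier_right = classify(right, fields, threshold)
label_diag, outlier_diag = classify(diagonal, fields, threshold)
```

Note that `classify` always returns a label, even for outliers, which mirrors the behaviour described in the text: the outlier flag qualifies the assigned activity pattern rather than replacing it.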

Fig. 2. Histogram of displacement errors between validation trajectories and generated trajectories using the known labels of activity patterns (vector fields). The cutoff threshold is shown in black. The validation trajectories used in this example come from the real data set \(D_3\).

4 Experimental Results

4.1 Synthetic Data

Assessment Measures. Estimates of vector fields using synthetic data (\(\mathbf {T}_{\text {est}}\)) are assessed regarding both their accuracy with respect to the known generating vector field (\(\mathbf {T}_{\text {ref}}\)) and their correct sparsity with respect to the null vector field (\(\mathbf {T}_{0}\)) in regions where no target data is observed. Each node of the superimposed grid is labeled according to its proximity to a given trajectory: it is an active node if it belongs to a square of nodes containing part of the trajectory, and a non-active node otherwise. Thus, the region where no target data is observed is defined as the set of non-active nodes in the image plane, \(\mathcal {Z}\), with respect to a given trajectory set.

The assessment measures compare pairs of vectors regarding the vector similarity coefficient (R), i.e., the mean of the inner product of normalized vector pairs from 2 vector fields A and B, defined as [16],

$$\begin{aligned} R = \frac{1}{|\mathcal {P} |} \sum _{i \in \mathcal {P}} \hat{\mathbf {t}}^A_i \cdot \hat{\mathbf {t}}^B_i\,; \end{aligned}$$
(9)

and the vector root mean square length (RMSL), i.e., the systematic difference in the mean vector length, defined as [16],

$$\begin{aligned} \text {RMSL} = L_V^2 = \frac{1}{|\mathcal {P} |} \sum _{i \in \mathcal {P}} \Big \Vert \, \mathbf {t}^A_i - \mathbf {t}^B_i \, \Big \Vert _2^2\,; \end{aligned}$$
(10)

where “\(\cdot \)” is the inner product, \(\hat{\mathbf {t}} = \frac{\mathbf {t}}{\sqrt{\Vert \mathbf {t} \Vert _2^2}}\), \(\Vert \,.\, \Vert _2\) represents the \(l_2\)-norm of a vector, and \(|\,.\, |\) represents the cardinality of a set. In the case of accuracy assessment, \(A= \mathbf {T}_{\text {ref}}\), \(B= \mathbf {T}_{\text {est}}\), and \(\mathcal {P} = \mathcal {G}\), the set of grid nodes. In the case of correct sparsity assessment, \(A= \mathbf {T}_{0}\), \(B= \mathbf {T}_{\text {est}}\), and \(\mathcal {P} = \mathcal {Z}\), the set of non-active nodes. The optimal values for these measures are (R, RMSL)\(= (1,1)\) and RMSL\(=0\), respectively for accuracy and sparsity assessments.
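The two expressions in Eqs. (9) and (10) can be computed as written with a few lines of NumPy. In this sketch the fields are stored as arrays of shape \(2 \times |\mathcal {P}|\); skipping vector pairs with a zero-length member when normalizing is an assumption added to avoid division by zero, and the function names are hypothetical.

```python
import numpy as np

def similarity_coefficient(A, B, eps=1e-12):
    """Eq. (9): mean inner product of unit-normalised vector pairs."""
    norm_a = np.linalg.norm(A, axis=0)
    norm_b = np.linalg.norm(B, axis=0)
    # Assumption: pairs with a zero-length vector cannot be normalised and are skipped
    valid = (norm_a > eps) & (norm_b > eps)
    dots = np.sum(A[:, valid] * B[:, valid], axis=0)
    return float(np.mean(dots / (norm_a[valid] * norm_b[valid])))

def mean_square_difference(A, B):
    """Right-hand side of Eq. (10): mean squared norm of pairwise differences."""
    return float(np.mean(np.sum((A - B) ** 2, axis=0)))

# Hypothetical fields with two vectors each
T_a = np.array([[1.0, 0.0],
                [0.0, 2.0]])
r_same = similarity_coefficient(T_a, T_a)        # identical directions
r_opposite = similarity_coefficient(T_a, -T_a)   # opposite directions
msd_null = mean_square_difference(np.zeros((2, 1)), np.array([[3.0], [4.0]]))
```

For the sparsity assessment, the first argument of `mean_square_difference` would be the null field restricted to the non-active nodes, so the value is zero only when the estimate vanishes there.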

Data Set. The synthetic data set (\(D_0\)) has 300 trajectories generated using 6 different activity patterns. We use \(D_0\) as a proof of concept for the assessment of accuracy and correct sparsity of the estimated vector fields.

Fig. 3. Vector field estimates for the 6 activity patterns of the synthetic data set (\(D_0\)). Activity patterns are ordered across rows from left to right (top: 1, 2, 3; bottom: 4, 5, 6). Generating vector fields are shown in red, estimated vector fields in blue, generated trajectories in gray. (Color figure online)

Results. Figure 3 shows that the vector field estimates are very similar to the generating vector fields, not only in terms of magnitude (RMSL) and direction (R) but also regarding sparsity in the regions with no data, as expected. The accuracy (R, RMSL) of the estimated vector field for each activity pattern is 1: (0.920, 0.663); 2: (0.952, 2.305); 3: (0.982, 0.979); 4: (0.973, 0.723); 5: (0.954, 9.787); 6: (0.968, 0.763). In the sparsity assessment, the RMSL values of the vector field estimates are all below \(2.94 \times 10^{-4}\), which corresponds to sparse vector fields.

4.2 Real Data

Assessment Measures. Activity pattern labeling accuracy is computed by comparing attributed and known trajectory labels taking into account the cutoff threshold as

$$\begin{aligned} Acc= \frac{\sum \text {diag}(\mathcal {M})}{\sum _{ij} \mathcal {M}_{ij}}\,, \end{aligned}$$
(11)

where \(\mathcal {M}\) is the confusion matrix in a problem with multiple activity patterns.
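Eq. (11) is the trace of the confusion matrix divided by the sum of all its entries. A minimal sketch, with a made-up two-pattern confusion matrix used only for illustration:

```python
import numpy as np

def labeling_accuracy(confusion):
    """Eq. (11): correctly labeled trajectories over all trajectories."""
    confusion = np.asarray(confusion, dtype=float)
    return float(np.trace(confusion) / confusion.sum())

# Hypothetical counts: rows are known labels, columns are attributed labels
M = [[8, 2],
     [1, 9]]
acc = labeling_accuracy(M)   # (8 + 9) / 20
```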

Data Set. The real data sets we used are: \(D_1\) (Hu), containing 1500 trajectories with 15 activity patterns [6]; \(D_2\) (Wang), containing 220 trajectories with 11 activity patterns [14]; \(D_3\) (Morris), containing 1900 trajectories with 19 activity patterns [11]. We use these data sets to assess activity pattern labeling and outlier detection.

Table 1. Overall accuracy of trajectory labeling for the proposed approach and comparison with literature results [6, 11, 14].

Results. Table 1 shows that, overall, the proposed algorithm labels trajectories according to their activity patterns with higher accuracy than reported in the literature. More specifically, its accuracy is higher than that of Heat-map, HMM, and DPMM, which are comparable generative models used to describe activity patterns [6, 11, 14, 15].

Regarding outlier detection, note that the proposed algorithm always assigns an activity pattern to a trajectory: the attributed activity pattern is the one that generates trajectories with the smallest displacement error relative to the test trajectory. However, if the displacement error is above the threshold, the respective trajectory is tagged as an outlier and plotted in a different colour from the others. Figure 4 shows examples of 2 activity patterns for each real data set, which have similar motion patterns but different semantic meanings. Concerning Activity Pattern I, only \(D_1\) has outlier trajectories from two other activity patterns (shown in different colors, Fig. 4 middle row). Concerning Activity Pattern II, all the data sets have outlier trajectories from Activity Pattern I, and \(D_1\) also has outlier trajectories from one additional activity pattern (Fig. 4 bottom row).

Fig. 4. Examples of trajectory labeling using the real data sets (\(D_1\), \(D_2\), and \(D_3\)); only two example activity patterns are shown for each data set. Top row. Overview of background image and test trajectories. Middle row. Correctly labelled trajectories and over-imposed estimated vector fields (black arrows) for Activity Pattern I. Bottom row. Correctly labelled trajectories, over-imposed estimated vector fields (black arrows), and outliers (shown in different colors) for Activity Pattern II. (Color figure online)

The proposed approach yields vector field abstractions that can distinguish between similar activity patterns with different underlying semantics, given that for each data set, trajectories which were wrongly labelled as having one activity pattern were correctly detected as outliers of that activity pattern. For example, the green outlier trajectories from \(D_1\) (Fig. 4 bottom left panel) are detected as outliers of that activity pattern – whereas the vector field of interest describes a left turn into the primary road, the outlier trajectories correspond to targets that instead performed a left turn into a secondary road. Similar examples for the other two data sets are shown in the bottom row of Fig. 4.

5 Conclusion

We proposed a vector field estimation approach that copes with dense trajectory data and yields compact abstractions of frequent activity patterns. The proposed approach abstracts frequently observed activity patterns and embeds data-driven sparsity through the estimated spatial Probability Density Function (PDF) of the target positions. Moreover, it conveys both the physical and the semantic meaning of the observed activity patterns. Finally, the estimated vector fields can be used to label new trajectories and detect outliers according to their activity pattern, with an improvement of about 5–12% in trajectory labeling accuracy when compared to other generative models.