
1 Introduction

Improvements in camera technology have made video surveillance systems easily accessible, and their application areas are accordingly broad. Together with this progress, user expectations have introduced new challenges to the field. The biggest challenge is that automated handling of some tasks has become mandatory for surveillance systems. Activity perception and anomaly detection are among these important tasks. Many approaches have been proposed in the literature for anomaly detection and activity perception in scenes. These approaches generally differ from each other with respect to the visual features they utilize. Despite some difficulties in the extraction stage, especially in crowded scenes, the trajectory is still one of the most useful features for an object of interest.

A trajectory is 2D or 3D time series data, depending on the application. It carries the position information of the moving object with respect to time. Other valuable information such as velocity can also be derived from trajectory data. Therefore, trajectory data is crucial for several surveillance applications. In maritime surveillance, the trajectory of a vessel is the biggest clue about its behaviour. In aviation surveillance, a hijacked plane can be identified from its trajectory. For video surveillance, the trajectories of the objects in the scene give information about motion patterns. Also, the trajectory of a high-speed car will be different from the others and can be identified as an anomaly. As these examples show, trajectories are valuable features of moving objects for tasks such as anomaly detection and activity perception.

In this work, a novel descriptor is proposed for trajectories using feature covariance matrices. A feature vector is defined for each point of the trajectory, and a feature matrix is obtained by concatenating these vectors. The proposed descriptor is the covariance of the feature matrix. By representing trajectories via feature covariance matrices, essentially, a novel distance measure is introduced for trajectories. This measure is capable of calculating the distance between trajectories of different lengths. Since covariance matrices lie on Riemannian manifolds, a distance metric capable of measuring geodesic distance is utilized while calculating the distance between the trajectories. Another contribution of the work is anomaly detection by sparse representations on nearest neighbors. The proposed anomaly detection approach based on sparse representation optimizes the number and weights of the nearest neighbors while setting up an anomaly measure. Spectral clustering is an essential block of activity perception. Distances determined through the covariance matrices are transformed to similarities to build a similarity graph. Activity perception then extracts the dominant motion patterns in the scene through spectral clustering.

Organization of the paper is as follows. A brief literature review is given in Sect. 2. In Sect. 3, the proposed representation for trajectories is introduced. Anomaly detection approach based on sparse representation of nearest neighbors is described in Sect. 4. Activity perception through clustering of trajectories using spectral clustering algorithm is presented in Sect. 5. Experimental results on both synthetic and real datasets are given in Sect. 6. Conclusion is the last section of the paper.

2 Related Work

Feature covariance matrices were first proposed and used as descriptors in [1]. The covariance descriptor makes it possible to determine the distance between two instances by representing each instance by the covariance matrix of its feature matrix. After it was proposed in [1] for object detection and classification, the covariance descriptor was exploited to solve several computer vision problems such as visual tracking [2], action recognition [3–5], and saliency detection [6]. In all of these works, the covariance descriptor is utilized as a region descriptor. Some optical flow components are included in the feature vector; however, in none of them is the covariance descriptor used to describe a 2D time series.

A trajectory is a spatiotemporal feature of a moving object and carries information about its journey in the scene. Hence, it is an important source of information about activities and has been used for activity perception in previous works [7–9]. While analyzing trajectories, the critical point is the selection of a proper distance measure. Several distance measures [7, 10–14] for trajectories have been proposed so far. Two excellent review papers [15, 16] compare different distance measures for trajectories.

Anomaly detection and activity perception are two important problems for surveillance systems. In recent years, many successful works have handled these problems for realistic scenarios. For anomaly detection, the authors of [17] use a mixture of temporal and spatial models to detect anomalies, and in [18] they extend the models to detect anomalies at different spatial and temporal scales. In [19], a Gaussian Mixture Model (GMM) based probabilistic model is fit to particle trajectories which are extracted by particle advection. Trajectories that do not fit this model are labeled as anomalies. Outside the computer vision community, there are other works focusing on anomaly detection on trajectories. Laxhammar et al. [20] apply their anomaly detector, called the conformal anomaly detector, to trajectories. In [13], a 1-class Support Vector Machine (SVM) is utilized to detect anomalous trajectories. The most interesting part of that study is the introduction of a faster solution for SVM training in the presence of outliers. An outlier detection method based on the concept of discords is introduced in [21]. A discord is an instance that has the maximum Euclidean distance to its nearest neighbor.

Nonparametric Bayesian models have been widely used for activity perception in recent years. Starting from the pioneering work [22], there are significant works [23, 24] along this path. In [22], nonparametric Bayesian models are adapted to activity perception in visual scenes by modeling the motions in the scene as visual words, short video clips as documents and activities as topics. Follow-up works [23, 24] adapt Markov models to learn the temporal dependencies between activities.

There is a recent approach [25] that considers trajectories on Riemannian manifolds. The method is based on a representation called the transported square-root vector field (TSRVF) and the L2 norm on the space of TSRVFs. The authors have also applied their method to the visual speech recognition problem in [26]. In this method, trajectories are mapped into a tangent space by parametrization via their TSRVF. The TSRVF formulation includes the derivative and the square root of the derivative of the parametrized version of the trajectory. To conclude, the method is based on an idea similar to ours; however, in our method feature covariance matrices are exploited to map the trajectories to Riemannian manifolds.

3 Trajectory Representation by Feature Covariance Matrices

Trajectories can be considered as time series of 2D coordinates. A visual scene may contain many trajectories of different lengths. In order to analyze these trajectories, a similarity or distance function should first be defined. In this work, we propose to describe the trajectories with the covariance matrices of their features. By doing so, all trajectories are mapped onto Riemannian manifolds, and the similarities between them are calculated in this space.

A 2D trajectory can be defined as a sequential concatenation of K points or, more formally, as a \(K \times 2\) matrix, [\(x_{1}\) \(y_{1}\), ..., \(x_{K}\) \(y_{K}\)], as shown in Fig. 1. A point of a trajectory can also be defined by its features

$$\begin{aligned} f = [x\ y\ v_{x}\ v_{y}\ t] \end{aligned}$$
(1)

where x and y define the position, \(v_{x}\) and \(v_{y}\) are the velocities in the x and y directions, respectively, and t is the time index. During the experiments, several additional features, including cumulative sum and acceleration, were examined to increase the performance. However, the best performance values are obtained with the feature set defined in Eq. 1. For the whole trajectory, the feature matrix can be defined similarly as

$$\begin{aligned} F = \begin{bmatrix} x_{2} & y_{2} & v_{x_{2}} & v_{y_{2}} & t_{2} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ x_{K} & y_{K} & v_{x_{K}} & v_{y_{K}} & t_{K}\end{bmatrix} \end{aligned}$$
(2)
Fig. 1. Representation of trajectories using feature covariance matrices. A trajectory from the synthetic dataset [13] is shown as a sample. The length of the trajectories in this dataset is 16. A feature vector is formed for every point in the trajectory except the first. In this case, the F matrix is 15\(\,\times \,\)5, containing the feature vectors of these 15 points, and the resulting covariance matrix is 5\(\,\times \,\)5.
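The construction of the feature matrix in Eq. 2 can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the text does not specify how the velocities are computed, so the sketch uses the backward difference between consecutive points (which would explain why the first point is skipped), and it takes the time index of the k-th point to be k. The function name `trajectory_features` is hypothetical.

```python
import numpy as np

def trajectory_features(points):
    """Build the (K-1) x 5 feature matrix of Eq. 2 from a K x 2 trajectory.

    Assumptions (not stated in the text): velocity is the backward
    difference between consecutive points, and the time index of the
    k-th point is simply k (1-based).
    """
    points = np.asarray(points, dtype=float)
    if points.ndim != 2 or points.shape[1] != 2 or points.shape[0] < 2:
        raise ValueError("expected a K x 2 array with K >= 2")
    velocity = np.diff(points, axis=0)           # p_k - p_{k-1} for k = 2..K
    position = points[1:]                        # the first point is dropped
    t = np.arange(2, points.shape[0] + 1, dtype=float)[:, None]
    return np.hstack([position, velocity, t])    # columns: x, y, v_x, v_y, t

# A 16-point straight-line trajectory, the same length as in the synthetic dataset.
traj = np.column_stack([np.linspace(0.0, 15.0, 16), np.linspace(0.0, 30.0, 16)])
F = trajectory_features(traj)
print(F.shape)  # (15, 5)
```

With 16 points, the sketch yields 15 feature vectors of length 5, matching the dimensions quoted in the caption of Fig. 1.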

Feature covariance matrix is determined as

$$\begin{aligned} C = \frac{1}{K} \sum _{k=1}^K (F_{k}-\mu )(F_{k}-\mu )^T \end{aligned}$$
(3)

where \(\mu \) is the mean vector of all instances in matrix F. At this point, a small multiple of the identity matrix is added to covariance matrices. This regularization is performed to ensure the positive definiteness of the covariance matrix. Positive definiteness is important for the distance metric which involves a logarithm operation.
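A minimal sketch of Eq. 3 together with the regularization step is given below. The normalization divides by the number of rows of F, which is one reading of the 1/K factor in Eq. 3, and the default multiple of the identity, 0.005, is the value reported in Sect. 6. The function name `covariance_descriptor` and the random feature matrices in the usage lines are illustrative only.

```python
import numpy as np

def covariance_descriptor(F, eps=0.005):
    """Regularized covariance of an N x d feature matrix F (Eq. 3).

    eps * I is added so that the result is strictly positive definite,
    which the matrix logarithm used later requires.
    """
    F = np.asarray(F, dtype=float)
    centered = F - F.mean(axis=0)               # subtract the mean feature vector
    C = centered.T @ centered / F.shape[0]      # d x d covariance
    return C + eps * np.eye(F.shape[1])

# Feature matrices of different lengths give descriptors of the same size.
rng = np.random.default_rng(0)
short = covariance_descriptor(rng.normal(size=(15, 5)))
long = covariance_descriptor(rng.normal(size=(80, 5)))
print(short.shape, long.shape)  # (5, 5) (5, 5)
```

The regularization matters most for degenerate trajectories: if a feature is constant along the trajectory, the unregularized covariance has a zero eigenvalue and its logarithm is undefined.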

It should be noted that, for trajectories of any length, we end up with a 5\(\,\times \,\)5 covariance matrix. This enables us to determine the similarity between trajectories of different lengths. With the covariance representation, trajectories are carried onto Riemannian manifolds. The critical point from now on is to calculate the distances between the trajectories on Riemannian manifolds.

A distance measure that approximates the geodesic distance between two points on a Riemannian manifold must be used. For this purpose, as suggested by previous works [13] that utilize covariance matrices, Euclidean distance metrics must be avoided. We use the log-Euclidean metric between covariance matrices, which was first proposed in [27]. Compared to other distance metrics [28] and divergence functions [29], the log-Euclidean metric achieves the best performance in this study. The log-Euclidean metric is, in principle, based on matrix logarithms. Its computation starts with the eigenvalue decomposition of the covariance matrices.

$$\begin{aligned} C = VQV^T \end{aligned}$$
(4)

After this eigenvalue decomposition, matrix logarithm is obtained as

$$\begin{aligned} \log (C) \triangleq V\widetilde{Q}V^T \end{aligned}$$
(5)

where \(\widetilde{Q}\) is a diagonal matrix obtained from Q by replacing Q’s diagonal entries by their logarithms. The distance between two covariance matrices is calculated as the Frobenius norm of the difference of their matrix logarithms.

$$\begin{aligned} \rho (C_{1},C_{2}) = \left\| \log (C_{1})-\log (C_{2}) \right\| _{F} \end{aligned}$$
(6)
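Equations 4 to 6 translate almost directly into code. The sketch below uses NumPy's `numpy.linalg.eigh`, which is intended for symmetric matrices and returns real eigenvalues. The function names `spd_log` and `log_euclidean_distance` are hypothetical, and the two diagonal matrices are chosen so that the result can be checked by hand.

```python
import numpy as np

def spd_log(C):
    """Matrix logarithm of a symmetric positive definite matrix (Eqs. 4-5)."""
    eigvals, V = np.linalg.eigh(C)            # C = V diag(eigvals) V^T
    return (V * np.log(eigvals)) @ V.T        # V diag(log eigvals) V^T

def log_euclidean_distance(C1, C2):
    """Frobenius norm of the difference of the matrix logarithms (Eq. 6)."""
    return np.linalg.norm(spd_log(C1) - spd_log(C2), ord="fro")

# Hand-checkable case: log(C1) is the zero matrix and log(C2) = diag(1, 2),
# so the distance should equal sqrt(1^2 + 2^2) = sqrt(5).
C1 = np.diag([1.0, 1.0])
C2 = np.diag([np.e, np.e ** 2])
d = log_euclidean_distance(C1, C2)
print(d)
```

Because the logarithm of each eigenvalue is taken, every eigenvalue must be strictly positive, which is what the regularization after Eq. 3 ensures.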

Now, we can calculate all the distances between trajectories via feature covariance matrices. In subsequent sections, anomaly detection and activity perception problems will be based on these distances. Anomaly detection is carried out by a novel approach based on sparse representation of nearest neighbors. Activity perception is achieved by forming a similarity matrix from pairwise distances and utilizing this in spectral clustering.

4 Anomaly Detection on Trajectories

In this work, trajectories are utilized as the features of the objects in the scene. Therefore, anomalies in the scene are determined by detecting anomalous trajectories. An anomalous trajectory can be described as a sample that does not fit the motion patterns in the scene. Based on this definition, the nearest neighbor approach can be considered the simplest solution for anomaly detection. The distance to the nearest neighbor can be a good anomaly measure in some cases. However, depending on the structure of the data and the amount of anomalous observations, the distance to the nearest neighbor might not be a good choice. In this work, we propose a method which considers the distances to a set of nearest neighbors and optimizes the weights and the number of nearest neighbors. A scenario is depicted in Fig. 2 to explain the necessity of the algorithm.

After the trajectories are represented via covariance matrices and the distances between them are calculated, anomaly detection is carried out using a measure comprising the distances to nearest neighbors. For this purpose, we select nearest neighbors through a sparse representation. In this approach, an anomaly measure is calculated for each sample as a weighted sum of the distances to its nearest neighbors. Our goal is to optimize the number of neighbors and their weights while deciding whether an instance is an anomaly or not.

Fig. 2. A scenario to explain the necessity of the sparse anomaly detection algorithm. For some anomalies, the nearest neighbor or a weighted sum of nearest neighbors might not be a good anomaly measure. Anomalies are shown inside the red dashed ellipse. For anomalies in the orange circle, the distance to the third nearest neighbor should be included in the anomaly measure. (Color figure online)

In the sparse anomaly detection approach, the data is assumed to be available offline so that it can be divided into uniform parts. In particular, we use one part of the data for training and derive the optimal weights of the nearest neighbors from this subset. The same number of data samples is used in the testing process. The anomaly measure is composed of the distances to the K nearest neighbors for each sample in the training set of M samples.

$$\begin{aligned} A_{i} = w_{1}s_{1}+...+w_{K}s_{K} \end{aligned}$$
(7)

where \(s_1\), ..., \(s_K\) are the distances to the K nearest neighbors. Equation 7 can be written in matrix form

$$\begin{aligned} A_{i} = \begin{bmatrix} w_{1} & \cdots & w_{K}\end{bmatrix} \begin{bmatrix} s_{1}\\ \vdots \\ s_{K}\end{bmatrix} = W S_{i} \end{aligned}$$
(8)

and finally when all instances are considered

$$\begin{aligned} A = WS \end{aligned}$$
(9)

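where A is a 1\(\times \)M vector consisting of the anomaly measures of all instances, W is a 1\(\times \)K vector consisting of the weights of the K nearest neighbors, and S is the K\(\times \)M matrix whose i-th column, \(S_{i}\), holds the distances from the i-th instance to its K nearest neighbors.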
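As an illustration of Eqs. 7 to 9, the sketch below builds S from a pairwise distance matrix and evaluates A = WS for a given weight vector. The helper names `knn_distance_matrix` and `anomaly_measure` are hypothetical, and the toy data (four points on a line with absolute differences as distances) merely stands in for the log-Euclidean distances between trajectories.

```python
import numpy as np

def knn_distance_matrix(D, K):
    """Return S (K x M): column i holds the sorted distances from sample i
    to its K nearest neighbors, given an M x M pairwise distance matrix D."""
    D = np.asarray(D, dtype=float).copy()
    np.fill_diagonal(D, np.inf)               # a sample is not its own neighbor
    return np.sort(D, axis=1)[:, :K].T

def anomaly_measure(W, S):
    """A = W S (Eq. 9): weighted sum of neighbor distances for every sample."""
    return np.asarray(W, dtype=float) @ S

# Toy data: three nearby points and one isolated point.
x = np.array([0.0, 1.0, 2.0, 10.0])
D = np.abs(x[:, None] - x[None, :])
S = knn_distance_matrix(D, K=2)
A = anomaly_measure([0.5, 0.5], S)
print(A)  # the isolated point at 10 receives by far the largest measure
```

Here the weights are fixed by hand; in the proposed approach they are instead learned from annotated training data.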

While deciding on anomalies, we can consider a fixed percentage or a fixed number of data points as anomalies. Alternatively, instances whose anomaly measures are above a threshold can be considered anomalies. In all cases, a nonlinear function, f, is needed to map the anomaly measure to the anomaly decision.

$$\begin{aligned} L = f(A,\mu ) \end{aligned}$$
(10)

where \(\mu \) represents the parameter of the nonlinear function, threshold value on anomaly measure or percentage on samples. L is the label vector that designates if a sample is an anomaly or not. Besides, since there might be several combinations of weighted neighbors for each instance, a minimum number of neighbors should be used. Therefore, combining with previous observations, the optimization problem can be summarized as

$$\begin{aligned} w = \underset{w}{{\text {argmin}}} \left\{ \lambda |w|_{0} + |L_{gt}-f(A,\mu )| \right\} \end{aligned}$$
(11)

where \(L_{gt}\) is the ground truth of label vector in the training set. Since L0 norm is a nonconvex function, L1 norm is a first alternative to L0 norm. However, L2 norm guarantees the positive weights in our problem. Then, the final optimization becomes

$$\begin{aligned} w = \underset{w}{{\text {argmin}}} \left\{ \lambda |w|_{2} + |L_{gt}-f(A,\mu )| \right\} \end{aligned}$$
(12)
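The text does not state how Eq. 12 is minimized, so the following sketch only evaluates the objective for a candidate weight vector. It assumes that f labels a fixed number of samples with the highest anomaly measures as anomalies (one of the options mentioned above) and that the mismatch term counts mislabeled samples. The names `label_top_n` and `objective`, the value of \(\lambda \) and the toy distances are all illustrative.

```python
import numpy as np

def label_top_n(A, n_anomalies):
    """f(A, mu) with mu = a fixed number of anomalies: flag the n largest measures."""
    labels = np.zeros(A.shape[0], dtype=int)
    labels[np.argsort(A)[::-1][:n_anomalies]] = 1
    return labels

def objective(w, S, labels_gt, n_anomalies, lam=0.1):
    """Value of Eq. 12: lam * ||w||_2 plus the number of mislabeled samples."""
    w = np.asarray(w, dtype=float)
    A = w @ S                                         # anomaly measures, Eq. 9
    mismatches = np.abs(labels_gt - label_top_n(A, n_anomalies)).sum()
    return lam * np.linalg.norm(w) + mismatches

# Toy case: the two anomalies (last two columns) lie close to each other,
# so their first-neighbor distance is small and only the second one is large.
S = np.array([[1.0, 1.1, 1.2, 0.5, 0.5],    # distance to 1st nearest neighbor
              [1.3, 1.4, 1.5, 4.0, 4.2]])   # distance to 2nd nearest neighbor
labels_gt = np.array([0, 0, 0, 1, 1])

first_only = objective([1.0, 0.0], S, labels_gt, n_anomalies=2)
second_only = objective([0.0, 1.0], S, labels_gt, n_anomalies=2)
print(first_only, second_only)  # roughly 4.1 versus 0.1
```

The toy case mirrors the scenario of Fig. 2: relying on the first nearest neighbor alone mislabels four samples, whereas putting the weight on a farther neighbor recovers the ground truth. Any search over w, for instance a grid or a derivative-free optimizer, could be wrapped around `objective`; that choice is left open here.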

In our experiments, we show that anomaly detection with sparse representation gives better results than using the single nearest neighbor or an equally weighted combination of nearest neighbors.

5 Activity Perception via Trajectories

Activity perception is the second problem for which the proposed representation is exploited. An activity can be considered as a set of similar trajectories. Clustering is the direct solution for the identification of these sets or activities. Therefore, in this work, activity perception is handled with clustering of trajectories.

Describing trajectories through the utilization of feature covariance matrices enables us to construct a similarity matrix between trajectories of different lengths. This similarity matrix can be used to build an undirected graph which allows extracting the motion patterns in the scene. Spectral clustering methods are popular since they are capable of handling non-convex patterns in the data. As in [30], the similarity matrix is built using the distances derived with feature covariance matrices

$$\begin{aligned} s_{ij} = e^{-d_{ij}^2/2\sigma ^2} \end{aligned}$$
(13)

where \(d_{ij}\) is the distance between the trajectories i and j. Spectral clustering is achieved by the clustering of eigenvectors of a matrix called Laplacian. In its unnormalized formulation, Laplacian is the difference of the degree matrix and the similarity matrix.

$$\begin{aligned} L = D - S \end{aligned}$$
(14)

where D is a diagonal matrix which contains sum of each row of similarity matrix (or column depending on its symmetry). Laplacian matrix is normalized as in [30] to handle the clusters of different sizes.

$$\begin{aligned} L = I - D^{-1/2}SD^{-1/2} \end{aligned}$$
(15)

where L is the normalized Laplacian and D is degree matrix. Clusters are determined by applying k-means algorithm on eigenvectors of normalized Laplacian.
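The whole pipeline of Eqs. 13 to 15 followed by k-means can be sketched as below. Several details are assumptions rather than choices confirmed by the text: the eigenvectors belonging to the smallest eigenvalues of the normalized Laplacian are used, their rows are normalized to unit length as is common in normalized spectral clustering, and k-means is a plain Lloyd iteration with a deterministic farthest-point initialization. The function name `spectral_clustering` and the value of \(\sigma \) are illustrative.

```python
import numpy as np

def spectral_clustering(D, n_clusters, sigma=1.0, n_iter=50):
    """Cluster samples from an M x M distance matrix D (Eqs. 13 and 15)."""
    D = np.asarray(D, dtype=float)
    S = np.exp(-D ** 2 / (2.0 * sigma ** 2))                 # similarity, Eq. 13
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))                # D^{-1/2} as a vector
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]  # Eq. 15
    _, vecs = np.linalg.eigh(L)                              # ascending eigenvalues
    U = vecs[:, :n_clusters]                                 # spectral embedding
    U = U / np.linalg.norm(U, axis=1, keepdims=True)         # row normalization (assumed)

    # Farthest-point initialization keeps this sketch deterministic.
    centers = [U[0]]
    for _ in range(1, n_clusters):
        gaps = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(gaps)])
    centers = np.array(centers)

    # Plain Lloyd iterations on the embedded points.
    for _ in range(n_iter):
        sq_dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq_dist, axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = U[labels == c].mean(axis=0)
    return labels

# Toy distances: two well-separated groups of three samples each.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
D = np.abs(x[:, None] - x[None, :])
labels = spectral_clustering(D, n_clusters=2)
print(labels)  # the first three samples share one label, the last three the other
```

Note that the eigendecomposition of the full M\(\times \)M Laplacian costs on the order of M cubed operations, which is why the number of trajectories matters in practice.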

6 Experiments

In the experiments, a synthetic dataset and two real datasets are used. The synthetic dataset first built in [21] is used. The real datasets are UCSD anomaly detection [17, 18] and MIT Parking Lot [8]. Two practical details should be mentioned before the experimental results. First, the regularization parameter mentioned after Eq. 3 is set to 0.005 in all experiments. Second, in all real datasets, a size threshold is applied to eliminate small tracks.

There is no ground truth data for anomalies or activities in the MIT Parking Lot [8] dataset. In the UCSD case [17, 18], the anomaly ground truth data are frame based and not appropriate for our approach. Therefore, quantitative results cannot be produced for these datasets.

Anomaly detection on trajectories was carried out on both synthetic and real datasets to evaluate the performance of the proposed representation. For the synthetic case, the dataset generated in [13] is used, and the results are compared with those reported in [13, 20, 21]. This dataset includes 1000 subsets, each containing 260 trajectories. In each subset, the last 10 trajectories are anomalous. Comparative results are given in Table 1 for this dataset, and a sample result is shown in Fig. 3. As can be seen in Table 1, the proposed representation outperforms the state-of-the-art techniques using only the distance to the nearest neighbor.

Synthetic dataset is also exploited while probing the performance of sparse anomaly detection. Sparse anomaly detection is implemented through running of Monte Carlo simulations in synthetic dataset. In each run, we select 100 sets for training from the whole dataset including 1000 sets. The remainder of the dataset is used for testing. Sparse representation or the weights of the nearest neighbors are applied to the testing set. As shown in Table 1, the best results are obtained with the combination of proposed trajectory representation and sparse anomaly detector.

Table 1. Accuracies of anomaly detection methods for the synthetic dataset built in [13]. The proposed representation outperforms the state-of-the-art techniques with use of anomaly measures, nearest neighbors (NN) and sparse representation (SR). Sparse representation also gives better results compared to single use of nearest neighbor.
Fig. 3. A sample result for anomaly detection in the synthetic dataset. Ten samples are shown for each cluster of normal trajectories. Anomalous trajectories are indicated with bold magenta lines. (Color figure online)

Fig. 4. Clustering result of a set of trajectories in the synthetic dataset. Starting and ending points of the trajectories are indicated by green and red circles, respectively. The correct clustering rate is 0.9055 for the whole dataset, which contains 1000 trajectory sets like this sample. (Color figure online)

Synthetic dataset is also utilized for the activity perception part. The previously mentioned 250 non-anomalous trajectories in the dataset belong to five equal size clusters. Similarity matrix shown in Fig. 5 is used to obtain the clustering result shown in Fig. 4. Similarity matrix shown in Fig. 5 also gives an idea about the usefulness of the representation. Correct clustering rate is 0.9055 for whole dataset which contains 1000 subsets of 260 trajectories.

Fig. 5. Similarity matrix of the trajectories given in Fig. 4. Five clusters can be observed together with the anomalies, which lie in the last rows and columns.

Fig. 6. Some examples of anomaly detection results in the UCSD dataset. The rows show results from two different scenes in the dataset. Images in the first two rows indicate the most anomalous trajectories in the folders Test014, Test019, Test022, Test024 of the UCSDped1 scene, and those in the third and fourth rows are for Test003, Test005, Test006, Test009 of the UCSDped2 scene. Starting points are shown with green stars and end points with red stars. (Color figure online)

Fig. 7. Some anomalous trajectories from the MIT Parking Lot dataset. These anomalous trajectories might be the result of problems in the extraction stage.

Fig. 8. Trajectory patterns in the MIT Parking Lot dataset. The number of clusters is set to eight in the final k-means step of spectral clustering to obtain these results. Starting and ending points are shown with green and red circles, respectively. (Color figure online)

For the real dataset case, the UCSD anomaly detection [17, 18] and MIT Parking Lot [8] datasets are utilized. The UCSD dataset contains sequences of two scenes, and training and test sequences are provided for each scene. Anomalies are motions of non-pedestrian objects such as cars, skaters and bicyclists. A critical issue for the UCSD anomaly detection dataset is the extraction of trajectories. For this purpose, the KLT tracker used in [31] is employed. After the trajectories are extracted for both training and test sequences, a covariance matrix is calculated for each trajectory. The covariance matrix of each trajectory in the test sequences is compared with the covariance matrices of all trajectories in the training set. The training dataset is not sufficiently large to calculate a sparse representation for this dataset. Therefore, a predefined anomaly measure, which is the combination of three nearest neighbors, is used in the experiments. For some test sequences from the two scenes of the dataset, the most anomalous trajectories are shown in Fig. 6.
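The comparison of test trajectories against the training set can be sketched as follows. Since the Frobenius norm of a matrix equals the Euclidean norm of its flattened entries, each covariance matrix can be mapped to its flattened logarithm once, after which all test-to-training distances are ordinary Euclidean distances. The text does not specify how the three nearest neighbors are combined, so the sketch assumes an equally weighted sum. The function names and the scaled identity matrices in the usage lines are purely illustrative.

```python
import numpy as np

def log_vectors(covs):
    """Map each symmetric positive definite matrix to its flattened logarithm."""
    out = []
    for C in covs:
        q, V = np.linalg.eigh(C)
        out.append(((V * np.log(q)) @ V.T).ravel())
    return np.array(out)

def three_nn_scores(train_covs, test_covs, k=3):
    """Sum of the log-Euclidean distances from each test matrix to its
    k nearest training matrices (equal weights are an assumption)."""
    train = log_vectors(train_covs)
    test = log_vectors(test_covs)
    # Pairwise Euclidean distances between flattened logarithms.
    dists = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=2)
    return np.sort(dists, axis=1)[:, :k].sum(axis=1)

# Toy descriptors: the training matrices are close to the identity; the
# second test matrix is far from all of them.
train_covs = [a * np.eye(2) for a in (1.0, 1.1, 0.9, 1.2)]
test_covs = [np.eye(2), 20.0 * np.eye(2)]
scores = three_nn_scores(train_covs, test_covs)
print(scores)  # the second test matrix receives the much larger measure
```

Test trajectories can then be ranked by this measure, with the highest-ranked ones reported as the most anomalous.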

The next dataset utilized is the MIT Parking Lot dataset [8]. This dataset comprises trajectories of cars and people captured in a parking lot. There are certain motion patterns in the scene, and the dataset is exploited to detect these motion patterns or activities. In this work, the dataset is used for both anomaly detection and activity perception. Anomaly detection is again based on the distances to three nearest neighbors. Anomaly detection results are shown in Fig. 7. Some sharp trajectories which are caused by an error in the extraction stage are labeled as anomalies. Activity perception is carried out by forming a similarity matrix and applying spectral clustering. There are 40453 trajectories in the dataset, and spectral clustering might not be computationally manageable when applied to the whole dataset. However, the aforementioned size threshold makes spectral clustering feasible. The activity perception results are given in Fig. 8. The number of clusters is set to eight in the final k-means step of spectral clustering to achieve these results. Some clusters clearly contain more than one meaningful motion pattern. It is observed that these motion patterns are separated when the number of clusters is set to a larger value. A potential improvement lies in this part of the study: a clustering algorithm that does not require the number of clusters to be specified and still works on similarity matrices would be a good alternative to spectral clustering.

7 Conclusion

In this work, we propose a novel approach by describing trajectories with feature covariance matrices. We study the problems of anomaly detection and clustering for trajectories in this context. Feature covariance matrices enable us to measure similarity between trajectories of different lengths. Also, conducted experiments show that covariance descriptor for trajectories yields satisfactory results compared to the state of the art.

We have also introduced a sparse anomaly detector to decide the number and the weights of the nearest neighbors that should be used. This sparse representation can be applied to other similar problems. The only requirement is to have a training dataset for which annotated anomaly data is given.

The whole study has been conducted under the assumption that the crowd density allows trajectories to be extracted for each object. In dense crowd scenarios, other approaches such as the particle advection used in [19] might be more feasible for obtaining trajectories.

There is a possible improvement in the activity perception part of this work. Instead of the standard spectral clustering approach, for which the number of clusters must be given, another clustering approach operating on the distance matrix calculated with the proposed representation can be used.

A representation is proposed for time series of 2D data in this study. A possible extension of this work is shape classification problem. Although there is no time information and invariance on rotation and scale could be problematic, feature covariance matrices can be used to describe 2D shapes.