Analysis and performance evaluation of optical flow features for dynamic texture recognition

https://doi.org/10.1016/j.image.2007.05.013

Abstract

We address the problem of dynamic texture (DT) classification using optical flow features. Optical flow based approaches dominate among the currently available DT classification methods. The features used by these approaches often describe local image distortions in terms of quantities such as curl or divergence. Both normal and complete flows have been considered, with normal flow (NF) being used more frequently. However, the precise meaning and applicability of normal and complete flow features have never been analysed properly. We provide a principled analysis of local image distortions and their relation to optical flow. We then present the results of a comprehensive DT classification study that compares the performance of different flow features for an NF algorithm and four different complete flow algorithms. The efficiencies of two flow confidence measures are also studied.

Introduction

Dynamic textures (DTs) such as waves, smoke, or fire, are time-varying, non-figurative visual patterns represented by image sequences possessing certain temporal stationarity. Description and recognition of DTs have attracted growing attention since the early nineties, when the pioneering paper by Nelson and Polana [19] was published. Polana and Nelson [25] categorise visual motion into three classes: activities, motion events and dynamic (temporal) textures. Activities are defined as motion patterns that are periodic in time and localised in space. Motion events do not show temporal or spatial periodicity. Finally, DTs exhibit statistical regularity but have indeterminate spatial and temporal extent. A generic property of DTs is their intrinsic dynamics, that is, motion that cannot be compensated by any global spatial transformation. For example, translational motion of an otherwise static texture does not generate a real (strong) DT. At most, it can be considered as a weak DT.

Computer vision deals with the analysis of all three classes of visual motion. Table 1 summarises the tasks of the analysis in the case of DTs, as compared to the case of traditional, static textures. Classification means recognition of pre-segmented spatial or spatiotemporal patterns. Segmentation is partitioning an image or a video into homogeneous areas or volumes. In the supervised case, this is basically the classification of blocks of data: windows or cubes. Unsupervised segmentation is data clustering. In both cases, some of the data may be rejected as outliers: non-texture, or unknown texture. Detection is finding a specific texture, or any texture, in the data. When a reference pattern is given, this is again a classification problem; otherwise, one faces the problem of defining and estimating generic properties such as statistical regularity or intrinsic dynamics. Defect detection is in a sense opposite to the previous task, since it assumes that the vast majority of the data forms a regular pattern, while defects or events are detected as outliers in the data. An interesting example of an event in a DT is unusual behaviour in a walking crowd. Finally, separation of DTs aims at separating partially transparent layers of DTs, such as smoke covering a river. Since the separation is mainly possible due to motion, this task seems to have no analogue in the static case.

In this paper, we discuss the problem of DT classification based on optical flow. A recent brief survey of DT description and recognition [5] mentions five categories of approaches to DT recognition: methods based on optical flow [19], [25], [2], [7], [8], [21], [22], [17], [23], methods computing geometric properties in the spatiotemporal domain [20], [34], methods based on local spatiotemporal filtering [33], methods using global spatiotemporal transforms [28] and, finally, model-based methods that use estimated model parameters as features [27], [10]. Methods based on optical flow are currently the most popular because optical flow estimation is a computationally efficient and natural way to characterise the local dynamics of a temporal texture. It reduces DT analysis to the analysis of a sequence of instantaneous motion patterns viewed as static textures. When necessary, image texture features can be added to the motion features to form a complete feature set for motion- and appearance-based recognition.

The goal of our study is to investigate the meaning and the efficiency of optical flow features in DT classification. For this reason, this paper is limited to methods based on optical flow; other methods, such as model-based ones [11] or class-specific ones (e.g., for fire and smoke [31]) are not discussed. This does not imply that optical flow methods are completely sufficient for DT recognition. For a discussion of other approaches to DT, the reader is referred to the survey [5].

In this study, we concentrate on the dynamic component of a temporal texture. The appearance component can be added when desired, for example, by using colour and shape for better segmentation. In the sequel, we intentionally ignore the appearance component in order to understand how much information visual motion can provide about the classes of natural and artificial processes captured by videos.

It is well known [16] that, due to the aperture problem, only the normal flow (NF) can be computed unless a larger region is considered and additional smoothness constraints are introduced. NF is the component of the complete flow (CF) vector that is orthogonal to the local contour and (anti-)parallel to the spatial image gradient. Its computation needs only the three partial derivatives of the spatiotemporal image function. Being purely local, NF does not tend to extend motion over discontinuities. However, as illustrated in Fig. 1, NF is noise-prone even in its regularised form (smoothing, thresholding). Its close relation to the spatial gradient, that is, to contours and shapes, implies that NF features correlate with appearance features. This was acknowledged in [25] as a negative aspect, but no real solution was proposed. Fablet and Bouthemy [8] even claim that the direction of NF contains no independent motion information; they use only its magnitude. The examples of NF fields given in the literature do not reflect well the visual dynamics of the underlying processes.
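
To make this concrete, a minimal sketch of per-pixel NF computation is given below. It follows the textbook formula $v_n = -\partial_t I \, \nabla I / |\nabla I|^2$; the finite-difference discretisation, the frame-difference temporal derivative and the gradient threshold eps are our assumptions, since the paper does not fix a particular implementation.

```python
import numpy as np

def normal_flow(frame0, frame1, eps=1e-3):
    """Per-pixel normal flow between two consecutive grey-level frames.

    Returns the components (vn_x, vn_y) of the flow along the spatial
    image gradient; pixels with a vanishing gradient are set to zero.
    """
    I0 = frame0.astype(np.float64)
    I1 = frame1.astype(np.float64)
    Ix = np.gradient(I0, axis=1)      # spatial derivative d I / d x
    Iy = np.gradient(I0, axis=0)      # spatial derivative d I / d y
    It = I1 - I0                      # temporal derivative (frame difference)
    grad_sq = Ix ** 2 + Iy ** 2
    # v_n = -It * grad(I) / |grad(I)|^2, undefined where the gradient vanishes
    scale = np.where(grad_sq > eps, -It / np.maximum(grad_sq, eps), 0.0)
    return scale * Ix, scale * Iy
```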

The regularised CF vector field is much better in this respect. (See Figs. 1 and 2.) However, its calculation requires more effort, and care should be taken not to enforce smoothness across motion discontinuities. Both problems have been addressed in the recent research on optical flow estimation [3], [1]. Using modern multigrid numerical schemes, one can achieve near real-time performance on a general-purpose computer [4].

This paper is a significantly extended and revised version of the workshop paper [9]. Its main contributions are as follows. Our main goal is to compare NF features with CF features in DT classification. To achieve it, we first provide a detailed analysis of local image distortions and give precise definitions of the DT features describing such distortions. We then present the results of DT classification tests comparing the performance of NF and four different CF algorithms, with various sets of flow features, on two different DT data sets. The efficiencies of two flow confidence measures are also studied.

Previous work

In the pioneering studies by Nelson and Polana [19], [25], NF was used to form features characterising the overall magnitude and directionality of motion, as well as local image deformations due to motion. Spatial co-occurrence matrices of NF directions within a pixel neighbourhood were considered to obtain directional difference statistics. The directionality was evaluated by accumulating a coarse histogram of flow directions and computing its absolute deviation from a uniform distribution.
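
For illustration, this directionality measure can be sketched as follows; the bin count and the magnitude threshold are our choices for the sketch, not values taken from [19], [25].

```python
import numpy as np

def nf_directionality(vx, vy, n_bins=8, mag_thresh=1e-2):
    """Coarse histogram of flow directions and its absolute deviation
    from the uniform distribution (0 for perfectly isotropic motion)."""
    mag = np.hypot(vx, vy)
    ang = np.arctan2(vy, vx)[mag > mag_thresh]   # directions of reliable vectors
    hist, _ = np.histogram(ang, bins=n_bins, range=(-np.pi, np.pi))
    p = hist / max(hist.sum(), 1)                # normalised direction histogram
    return p, np.abs(p - 1.0 / n_bins).sum()
```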

Optical flow

In this section, we give a brief overview of the optical flow algorithms used in our comparative study. Let $I(x,y,t)$ denote the image intensity at point $(x,y)$ at time $t$, and $v=(v_x,v_y)$ the (complete) optical flow vector, which describes the apparent motion of intensity in the image plane. The well-known optical flow constraint equation [14]
$$\partial_x I \, v_x + \partial_y I \, v_y + \partial_t I = 0$$
can easily be derived from the assumption of intensity conservation, $\frac{\mathrm{d}}{\mathrm{d}t} I(x(t),y(t),t) = 0$. Here $\partial_z f \equiv \partial f / \partial z$ denotes a partial derivative, and $x=x(t)$, $y=y(t)$ trace the apparent trajectory of a moving intensity pattern.
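
As an illustration, any dense CF estimator can be plugged into this framework and sanity-checked against the constraint equation. The sketch below uses OpenCV's Farnebäck algorithm purely as a convenient, easily available stand-in; it is not one of the four CF algorithms compared in this study, and the parameter values are common defaults rather than settings from the paper.

```python
import cv2
import numpy as np

# Two consecutive grey-level frames of a DT sequence (random stand-ins
# here so that the snippet runs; replace with real video frames).
prev_gray = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
next_gray = np.random.randint(0, 256, (128, 128), dtype=np.uint8)

# Dense complete flow via Farneback (pyramid scale 0.5, 3 levels,
# 15x15 window, 3 iterations, poly_n=5, poly_sigma=1.2).
flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
vx, vy = flow[..., 0], flow[..., 1]

# Residual of the optical flow constraint equation; small values
# indicate pixels where brightness constancy approximately holds.
I = prev_gray.astype(np.float64)
Ix = np.gradient(I, axis=1)                    # d I / d x
Iy = np.gradient(I, axis=0)                    # d I / d y
It = next_gray.astype(np.float64) - I          # d I / d t (frame difference)
residual = Ix * vx + Iy * vy + It
```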

Image distortion features

In this section we give a detailed analysis of image distortions due to motion and define a complete set of DT features describing such distortions. Most of these features have already been used for DT classification, while some are new. In any case, we try to clarify the precise meaning of a feature, in order to better understand its potential discriminative power in DT classification.

The features are based on the optical flow, which provides, in a straightforward way, information on the local dynamics of a DT, in particular on local image distortions described by quantities such as divergence and curl.
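
As a simple illustration, two of the classical distortion quantities, divergence and curl, can be computed from a dense flow field by finite differences. This is only a sketch of the two maps; the paper's full feature set, and the spatial and temporal averaging applied to it, are defined in this section.

```python
import numpy as np

def flow_distortion_maps(vx, vy):
    """Divergence and curl maps of a dense 2D flow field.

    div v  = d(vx)/dx + d(vy)/dy  (local expansion / contraction)
    curl v = d(vy)/dx - d(vx)/dy  (local rotation, scalar in 2D)
    """
    dvx_dx = np.gradient(vx, axis=1)
    dvx_dy = np.gradient(vx, axis=0)
    dvy_dx = np.gradient(vy, axis=1)
    dvy_dy = np.gradient(vy, axis=0)
    return dvx_dx + dvy_dy, dvy_dx - dvx_dy
```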

Comparative experimental study

In this section we present the results of an experimental study whose goal is to compare NF features and CF features in DT classification. Different spatiotemporal divisions and averaging procedures are considered, since these choices affect the classification results. The subsequences obtained by the division are used as samples in learning the classes and in the subsequent classification. Feature values are calculated for each sample of each DT. Then a simple leave-one-out test with the Euclidean distance is performed.
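
A minimal sketch of such a protocol, leave-one-out classification with a Euclidean nearest-neighbour rule, is shown below; feature scaling and the exact spatiotemporal division are omitted here.

```python
import numpy as np

def leave_one_out_1nn(features, labels):
    """Leave-one-out test with a Euclidean nearest-neighbour classifier.

    features: (n_samples, n_features) array, one row per DT subsequence
    labels:   (n_samples,) array of class labels
    Returns the fraction of correctly classified samples.
    """
    # Pairwise Euclidean distances between all samples.
    diff = features[:, None, :] - features[None, :, :]
    d = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(d, np.inf)       # a sample may not match itself
    nearest = np.argmin(d, axis=1)    # index of the nearest other sample
    return float(np.mean(labels[nearest] == labels))
```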

Conclusion

Addressing the problem of DT recognition using optical flow features, we presented a principled analysis of local image distortions and their relation to optical flow. The theoretical analysis of image distortion features was supported by a comprehensive comparative performance evaluation on two databases. Summarising our study, we conclude that for a small number of low-quality DTs the proposed features computed for NF provide an acceptable result. The additional effort of computing the CF becomes justified as the size and quality of the DT data grow.

Acknowledgements

This work was supported by the EU Network of Excellence MUSCLE (FP6-507752). The authors thank T. Brox, A. Bruhn, N. Papenberg and J. Weickert, as well as T. Amiaz and N. Kiryati, for providing their programs for optical flow computation.

References (34)

  • B. Horn et al., Determining optical flow, Artif. Intell. (1981)
  • R. Nelson et al., Qualitative recognition of motion using temporal texture, CVGIP: Image Understanding (1992)
  • T. Amiaz, N. Kiryati, Dense discontinuous optical flow via contour-based segmentation, in: ICIP, vol. 3, 2005, pp....
  • P. Bouthemy, R. Fablet, Motion characterization from temporal cooccurrences of local motion-based measures for video...
  • T. Brox, A. Bruhn, N. Papenberg, J. Weickert, High accuracy optical flow estimation based on a theory for warping, in:...
  • A. Bruhn, J. Weickert, C. Feddern, T. Kohlberger, C. Schnörr, Real-time optic flow computation with variational...
  • D. Chetverikov, R. Péteri, A brief survey of dynamic texture description and recognition, in: Fourth International...
  • D. Edwards et al., Motion field estimation for temporal textures
  • R. Fablet, P. Bouthemy, Motion recognition using spatio-temporal random walks in sequence of 2D motion-related...
  • R. Fablet et al., Motion recognition using nonparametric image motion models estimated from temporal and multiscale co-occurrence statistics, IEEE Trans. PAMI (2003)
  • S. Fazekas, D. Chetverikov, Normal versus complete flow in dynamic texture recognition: a comparative study, in:...
  • K. Fujita, S. Nayar, Recognition of dynamic textures using impulse responses of state variables, in: 3rd International...
  • J. Grim, M. Haindl, P. Somol, P. Pudil, A subspace approach to texture modelling by using Gaussian mixtures, in: ICPR,...
  • H. Haussecker et al., Tensor-based image sequence processing techniques for the study of dynamical processes
  • B.K.P. Horn, Robot Vision (1986)
  • Intel Corporation, Microprocessor Research Labs, OpenCV: Open Source Computer Vision Library,...
  • B. Jähne, Digital Image Processing (1997)