Elsevier

Pattern Recognition

Volume 45, Issue 12, December 2012, Pages 4192-4203
Higher rank Support Tensor Machines for visual recognition

https://doi.org/10.1016/j.patcog.2012.04.033

Abstract

This work addresses the two class classification problem within the tensor-based large margin classification paradigm. To this end, we formulate the higher rank Support Tensor Machines (STMs), in which the parameters defining the separating hyperplane form a tensor (tensorplane) that is constrained to be the sum of rank one tensors. Subsequently, we propose two extensions in which the separating tensorplanes take into consideration the spread of the training data along the different tensor modes. More specifically, we first propose the higher rank Σ/Σw STMs that use the total or the within-class covariance matrix in order to whiten the data and thus provide invariance to affine transformations. Second, we propose the higher rank Relative Margin Support Tensor Machines (RMSTMs) that bound from above the distance of the data samples from the separating tensorplane while maximizing the margin from it. The corresponding optimization problem is solved in an iterative manner utilizing the CANDECOMP/PARAFAC (CP) decomposition, where at each iteration the parameters corresponding to the projections along a single tensor mode are estimated by solving a typical Support Vector Machine (SVM)-type optimization problem. The efficiency of the proposed method is illustrated on the problems of gait and action recognition where we report results that improve, in some cases considerably, the state of the art.

Highlights

  • We introduce the higher rank two class Support Tensor Machines.

  • We introduce the higher rank two class Σ/Σw Support Tensor Machines.

  • We introduce the higher rank two class Relative Margin Support Tensor Machines.

Introduction

Tensors constitute a natural way of representing multidimensional objects whose elements are indexed by more than two indices. Images, grayscale videos and color videos can be regarded as 2nd, 3rd and 4th order tensors, respectively (an example of a 3rd order tensor of an image sequence can be seen in Fig. 1). The advantages of using tensorial representations and setting the corresponding machine learning problems within the tensorial framework have recently attracted significant interest from the research community. Within the last decade, several works proposed tensor-based extensions of fundamental methods. For example, Principal Component Analysis was extended to Multilinear Principal Component Analysis (MPCA) in [1], Linear Discriminant Analysis (LDA) to Multilinear Discriminant Analysis (MDA) in [2], two class Support Vector Machines (SVMs) to Support Tensor Machines (STMs) in [3], Non-negative Matrix Factorization (NMF) to Non-negative Tensor Factorization (NTF) in [4], [5], [6] and Vector Correlation to Canonical Analysis Correlation of tensors (CAC) in [7].
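As a small illustration of the tensor orders mentioned above, the following sketch represents an image, a grayscale video and a color video as NumPy arrays. The sizes and the axis layout (height, width, color channel, frame) are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

# Hypothetical sizes: 64 x 48 pixel frames, 3 color channels, 10 frames.
image = np.zeros((64, 48))               # 2nd order tensor: height x width
gray_video = np.zeros((64, 48, 10))      # 3rd order tensor: height x width x time
color_video = np.zeros((64, 48, 3, 10))  # 4th order tensor: height x width x channel x time

# The order of a tensor is its number of indices (modes), i.e. ndim in NumPy.
for name, tensor in [("image", image),
                     ("gray_video", gray_video),
                     ("color_video", color_video)]:
    print(name, "order:", tensor.ndim, "mode sizes:", tensor.shape)
```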

The extension of the above-mentioned algorithms within the tensorial framework has led to considerable performance improvements. This is due to the fact that the proposed algorithms can better retain and utilize information about the structure of the high dimensional space the data lie in, for example about the spatial arrangement of the pixel-based features in a 2D image. By contrast, vector schemes discard this information, since the input vectors that they use are created by stacking the rows or columns of the original data tensors in a rather arbitrary way. The resulting vectors are high dimensional, which may lead to overfitting, especially for small sample size problems. Tensor-based algorithms, on the other hand, decompose the whole problem into a number of smaller and simpler ones, each of which is typically defined over a certain tensor mode and is of lower dimensionality. The introduction of tensors has been shown to reduce the degree of overfitting that appears in vector-based learning techniques when few training samples are available [3].
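To make the dimensionality argument concrete, the sketch below counts the weights of a linear classifier on a vectorized input and compares them with the weights needed when the weight tensor is a sum of rank one terms. The mode sizes and the rank are hypothetical values chosen for illustration only.

```python
import numpy as np

# Hypothetical grayscale clip: height x width x frames.
dims = (32, 24, 20)
rank = 3  # assumed number of rank one terms in the weight tensor

# A vector-based linear classifier needs one weight per element of the stacked input.
n_vector_weights = int(np.prod(dims))

# A sum of `rank` rank one tensors needs one vector per mode for each term.
n_tensor_weights = rank * sum(dims)

print(n_vector_weights)  # 15360
print(n_tensor_weights)  # 228
```

With these assumed sizes, the tensor parameterization has roughly 67 times fewer free weights, which is the sense in which each per-mode subproblem is of lower dimensionality.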

In this paper, we exploit the advantages of the tensor-based framework in order to address the classification problem within the large margin paradigm. To this end, we formulate the higher rank STMs, in which the parameters defining the separating hyperplane form a tensor (tensorplane) that is constrained to be the sum of rank one tensors.

The benefits of the proposed scheme are twofold. First, the use of direct tensor representations is intuitively closer to the idea of properly processing tensorial input data, as the data topology is better retained. The huge dimensionality of the vectorized form of the input data, especially when only a few training samples are available, can cause overtraining and therefore classification errors in SVMs. The second benefit lies in the use of the CP decomposition, a general decomposition that expresses a tensor as a sum of rank-one component tensors. In this way, our framework allows multiple projections of the input tensor along each mode, leading to considerable improvement in the discriminative ability of the resulting classifier. The choice of a tensorial representation for the weight parameters, derived using the CP decomposition, is therefore a sound one, as confirmed by the results of the experiments we have conducted.
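The following is a minimal sketch of a weight tensor written as a sum of rank one tensors, and of how the resulting score amounts to several projections of the input along each mode. The sizes, the rank, the random values and all variable names are assumptions for illustration; the bias term and the training procedure are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
I1, I2, I3, R = 5, 4, 3, 2  # hypothetical mode sizes and rank

# One factor matrix per mode; column r holds that mode's vector of the r-th rank one term.
U1 = rng.standard_normal((I1, R))
U2 = rng.standard_normal((I2, R))
U3 = rng.standard_normal((I3, R))

# Weight tensor as a sum of R rank one tensors (outer products of the columns).
W = np.einsum("ir,jr,kr->ijk", U1, U2, U3)

# A single 3rd order input sample.
X = rng.standard_normal((I1, I2, I3))

# Inner product <W, X> computed directly on the full tensors ...
score_direct = float(np.sum(W * X))

# ... and as a sum of R terms, each projecting X along all three modes.
score_projected = float(sum(
    np.einsum("ijk,i,j,k->", X, U1[:, r], U2[:, r], U3[:, r])
    for r in range(R)
))

print(np.isclose(score_direct, score_projected))
```

With R = 1 this reduces to a single projection per mode, which corresponds to the rank one setting.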

Subsequently, we propose two extensions in which the separating tensorplanes take into consideration the spread of the training data along the different tensor modes. More specifically, we first propose the higher rank Σ/Σw STMs that use the total or the within-class covariance matrix in order to whiten the data and thus provide invariance to affine transformations. Second, we propose the higher rank Relative Margin Support Tensor Machines (RMSTMs) that bound from above the distance of the data samples from the separating tensorplane while maximizing the margin from it. Both methods are inspired by very recent works in the vector domain [8] but this is the first time that they are formulated within the tensor-based framework. The corresponding optimization problems are solved in an iterative manner utilizing the CANDECOMP/PARAFAC (CP) decomposition, where at each iteration the parameters corresponding to the projections along a single tensor mode are estimated by solving a typical SVM-type optimization problem. We demonstrate the strengths of the proposed classifiers in terms of recognition accuracy on the problems of gait and action recognition where we report results that improve, in some cases considerably, the state of the art.
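The reason a single mode can be handled by a vector-type solver can be sketched as follows: once the factors of all other modes are fixed, the score is linear in the parameters of the remaining mode. The example below only checks this identity numerically for mode 1 of a 3rd order tensor. The helper name `mode1_features` and all sizes are hypothetical, and the paper's treatment of the regularization term and the actual SVM solver are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
I1, I2, I3, R = 5, 4, 3, 2  # hypothetical mode sizes and rank

# Factor matrices of the weight tensor, one per mode.
U1, U2, U3 = (rng.standard_normal((n, R)) for n in (I1, I2, I3))
X = rng.standard_normal((I1, I2, I3))


def mode1_features(X, U2, U3):
    """Project X along modes 2 and 3 for every rank one term (result: I1 x R)."""
    return np.einsum("ijk,jr,kr->ir", X, U2, U3)


# With U2 and U3 fixed, each sample is summarized by an I1 x R feature matrix.
Z1 = mode1_features(X, U2, U3)

# The score is then an ordinary dot product between the stacked mode-1
# parameters and the stacked features, as in a vector-based linear classifier.
score_linear = float(U1.ravel() @ Z1.ravel())

# Reference value computed from the full weight tensor.
W = np.einsum("ir,jr,kr->ijk", U1, U2, U3)
score_full = float(np.sum(W * X))

print(np.isclose(score_linear, score_full))
```

An alternating scheme would cycle over the modes, recomputing such features for the current mode and updating only that mode's factor matrix at each step.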

Summarizing, the contributions of this paper are as follows:

  • We introduce the higher rank two class STMs.

  • We introduce the higher rank two class Σ/Σw STMs.

  • We introduce the higher rank two class RMSTMs.
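The whitening idea behind the Σ/Σw variants can be illustrated in the vector domain. The sketch below whitens toy data with the total covariance matrix so that the transformed data have identity covariance. The data, the mixing matrix and the variable names are assumptions; the per-mode tensor formulation of the paper is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy data: 200 samples with 3 correlated features.
A = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 0.2]])
X = rng.standard_normal((200, 3)) @ A.T

mean = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False)  # total covariance of the samples

# Inverse square root of Sigma via its eigendecomposition
# (Sigma is symmetric and positive definite for this toy data).
eigvals, eigvecs = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

# Whitened data: centered and rescaled so that the covariance becomes the identity.
X_white = (X - mean) @ Sigma_inv_sqrt
cov_white = np.cov(X_white, rowvar=False)

print(np.round(cov_white, 6))
```

Using the within-class covariance instead would mean estimating Sigma from samples centered on their own class means.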

The rest of the paper is organized as follows. In Section 2, we briefly review related work on tensor-based formulations, on large margin classifiers that take the data variance into consideration, and on action and gait recognition. Some useful notation that will be used throughout the paper is presented in Section 3. In Section 4, we introduce the novel algorithms that are able to handle tensorial representations of the data. In detail, we present the higher rank STMs in Section 4.1 and introduce the higher rank Σ/Σw STMs in Section 4.2. In Section 4.3 we formulate the higher rank RMSTMs and also present their fast implementation (Section 4.4). All of the algorithms presented in Section 4 assume that the weights are in a tensorial format and can be implemented using typical SVMs. The power of the proposed classifiers is demonstrated on the gait and action recognition problems in Section 5. Finally, conclusions are drawn in Section 6.

Section snippets

Related work

In [3], the SVMs learning framework was extended to handle tensorial input. More precisely, a two class STM formulation was proposed, where the weight parameters were defined as rank one tensors, one for every mode of the input tensor. This is in contrast to the approach followed in this work, where the weight parameters are defined as a higher rank tensor, that can be written as a sum of rank one tensors following the rationale behind the CP decomposition, creating in that way a novel higher

Useful notations in multilinear algebra

An n-th order tensor is a collection of measurements indexed by n indices, each index corresponding to a mode. Vectors are first-order and matrices are second-order tensors [27]. We will use lowercase letters (e.g. $x$) to denote scalars, and boldface lowercase letters (e.g. $\mathbf{x}$) and boldface capital letters (e.g. $\mathbf{X}$) to denote vectors and matrices, respectively. Tensors of order 3 or higher will be denoted by boldface Euler script calligraphic letters (e.g. $\mathcal{X}$).

The i-th element of a vector $\mathbf{x} \in \mathbb{R}^{I}$ is

Higher rank Support Tensor Machines

In this section we will propose the higher rank Support Tensor Machines (STMs) and also present their novel extensions, in order to address the classification problem within the large margin paradigm. More specifically we will formulate the higher rank STMs, in which the weights parameter is constrained to be a tensor written as the sum of rank one tensors according to the CP decomposition. We will also proceed with the higher rank Σ/Σw STMs that use the total or the within-class covariance

Experimental results

In this section, we will present experimental results that show the superiority of the proposed higher rank STMs, Σ/Σw STMs and RMSTMs over vector-based methods and rank one STMs. We address the gait recognition and the human action recognition problems using publicly available databases. For both problems we study the influence of the rank R.

Note that both the gait and the action recognition problems are multiclass classification problems. We address them by combining multiple two class

Conclusions

In this work, we addressed the two class classification problem within the tensor-based large margin classification framework. More precisely, we first proposed the higher rank Support Tensor Machines (STMs), in which the parameters defining the separating hyperplane form a tensor (tensorplane) that can be written as a sum of rank one tensors, according to the CANDECOMP/PARAFAC (CP) decomposition. We subsequently proposed the higher rank Σ/Σw STMs, that use the total or the within-class

Acknowledgment

This work was supported by the EPSRC grant ‘Recognition and Localization of Human Actions in Image Sequences’ (EP/G033935/1).

References (38)

  • N.V. Boulgouris et al.

    Gait recognition using linear time normalization

    Pattern Recognition

    (2006)
  • M. Ahmad et al.

    Human action recognition using shape and CLG-motion flow from multi-view image sequences

    Pattern Recognition

    (2008)
  • H. Lu et al.

    MPCA: multilinear principal component analysis of tensor objects

    IEEE Transactions on Neural Networks

    (2008)
  • S. Yan et al.

    Multilinear discriminant analysis for face recognition

    IEEE Transactions on Image Processing

    (2007)
  • D. Tao et al.

    Supervised tensor learning

    Knowledge and Information Systems

    (2007)
  • S. Zafeiriou

    Discriminant nonnegative tensor factorization algorithms

    IEEE Transactions on Neural Networks

    (2009)
  • S. Zafeiriou et al.

    Nonnegative tensor factorization as an alternative Csiszar–Tusnady procedure: algorithms, convergence, probabilistic interpretations and novel probabilistic tensor latent variable analysis algorithms, Data Mining and Knowledge Discovery 1–48

    (2011)
  • S. Zafeiriou

    Algorithms for nonnegative tensor factorization

    Tensors in Image Processing and Computer Vision

    (2009)
  • T.-K. Kim et al.

    Canonical correlation analysis of video volume tensors for action categorization and detection

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2009)
  • P. Shivaswamy et al.

    Maximum relative margin and data-dependent regularization

    Journal of Machine Learning Research

    (2010)
  • P.N. Belhumeur et al.

    Eigenfaces vs. fisherfaces: recognition using class specific linear projection

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (1997)
  • S. Mika et al.

    Constructing descriptive and discriminative nonlinear features: Rayleigh coefficients in kernel feature spaces

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2003)
  • K.-R. Muller et al.

    An introduction to kernel-based learning algorithms

    IEEE Transactions on Neural Networks

    (2001)
  • I. Kotsia et al.

    Facial expression recognition in image sequences using geometric deformation features and support vector machines

    IEEE Transactions on Image Processing

    (2007)
  • P. Shivaswamy, T. Jebara, Ellipsoidal kernel machines, in: Proceedings of the Eleventh International Conference on...
  • A. Sundaresan et al.

    Multicamera tracking of articulated human motion using shape and motion cues

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2009)
  • D. Ramanan, D. A. Forsyth, Automatic annotation of everyday movements, in: Proceedings of Advances in Neural...
  • A. Oikonomopoulos et al.

    Spatiotemporal salient points for visual recognition of human actions

    IEEE Transactions on Systems, Man and Cybernetics—Part B: Cybernetics

    (2006)
  • K. Rapantzikos, Y. Avrithis, S. Kollias, Dense saliency-based spatiotemporal feature points for action recognition, in:...

    Irene Kotsia received the diploma and Ph.D. in Informatics from the Aristotle University of Thessaloniki, Greece, in 2002 and 2008, respectively. From 2008 to 2009 she was a Research Associate in Artificial Intelligence and Information Analysis (AIIA) laboratory in the Department of Informatics at Aristotle University of Thessaloniki. Since September 2009 she has been a Research Associate with the Multimedia and Vision Research group (MMV) in School of Electronic Engineering and Computer Science, Queen Mary, University of London. She has coauthored many journal publications in a number of scientific journals, including IEEE Transactions on Image Processing, IEEE Transactions on Neural Networks and IEEE Transactions on Forensics and Security. Her current research interests lie in the areas of image and signal processing, statistical pattern recognition especially for human actions localization and recognition, facial expression recognition from static images and image sequences as well as in the areas of graphics and animation.

    Weiwei Guo received the B.Sc. and M.Sc. degrees in Information and Communication Engineering from the National University of Defense Technology, Changsha, China, in 2005 and 2007, respectively. He is currently pursuing his Ph.D. degree at Queen Mary, University of London, UK. His main research interests lie in pattern recognition and machine learning and their applications in the field of computer vision.

    Ioannis (Yiannis) Patras received the B.Sc. and M.Sc. degrees in Computer Science from the Computer Science Department, University of Crete, Heraklion, Greece, in 1994 and 1997, respectively, and the Ph.D. degree from the Department of Electrical Engineering, Delft University of Technology, The Netherlands, in 2001. He has been a Postdoctoral Researcher in the area of multimedia analysis at the University of Amsterdam, and a Postdoctoral Researcher in the area of vision-based human machine interaction at TU Delft. Between 2005 and 2007 he was a Lecturer in Computer Vision at the Department of Computer Science, University of York, York, UK. Since 2007 he has been a Lecturer in Computer Vision in the Department of Electronic Engineering at Queen Mary, University of London. He has served on the organizing committees of IEEE SMC 2004 and of Face and Gesture Recognition 2008, and is the general chair of WIAMIS 2009. He is an associate editor of the Image and Vision Computing Journal and of the Journal of Multimedia. His research interests lie in the areas of computer vision and pattern recognition, with emphasis on motion analysis, and their applications in multimedia data management, multimodal human computer interaction, and visual communications. Currently, he is interested in the analysis of human motion, including the detection, tracking and understanding of facial and body gestures.
