3D SMoSIFT: three-dimensional sparse motion scale invariant feature transform for activity recognition from RGB-D videos

Jun Wan; Qiuqi Ruan; Wei Li; Gaoyun An; Ruizhen Zhao

doi:10.1117/1.JEI.23.2.023017


Journal of Electronic Imaging, Vol. 23, Issue 2, 023017 (April 2014). https://doi.org/10.1117/1.JEI.23.2.023017

Abstract

Human activity recognition based on RGB-D data has received increasing attention in recent years. We propose a spatiotemporal feature named three-dimensional (3D) sparse motion scale-invariant feature transform (SIFT) from RGB-D data for activity recognition. First, we build pyramids as a scale space for each RGB and depth frame, and then use the Shi-Tomasi corner detector and sparse optical flow to quickly detect and track robust keypoints around the motion pattern in the scale space. Subsequently, local patches around the keypoints, extracted from the RGB-D data, are used to build 3D gradient and motion spaces, and SIFT-like descriptors are calculated on each of the two 3D spaces. The proposed feature is invariant to scale, translation, and partial occlusions. More importantly, the proposed feature is fast to compute, which makes it well suited for real-time applications. We evaluated the proposed feature under a bag-of-words model on three public RGB-D datasets: the one-shot learning Chalearn Gesture Dataset, the Cornell Activity Dataset-60, and the MSR Daily Activity 3D dataset. Experimental results show that the proposed feature outperforms other spatiotemporal features and is comparable to other state-of-the-art approaches, even though there is only one training sample for each class.
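The first stage described in the abstract (a pyramid per frame, then Shi-Tomasi corner detection at each level) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 2x2 block averaging used in place of a proper smoothed pyramid, the window radius, and the threshold are all assumptions chosen for illustration, and the optical-flow tracking and descriptor stages are omitted. The one standard ingredient is the Shi-Tomasi score itself, which is the smaller eigenvalue of the local structure tensor.

```python
import math

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (assumed stand-in for a smoothed pyramid level)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def build_pyramid(img, levels):
    """Return [img, half-size img, quarter-size img, ...] with `levels` entries."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

def shi_tomasi_score(img, y, x, radius=1):
    """Smaller eigenvalue of the 2x2 structure tensor summed over a square window."""
    sxx = syy = sxy = 0.0
    for j in range(y - radius, y + radius + 1):
        for i in range(x - radius, x + radius + 1):
            gx = (img[j][i + 1] - img[j][i - 1]) / 2.0  # central difference in x
            gy = (img[j + 1][i] - img[j - 1][i]) / 2.0  # central difference in y
            sxx += gx * gx
            syy += gy * gy
            sxy += gx * gy
    trace = sxx + syy
    diff = math.sqrt((sxx - syy) ** 2 + 4.0 * sxy * sxy)
    return (trace - diff) / 2.0

def detect_corners(img, threshold, radius=1):
    """Return (row, col) positions whose score exceeds the (hypothetical) threshold."""
    margin = radius + 1  # keep the gradient stencil inside the image
    h, w = len(img), len(img[0])
    return [(y, x)
            for y in range(margin, h - margin)
            for x in range(margin, w - margin)
            if shi_tomasi_score(img, y, x, radius) > threshold]

# Synthetic 16x16 frame: a bright square on a dark background.
frame = [[1.0 if 4 <= y <= 11 and 4 <= x <= 11 else 0.0 for x in range(16)]
         for y in range(16)]
pyramid = build_pyramid(frame, levels=3)
corners_per_level = [detect_corners(level, threshold=0.6) for level in pyramid]
```

On the synthetic frame, the corners of the square score high because the gradient varies in two directions there, whereas points along a straight edge score zero because one eigenvalue vanishes. In the pipeline the abstract describes, only corners that also show motion under sparse optical flow would be kept as keypoints.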

Citation

Jun Wan, Qiuqi Ruan, Wei Li, Gaoyun An, and Ruizhen Zhao "3D SMoSIFT: three-dimensional sparse motion scale invariant feature transform for activity recognition from RGB-D videos," Journal of Electronic Imaging 23(2), 023017 (8 April 2014). https://doi.org/10.1117/1.JEI.23.2.023017

Published: 8 April 2014
