Abstract
In this paper, we address the problem of content-based video retrieval (CBVR) using multivariate time series modeling of features. We focus on representing the dynamics of geometric features on the Spatio-Temporal Volume (STV) created from a real-world video shot. The STV intrinsically holds the video content by capturing how the appearance of the foreground object changes over time, and hence can be treated as a dynamical system. We capture the geometric property of the parameterized STV using the Gaussian curvature computed at each point on its surface. The change of Gaussian curvature over time is then modeled as a Linear Dynamical System (LDS). Because it can efficiently model the dynamics of a multivariate signal, an Auto-Regressive Moving Average (ARMA) model is used to represent the time series data, and the parameters of the ARMA model serve as the video content representation. To discriminate between a pair of video shots (time series), we use the subspace angle between the pair of feature vectors formed from the ARMA model parameters. Experiments are performed on four publicly available benchmark datasets, shot using a static camera. We present both qualitative and quantitative analyses of the proposed framework. Comparative results against three recent works on video retrieval also demonstrate the efficiency of the proposed framework.
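The modeling and matching steps summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each shot has already been reduced to a matrix of per-frame curvature features (rows are feature dimensions, columns are frames), fits the LDS with a common closed-form SVD-based estimate, and compares two shots through the subspace angles between their finite observability matrices. The names `fit_lds`, `observability_matrix`, `lds_distance`, `n_states` and `n_blocks` are hypothetical, and the final distance (root of the summed squared sines of the angles) is one possible choice among several.

```python
import numpy as np


def fit_lds(Y, n_states):
    """Estimate (A, C) of x[t+1] = A x[t] + v[t], y[t] = C x[t] + w[t].

    Y has shape (p, T): p feature dimensions observed over T frames.
    Assumes p >= n_states and T > n_states.
    """
    Y = Y - Y.mean(axis=1, keepdims=True)           # remove the temporal mean
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n_states]                             # observation matrix (p, n)
    X = np.diag(s[:n_states]) @ Vt[:n_states, :]    # estimated states (n, T)
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])        # least-squares transition
    return A, C


def observability_matrix(A, C, n_blocks):
    """Stack C, CA, CA^2, ... into a finite observability matrix."""
    blocks = [C]
    for _ in range(n_blocks - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)


def subspace_angles(O1, O2):
    """Principal angles between the column spaces of O1 and O2."""
    Q1, _ = np.linalg.qr(O1)                        # orthonormal bases
    Q2, _ = np.linalg.qr(O2)
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))


def lds_distance(Y1, Y2, n_states=3, n_blocks=5):
    """Distance between two shots from the angles between their models.

    The sqrt(sum(sin^2)) form is an assumption made for this sketch.
    """
    O1 = observability_matrix(*fit_lds(Y1, n_states), n_blocks)
    O2 = observability_matrix(*fit_lds(Y2, n_states), n_blocks)
    theta = subspace_angles(O1, O2)
    return float(np.sqrt(np.sum(np.sin(theta) ** 2)))
```

Comparing column spaces rather than the raw matrices is deliberate: a change of basis in the hidden state alters `A` and `C` but leaves the column space of the observability matrix unchanged, so the angles do not depend on the arbitrary basis returned by the SVD. Calling `lds_distance(Y_query, Y_candidate)` for every candidate shot and sorting by the result would give a simple retrieval ranking.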
Acknowledgments
We are thankful to Prof. Sukhendu Das and the members of the Visualization and Perception Lab for their support and comments. Additional thanks to Debarun Kar and Prahllad Deb for their insightful suggestions.
Cite this article
Chattopadhyay, C., Maurya, A.K. Multivariate time series modeling of geometric features of spatio-temporal volumes for content based video retrieval. Int J Multimed Info Retr 3, 15–28 (2014). https://doi.org/10.1007/s13735-013-0042-8
DOI: https://doi.org/10.1007/s13735-013-0042-8