Extended histogram: probabilistic modelling of video content temporal evolutions

Shabaninia, Elham; Naghsh-Nilchi, Ahmad Reza; Kasaei, Shohreh

doi:10.1007/s11045-018-0550-z

Published: 03 February 2018

Multidimensional Systems and Signal Processing, volume 30, pages 175–193 (2019)

Elham Shabaninia¹,
Ahmad Reza Naghsh-Nilchi¹ &
Shohreh Kasaei²

Abstract

A probabilistic video content analysis method called the extended histogram (EH) is proposed for modelling the temporal evolution of a set of histograms extracted from video frames. In an EH, the count in each histogram bin is treated as a random variable rather than a single value, so that bin variations are accounted for. This representation is especially suitable for modelling, in a general manner, the dynamic behaviour of tracked video content of interest. Its drawback is that it ignores the temporal order of the observations in the collection. To overcome this problem, a hierarchical approach called the hierarchical extended histogram (HEH) is proposed, which extracts EHs at different levels of a temporal pyramid. Once these generative models have been identified for each video, an information-based metric is proposed for measuring the similarity of two EHs. With this metric, EHs can be used in many different tasks, including video retrieval, classification, and summarization. For discriminative learning in particular, probabilistic kernels based on this metric are also defined, so that EHs and HEHs can be used with machine learning models such as the SVM. Person re-identification and human action recognition are used as pilot applications to show the capabilities of the proposed representations. Experimental results demonstrate the effectiveness of the proposed models.
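
The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not state which distribution is fitted to each bin, which information-based metric is used, or how the kernel is formed, so every such choice below is an assumption made for illustration only: each bin is modelled as an independent Gaussian, the metric is a symmetric Kullback–Leibler divergence summed over bins, the temporal pyramid halves the segments at each level, and the kernel is the exponential of the negative divergence. All function names and the `VAR_FLOOR` constant are hypothetical.

```python
import math
from statistics import fmean, pvariance

VAR_FLOOR = 1e-6  # assumed floor so constant bins keep a finite divergence


def extended_histogram(histograms):
    """Fit one (mean, variance) pair per bin over a set of frame histograms.

    Assumption: each bin count is an independent Gaussian random variable.
    """
    bins = zip(*histograms)  # transpose: one tuple of counts per bin
    return [(fmean(b), max(pvariance(b), VAR_FLOOR)) for b in bins]


def hierarchical_extended_histogram(histograms, levels=3):
    """Fit EHs on a temporal pyramid; level l splits the frames into 2**l segments."""
    n = len(histograms)
    if n < 2 ** (levels - 1):
        raise ValueError("not enough frames for the requested pyramid depth")
    pyramid = []
    for level in range(levels):
        parts = 2 ** level
        bounds = [i * n // parts for i in range(parts + 1)]
        for start, end in zip(bounds, bounds[1:]):
            pyramid.append(extended_histogram(histograms[start:end]))
    return pyramid


def symmetric_kl(eh_a, eh_b):
    """Sum over bins of KL(a||b) + KL(b||a) for univariate Gaussians."""
    total = 0.0
    for (mean_a, var_a), (mean_b, var_b) in zip(eh_a, eh_b):
        d2 = (mean_a - mean_b) ** 2
        # The log-variance terms of the two KL directions cancel.
        total += (var_a + d2) / (2 * var_b) + (var_b + d2) / (2 * var_a) - 1.0
    return total


def heh_kernel(heh_a, heh_b, scale=1.0):
    """Assumed kernel form: exp(-scale * summed divergence over pyramid cells)."""
    divergence = sum(symmetric_kl(a, b) for a, b in zip(heh_a, heh_b))
    return math.exp(-scale * divergence)


# Toy data: two "videos", each a list of 3-bin frame histograms.
video_a = [[1, 2, 3], [2, 3, 4], [3, 2, 5], [4, 3, 6]]
video_b = [[2, 2, 3], [2, 3, 5], [4, 2, 5], [4, 4, 6]]
heh_a = hierarchical_extended_histogram(video_a)
heh_b = hierarchical_extended_histogram(video_b)
similarity = heh_kernel(heh_a, heh_b, scale=1e-3)
```

A Gram matrix built from `heh_kernel` could be passed to an SVM that accepts precomputed kernels. An exponentiated symmetric KL divergence is not guaranteed to be positive semi-definite, so this sketch should not be read as the kernel defined in the paper.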

Author information

Authors and Affiliations

Department of Artificial Intelligence, Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran
Elham Shabaninia & Ahmad Reza Naghsh-Nilchi
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
Shohreh Kasaei

Corresponding author

Correspondence to Ahmad Reza Naghsh-Nilchi.

About this article

Cite this article

Shabaninia, E., Naghsh-Nilchi, A.R. & Kasaei, S. Extended histogram: probabilistic modelling of video content temporal evolutions. Multidim Syst Sign Process 30, 175–193 (2019). https://doi.org/10.1007/s11045-018-0550-z

Received: 13 March 2017
Revised: 09 December 2017
Accepted: 06 January 2018
Published: 03 February 2018
Issue Date: 15 January 2019
DOI: https://doi.org/10.1007/s11045-018-0550-z
