Extended histogram: probabilistic modelling of video content temporal evolutions

Multidimensional Systems and Signal Processing

Abstract

A probabilistic video content analysis method called extended histogram (EH) is proposed for modelling the temporal evolution of a set of histograms extracted from video frames. In an EH, the count in each histogram bin is treated as a random variable (instead of a single value) to account for variations of that bin across frames. This representation is especially suitable for modelling, in a general manner, the dynamic behaviour of a tracked content of interest in a video. Its drawback is that it ignores the temporal order of the observations in the collection. To overcome this problem, a hierarchical approach called hierarchical extended histogram (HEH) is proposed, which extracts EHs at different levels of a temporal pyramid. Once these generative models are identified for each video, an information-based metric is proposed for measuring the similarity of two EHs. With this metric, EHs can be used in many tasks, including video retrieval, classification, and summarization. For discriminative learning in particular, probabilistic kernels based on this metric are also defined so that EHs/HEHs can be used with machine learning models such as the SVM. Person re-identification and human action recognition are used as pilot applications to demonstrate the capabilities of the proposed representations. Experimental results demonstrate the effectiveness of the proposed models.
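The abstract does not specify the distribution used for each bin, the exact information-based metric, or the pyramid layout, so the following is only a minimal sketch of the pipeline under stated assumptions. It assumes each bin count is an independent Gaussian fitted over a video's frames, uses a symmetric Kullback–Leibler divergence as a stand-in for the paper's metric, assumes each pyramid level halves the segments of the previous one, and uses an exponentiated-divergence kernel as a stand-in for the paper's probabilistic kernels. All function names (`extended_histogram`, `symmetric_kl`, `hierarchical_eh`, `heh_distance`, `divergence_kernel`) and parameters (`eps`, `levels`, `gamma`) are hypothetical.

```python
import numpy as np


def extended_histogram(frame_hists, eps=1e-6):
    """Assumed EH: model each bin count as a Gaussian random variable over frames.

    frame_hists: array-like of shape (T, B), one B-bin histogram per frame.
    Returns (mean, var), each of shape (B,).
    """
    h = np.asarray(frame_hists, dtype=float)
    mean = h.mean(axis=0)
    var = h.var(axis=0) + eps  # eps avoids zero variance for bins that never change
    return mean, var


def symmetric_kl(eh_a, eh_b):
    """Symmetric KL divergence between two EHs, assuming independent bins."""
    (m1, v1), (m2, v2) = eh_a, eh_b
    d2 = (m1 - m2) ** 2
    # KL(p||q) + KL(q||p) for univariate Gaussians; the log-variance terms cancel.
    return float(np.sum(0.5 * ((v1 + d2) / v2 + (v2 + d2) / v1) - 1.0))


def hierarchical_eh(frame_hists, levels=3):
    """Assumed HEH: level k splits the frames into 2**k contiguous segments."""
    h = np.asarray(frame_hists, dtype=float)
    if len(h) < 2 ** (levels - 1):
        raise ValueError("need at least 2**(levels - 1) frames")
    return [[extended_histogram(seg) for seg in np.array_split(h, 2 ** level)]
            for level in range(levels)]


def heh_distance(pyr_a, pyr_b):
    """Sum over pyramid levels of the mean segment-wise divergence."""
    return float(sum(np.mean([symmetric_kl(a, b) for a, b in zip(lvl_a, lvl_b)])
                     for lvl_a, lvl_b in zip(pyr_a, pyr_b)))


def divergence_kernel(pyrs, gamma=0.1):
    """Gram matrix K[i, j] = exp(-gamma * D(i, j)) over a list of HEHs."""
    n = len(pyrs)
    d = np.array([[heh_distance(pyrs[i], pyrs[j]) for j in range(n)]
                  for i in range(n)])
    return np.exp(-gamma * d)


# Usage on synthetic data: three "videos" of 40 frames with 16-bin histograms.
rng = np.random.default_rng(0)
videos = [rng.poisson(5.0, size=(40, 16)) for _ in range(3)]
pyrs = [hierarchical_eh(v, levels=3) for v in videos]
K = divergence_kernel(pyrs, gamma=0.05)
```

The resulting matrix could be passed to any SVM implementation that accepts a precomputed kernel. Note that an exponentiated divergence is not guaranteed to be positive semi-definite, so in practice such a Gram matrix may need checking or correction before use.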


Author information

Correspondence to Ahmad Reza Naghsh-Nilchi.

About this article

Cite this article

Shabaninia, E., Naghsh-Nilchi, A.R. & Kasaei, S. Extended histogram: probabilistic modelling of video content temporal evolutions. Multidim Syst Sign Process 30, 175–193 (2019). https://doi.org/10.1007/s11045-018-0550-z

  • DOI: https://doi.org/10.1007/s11045-018-0550-z
