Time-Delayed Correlation Analysis for Multi-Camera Activity Understanding

International Journal of Computer Vision

Abstract

We propose a novel approach to understanding activities from partial observations captured by multiple non-overlapping cameras separated by unknown time gaps. Each camera view is first decomposed automatically into regions based on the correlation of object dynamics across different spatial locations in all camera views. A new Cross Canonical Correlation Analysis (xCCA) is then formulated to discover and quantify the time-delayed correlations of regional activities observed within and across multiple camera views in a single common reference space. We show that learning these time-delayed activity correlations provides important contextual information for (i) inferring the spatial and temporal topology of a camera network; (ii) robust person re-identification; and (iii) global activity interpretation and video temporal segmentation. Crucially, in contrast to conventional methods, our approach relies on neither intra-camera nor inter-camera object tracking, so it can be applied to low-quality surveillance videos with severe inter-object occlusions. The effectiveness and robustness of our approach are demonstrated through experiments on 330 hours of video captured from 17 cameras installed at two busy underground stations with complex and diverse scenes.
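
To make the idea of a time-delayed correlation concrete, the following is a minimal sketch and not the xCCA formulation itself, whose details are not given in this abstract. It assumes two regional activity signals are available as multivariate time series, shifts one against the other over a range of candidate delays, and keeps the delay with the largest first canonical correlation. The function names, the synthetic data, and the QR/SVD route to the canonical correlation are all assumptions made for illustration.

```python
import numpy as np


def first_canonical_correlation(x, y):
    """Largest canonical correlation between row-aligned samples.

    x has shape (n, p) and y has shape (n, q). Assumes both have full
    column rank after centering (an assumption of this sketch).
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Orthonormal bases of the two centered column spaces.
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    # Singular values of qx^T qy are the canonical correlations.
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(min(s[0], 1.0))


def time_delayed_correlation(x, y, max_lag):
    """Scan delays 0..max_lag, pairing x[t] with y[t + lag].

    Returns the best delay and the correlation score at every delay.
    """
    n = len(x)
    scores = np.array([
        first_canonical_correlation(x[: n - lag], y[lag:])
        for lag in range(max_lag + 1)
    ])
    return int(np.argmax(scores)), scores


# Hypothetical data: activity in region B echoes region A after 7 frames.
rng = np.random.default_rng(0)
n, true_lag = 500, 7
x = rng.normal(size=(n, 2))
mixing = np.array([[1.0, 0.5], [-0.3, 1.0]])
y = np.zeros((n, 2))
y[true_lag:] = x[:-true_lag] @ mixing
y += 0.1 * rng.normal(size=(n, 2))

best_lag, scores = time_delayed_correlation(x, y, max_lag=20)
```

In this toy setting the score peaks at the delay used to generate the data. Real regional activity signals are noisier and temporally correlated, so the peak would typically be broader and the choice of `max_lag` matters.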


Author information

Corresponding author

Correspondence to Chen Change Loy.

About this article

Cite this article

Loy, C.C., Xiang, T. & Gong, S. Time-Delayed Correlation Analysis for Multi-Camera Activity Understanding. Int J Comput Vis 90, 106–129 (2010). https://doi.org/10.1007/s11263-010-0347-5

  • DOI: https://doi.org/10.1007/s11263-010-0347-5