Abstract
We propose a novel approach to understanding activities from their partial observations monitored through multiple non-overlapping cameras separated by unknown time gaps. In our approach, each camera view is first decomposed automatically into regions based on the correlation of object dynamics across different spatial locations in all camera views. A new Cross Canonical Correlation Analysis (xCCA) is then formulated to discover and quantify the time-delayed correlations of regional activities observed within and across multiple camera views in a single common reference space. We show that learning the time-delayed activity correlations offers important contextual information for (i) spatial and temporal topology inference of a camera network; (ii) robust person re-identification; and (iii) global activity interpretation and video temporal segmentation. Crucially, in contrast to conventional methods, our approach does not rely on either intra-camera or inter-camera object tracking; it can therefore be applied to low-quality surveillance videos characterised by severe inter-object occlusions. The effectiveness and robustness of our approach are demonstrated through experiments on 330 hours of video captured from 17 cameras installed at two busy underground stations with complex and diverse scenes.
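To make the idea of a time-delayed correlation concrete, the following is a minimal sketch and not the xCCA formulation itself: it assumes each region's activity has already been reduced to a single scalar time series, and it searches for the delay that maximises the ordinary Pearson correlation between two such series. The names `pearson`, `best_delay`, `region_a`, and `region_b`, as well as the toy data, are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def best_delay(x, y, max_lag):
    """Return (lag, r) maximising corr(x[t], y[t + lag]) over 0 <= lag <= max_lag.

    Assumes x and y have the same length and max_lag is well below that length.
    """
    best_lag, best_r = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Align x at time t with y at time t + lag by trimming both ends.
        a = x[: len(x) - lag]
        b = y[lag:]
        r = pearson(a, b)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Toy (made-up) activity counts for a region in one camera view.
region_a = [0, 1, 4, 2, 0, 0, 3, 5, 1, 0, 0, 2, 4, 1, 0, 0]

# A second region in another view that sees the same activity 3 steps later.
delay = 3
region_b = [0] * delay + region_a[:-delay]

lag, r = best_delay(region_a, region_b, max_lag=6)
print(lag, r)
```

In this simplified setting the recovered `lag` plays the role of the unknown time gap between two regions, and the correlation value indicates how strongly they are linked. The approach described in the abstract works with richer regional activity representations and a canonical-correlation formulation rather than a scalar Pearson coefficient.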
Cite this article
Loy, C.C., Xiang, T. & Gong, S. Time-Delayed Correlation Analysis for Multi-Camera Activity Understanding. Int J Comput Vis 90, 106–129 (2010). https://doi.org/10.1007/s11263-010-0347-5