Abstract
We present an approach for consistently labeling people and for detecting human–object interactions in single-camera surveillance video. The approach combines a robust appearance-based correlogram model with histogram information to model the color distributions of people and objects in the scene. The models are built dynamically from the non-stationary objects produced by background subtraction and are used to identify objects on a frame-by-frame basis. The approach detects when people merge into groups and segments them even under partial occlusion. It also detects when a person deposits or removes an object. The models persist when a person or object leaves the scene and are used to identify them when they reappear. Experiments show that the models accommodate partial occlusion as well as the perspective foreshortening that occurs with overhead camera angles. The results indicate that the approach can supply important information to algorithms performing higher-level analysis, such as activity recognition, in which human–object interactions play an important role.
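To make the idea of a correlogram-plus-histogram appearance model concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not specify the color quantization, the distance set, or the neighborhood, so the 4-bins-per-channel RGB quantization, the distances (1, 3, 5), the restriction to axis-aligned neighbors, and all function names here are illustrative assumptions. The foreground mask stands in for the output of background subtraction.

```python
import numpy as np


def quantize(image, bins_per_channel=4):
    """Map an HxWx3 uint8 image to color indices in [0, bins_per_channel**3).

    The uniform RGB quantization is an assumption made for this sketch.
    """
    q = (image.astype(np.int64) * bins_per_channel) // 256
    return (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]


def color_histogram(labels, mask, n_colors):
    """Normalized color histogram over the foreground pixels selected by mask."""
    hist = np.bincount(labels[mask], minlength=n_colors).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist


def autocorrelogram(labels, mask, n_colors, distances=(1, 3, 5)):
    """Estimate, per color c and distance d, the probability that a foreground
    pixel lying d pixels away (axis-aligned neighbors only, a simplification)
    from a foreground pixel of color c also has color c.
    """
    h, w = labels.shape
    result = np.zeros((n_colors, len(distances)))
    for k, d in enumerate(distances):
        same = np.zeros(n_colors)
        pairs = np.zeros(n_colors)
        for dy, dx in ((0, d), (d, 0), (0, -d), (-d, 0)):
            if abs(dy) >= h or abs(dx) >= w:
                continue  # offset larger than the image: no valid pairs
            y0, y1 = max(0, -dy), min(h, h - dy)
            x0, x1 = max(0, -dx), min(w, w - dx)
            src = (slice(y0, y1), slice(x0, x1))
            dst = (slice(y0 + dy, y1 + dy), slice(x0 + dx, x1 + dx))
            both = mask[src] & mask[dst]  # both pixels must be foreground
            c_src = labels[src][both]
            c_dst = labels[dst][both]
            pairs += np.bincount(c_src, minlength=n_colors)
            same += np.bincount(c_src[c_src == c_dst], minlength=n_colors)
        result[:, k] = np.divide(same, pairs, out=np.zeros(n_colors), where=pairs > 0)
    return result


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return float(np.minimum(h1, h2).sum())


# Usage on a synthetic frame: a random image with a rectangular foreground blob.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
foreground = np.zeros((32, 32), dtype=bool)
foreground[8:24, 10:20] = True

labels = quantize(frame)
hist_model = color_histogram(labels, foreground, n_colors=64)
corr_model = autocorrelogram(labels, foreground, n_colors=64)
```

A stored model of this kind could be compared against the blobs found in a new frame, for example with `histogram_intersection` on the histograms and an L1 distance on the correlograms; how the two cues are actually weighted and matched is not stated in the abstract.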
Cite this article
Balcells, M., DeMenthon, D. & Doermann, D. An appearance-based approach for consistent labeling of humans and objects in video. Pattern Anal Applic 7, 373–385 (2004). https://doi.org/10.1007/s10044-004-0237-y
DOI: https://doi.org/10.1007/s10044-004-0237-y