Abstract
We present a novel multi-modal fusion framework for non-sequential person detection, localization and identification from multiple views. Our goal is independent processing of randomly accessed sections of video, either individual frames or small batches thereof. In this way, we aim to limit the error propagation that makes existing approaches unsuitable for fully autonomous tracking of multiple people in long video sequences. Our framework uses one or more trained classifiers to fuse multiple weak feature maps. We perform experimental validation on a challenging dataset, demonstrating how the framework can, depending on the provided feature maps, be used either solely to improve generic person detection or to enable simultaneous detection and recognition of individuals. Finally, we show that tracking-by-identification using the output of the proposed framework outperforms the state-of-the-art identification-by-tracking approach in terms of preserved track identities.
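As a rough illustration of what fusing weak feature maps with a trained classifier can look like, the sketch below trains a linear classifier on per-cell feature vectors and applies it to every cell of a small ground-plane grid. Everything in it is an assumption made for illustration: the two feature maps (`FOREGROUND`, `MOTION`), the toy training data, the function names, and the classifier itself, which is a plain margin perceptron (hinge-style updates, no regularization) standing in for whatever classifier the framework actually uses. Because each cell is scored independently of any other frame, the same routine could be applied to frames in any order, which is the non-sequential property the abstract describes.

```python
# Hypothetical toy data: each training sample holds one value per feature map
# at a single ground-plane cell; label +1 = person present, -1 = empty.
TRAIN_X = [(0.9, 0.8), (0.8, 0.3), (0.7, 0.9), (0.4, 0.9),
           (0.1, 0.2), (0.3, 0.1), (0.2, 0.3), (0.0, 0.0)]
TRAIN_Y = [1, 1, 1, 1, -1, -1, -1, -1]

# Two hypothetical weak feature maps over a 2x2 ground-plane grid.
FOREGROUND = [[0.9, 0.1],
              [0.3, 0.4]]
MOTION = [[0.8, 0.2],
          [0.1, 0.9]]


def train_fusion_classifier(samples, labels, lr=1.0, max_epochs=1000):
    """Train a linear classifier with margin-perceptron updates.

    A sample triggers an update whenever its functional margin is below 1.
    Training stops once a full pass over the data needs no update.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(max_epochs):
        updated = False
        for x, y in zip(samples, labels):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * score < 1.0:
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
                updated = True
        if not updated:
            break
    return w, b


def fuse_feature_maps(feature_maps, w, b):
    """Score every grid cell from the stacked feature-map values at that cell."""
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(wk * fm[r][c] for wk, fm in zip(w, feature_maps)) + b
             for c in range(cols)]
            for r in range(rows)]


def detect_cells(fused, threshold=0.0):
    """Return (row, col) of every cell whose fused score exceeds the threshold."""
    return [(r, c)
            for r, row in enumerate(fused)
            for c, score in enumerate(row)
            if score > threshold]


if __name__ == "__main__":
    w, b = train_fusion_classifier(TRAIN_X, TRAIN_Y)
    fused = fuse_feature_maps([FOREGROUND, MOTION], w, b)
    print("weights:", w, "bias:", b)
    print("detected cells:", detect_cells(fused))
```

In this toy setting, neither map alone separates occupied from empty cells with a single threshold at 0.5 (one occupied cell has weak motion, another has weak foreground), while the learned linear combination does. Training one such classifier per known individual, with identity-specific feature maps as input, would be one conceivable way to move from generic detection to identification, though the abstract does not specify how this is done.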
Keywords
- Support Vector Machine
- Discriminative Information
- Support Vector Machine Parameter
- Identity Switch
- Person Detection
These keywords were added by machine and not by the authors.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mandeljc, R., Kovačič, S., Kristan, M., Perš, J. (2013). Non-sequential Multi-view Detection, Localization and Identification of People Using Multi-modal Feature Maps. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds) Computer Vision – ACCV 2012. ACCV 2012. Lecture Notes in Computer Science, vol 7726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37431-9_53
DOI: https://doi.org/10.1007/978-3-642-37431-9_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37430-2
Online ISBN: 978-3-642-37431-9
eBook Packages: Computer Science (R0)