ABSTRACT
Activity and gait recognition are among the many applications that require view-specific input. In a real surveillance scenario it is impractical to assume that the desired canonical view will always be available. We present a framework to generate the canonical view of any translating object in a scene monitored by multiple cameras. The method recovers this view even though no individual camera observes it. The process has two steps. First, the camera and scene geometry are used to identify the sagittal plane of the object, which defines the canonical view. Next, each original view is warped to the canonical view through planar homographies learnt from geometric constraints. The warped images are then combined by evidence fusion to recover a shape energy map, from which the final binary silhouette of the object is obtained. Results on various indoor and outdoor sequences demonstrate the efficacy of the method in generating the shape of the object as seen from the canonical view, while resolving occlusions.
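The warp-then-fuse step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the fusion rule or how the homographies are learnt, so this example assumes the 3x3 homographies are already known, uses plain averaging of the warped masks as a stand-in for evidence fusion, and binarizes with an arbitrary threshold of 0.5. All function names (`invert_3x3`, `warp_to_canonical`, `fuse`, `binarize`) are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]
Image = List[List[float]]


def invert_3x3(h: Matrix) -> Matrix:
    """Invert a 3x3 matrix via its adjugate; raise if it is singular."""
    (a, b, c), (d, e, f), (g, hh, i) = h
    det = a * (e * i - f * hh) - b * (d * i - f * g) + c * (d * hh - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    adj = [
        [e * i - f * hh, c * hh - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * hh - e * g, b * g - a * hh, a * e - b * d],
    ]
    return [[v / det for v in row] for row in adj]


def warp_to_canonical(mask: Image, h: Matrix, out_h: int, out_w: int) -> Image:
    """Warp a silhouette mask with homography h (source -> canonical).

    Uses inverse mapping with nearest-neighbour sampling; canonical pixels
    that fall outside the source image stay at 0.
    """
    h_inv = invert_3x3(h)
    src_h, src_w = len(mask), len(mask[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            # Map the canonical pixel (x, y, 1) back into the source view.
            u = h_inv[0][0] * x + h_inv[0][1] * y + h_inv[0][2]
            v = h_inv[1][0] * x + h_inv[1][1] * y + h_inv[1][2]
            w = h_inv[2][0] * x + h_inv[2][1] * y + h_inv[2][2]
            if abs(w) < 1e-12:
                continue  # point at infinity, no source pixel
            sx, sy = round(u / w), round(v / w)
            if 0 <= sx < src_w and 0 <= sy < src_h:
                out[y][x] = float(mask[sy][sx])
    return out


def fuse(warped: Sequence[Image]) -> Image:
    """Average the warped masks into an energy map with values in [0, 1].

    Assumption: simple averaging stands in for the paper's evidence fusion.
    """
    n = len(warped)
    rows, cols = len(warped[0]), len(warped[0][0])
    return [
        [sum(img[y][x] for img in warped) / n for x in range(cols)]
        for y in range(rows)
    ]


def binarize(energy: Image, threshold: float = 0.5) -> List[List[int]]:
    """Threshold the energy map into a binary silhouette (threshold assumed)."""
    return [[1 if e >= threshold else 0 for e in row] for row in energy]
```

With averaging, a pixel that is occluded in one view can still survive in the final silhouette if enough of the other views support it, which is the intuition behind combining several warped views rather than relying on any single one.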
Index Terms
- Multiview fusion for canonical view generation based on homography constraints