Abstract
Many of today’s vision algorithms are very successful in controlled environments. Real-world environments, however, cannot be controlled and are usually dynamic with respect to illumination changes, motion, occlusions, multiple people, and so on. Because most computer vision algorithms are limited to a particular situation, they lack robustness in dynamically changing environments. In this paper, we argue that integrating information from different visual cues and models is essential to increase both the robustness and the generality of computer vision algorithms. Two examples are discussed in which the robustness of simple models is leveraged by cue and model integration. In the first example, mutual information is used to combine different object models for face detection without prior learning. The second example discusses experimental results on multi-cue tracking of faces, based on the principles of self-organization of the integration mechanism and self-adaptation of the cue models during tracking.
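To make the role of mutual information concrete, the following is a minimal sketch of how the statistical dependence between the outputs of two object models could be measured. It is not the formulation used in the paper, which the abstract does not spell out. It assumes each model yields a flattened binary map over the same image locations; the function name `mutual_information` and the example maps `skin_map` and `shape_map` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(a, b):
    """Mutual information (in bits) between two equally long discrete sequences.

    Assumption for illustration: `a` and `b` are flattened per-pixel outputs
    of two different object models evaluated on the same image region.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    joint = Counter(zip(a, b))   # counts of each (a, b) outcome pair
    count_a = Counter(a)         # marginal counts for the first model
    count_b = Counter(b)         # marginal counts for the second model
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_a[x] * count_b[y]))
    return mi


if __name__ == "__main__":
    # Hypothetical binary responses of two models on eight locations.
    skin_map = [0, 0, 1, 1, 1, 0, 0, 0]
    shape_map = [0, 0, 1, 1, 0, 0, 0, 0]
    print(mutual_information(skin_map, shape_map))
```

A high value indicates that the two models respond consistently over the region, which is one plausible way to read agreement between models without training a combined classifier; a value near zero indicates that their responses are statistically independent there.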
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schiele, B., Spengler, M., Kruppa, H. (2002). Towards Robust Perception and Model Integration. In: Hager, G.D., Christensen, H.I., Bunke, H., Klein, R. (eds) Sensor Based Intelligent Robots. Lecture Notes in Computer Science, vol 2238. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45993-6_9
DOI: https://doi.org/10.1007/3-540-45993-6_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43399-6
Online ISBN: 978-3-540-45993-4
eBook Packages: Springer Book Archive