Abstract
Traditional approaches to object detection look only at local pieces of the image, either within a sliding window or in regions around detected interest points. However, such local pieces can be ambiguous, especially when the object of interest is small or imaging conditions are otherwise unfavorable. This ambiguity can be reduced by using global features of the image, which we call the “gist” of the scene, as an additional source of evidence. We show that combining local and global features significantly improves detection rates. In addition, since the gist is much cheaper to compute than most local detectors, we can potentially gain a large increase in speed as well.
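To make the idea of combining local and global evidence concrete, the following is a minimal sketch and not the method of the chapter itself. It assumes two hypothetical inputs: a probability from a local detector and a probability from a global, scene-level predictor, and it fuses them by adding log-odds, which is only valid under the assumption that the two sources are conditionally independent given the label. The function `coarse_global_descriptor` is a deliberately crude stand-in for a gist descriptor (mean gradient magnitude on a coarse grid); all function names and parameters here are illustrative assumptions.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def combine_local_global(p_local, p_global, prior=0.5):
    """Fuse two estimates of P(object present).

    Assumption (not from the source): both estimates are posteriors computed
    from the same prior, and the local and global features are conditionally
    independent given the label. Then the posterior odds multiply, so the
    log-odds add, with the shared prior counted once.
    """
    return sigmoid(logit(p_local) + logit(p_global) - logit(prior))


def coarse_global_descriptor(image, grid=2):
    """Hypothetical gist-like descriptor: mean absolute horizontal gradient
    in each cell of a grid x grid partition of a 2-D list of intensities."""
    h, w = len(image), len(image[0])
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            total, count = 0.0, 0
            for y in range(y0, y1):
                # Differences are taken between horizontal neighbours
                # that both lie inside the current cell.
                for x in range(x0, x1 - 1):
                    total += abs(image[y][x + 1] - image[y][x])
                    count += 1
            feats.append(total / count if count else 0.0)
    return feats
```

In this sketch an ambiguous local score such as 0.6 is pushed up when the scene-level estimate is favourable and pushed down when it is not, which is the qualitative effect the abstract describes. In practice the global probability would come from a classifier trained on the global descriptor, a step omitted here.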
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Murphy, K., Torralba, A., Eaton, D., Freeman, W. (2006). Object Detection and Localization Using Local and Global Features. In: Ponce, J., Hebert, M., Schmid, C., Zisserman, A. (eds) Toward Category-Level Object Recognition. Lecture Notes in Computer Science, vol 4170. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11957959_20
DOI: https://doi.org/10.1007/11957959_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68794-8
Online ISBN: 978-3-540-68795-5
eBook Packages: Computer Science, Computer Science (R0)