Abstract
A well-built dataset is a prerequisite for object categorization. However, collecting and labeling images is laborious and monotonous. In this paper, we focus on labeling images automatically by assigning a bounding box to each visual object. We propose a two-stage localization approach for image labeling that combines the Efficient Subwindow Search scheme with Multiple Instance Learning. We first detect the object coarsely with the Efficient Subwindow Search scheme and then localize it finely with Multiple Instance Learning. Our approach has two advantages: it speeds up the object search, and it locates the object in a tighter box than the Efficient Subwindow Search scheme does. We evaluate image labeling performance by detection precision and by the consistency of the detections with the ground-truth labels. Our approach is simple and fast in object localization. The experimental results demonstrate that it is more effective and accurate than the BOW model in both the precision and the consistency of detection.
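The coarse-then-fine idea can be illustrated with a minimal sketch. It is not the authors' implementation: the per-pixel `score_map` is a hypothetical classifier output, stage one replaces the branch-and-bound search of Efficient Subwindow Search with an exhaustive search over a coarse grid, and stage two stands in for Multiple Instance Learning by treating the windows near the coarse box as a bag and keeping the top-scoring instance. All function names and parameters are assumptions made for illustration.

```python
import numpy as np

def integral_image(score_map):
    """Summed-area table with a zero row and column prepended."""
    ii = np.zeros((score_map.shape[0] + 1, score_map.shape[1] + 1))
    ii[1:, 1:] = score_map.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_score(ii, box):
    """Sum of scores inside box = (top, left, bottom, right); bottom/right exclusive."""
    t, l, b, r = box
    return ii[b, r] - ii[t, r] - ii[b, l] + ii[t, l]

def coarse_search(ii, step):
    """Stage 1 (stand-in for ESS): best box whose edges lie on a coarse grid."""
    h, w = ii.shape[0] - 1, ii.shape[1] - 1
    ys = list(range(0, h + 1, step))
    xs = list(range(0, w + 1, step))
    best, best_score = None, -np.inf
    for i, t in enumerate(ys):
        for b in ys[i + 1:]:
            for j, l in enumerate(xs):
                for r in xs[j + 1:]:
                    s = box_score(ii, (t, l, b, r))
                    if s > best_score:
                        best, best_score = (t, l, b, r), s
    return best

def refine(ii, box, radius):
    """Stage 2 (stand-in for MIL): windows near the coarse box form a bag;
    keep the instance with the highest score."""
    h, w = ii.shape[0] - 1, ii.shape[1] - 1
    t0, l0, b0, r0 = box
    best, best_score = box, box_score(ii, box)
    offsets = range(-radius, radius + 1)
    for dt in offsets:
        for dl in offsets:
            for db in offsets:
                for dr in offsets:
                    t, l, b, r = t0 + dt, l0 + dl, b0 + db, r0 + dr
                    if 0 <= t < b <= h and 0 <= l < r <= w:
                        s = box_score(ii, (t, l, b, r))
                        if s > best_score:
                            best, best_score = (t, l, b, r), s
    return best

# Toy example: background pixels score -1, object pixels score +2.
score_map = -np.ones((20, 20))
score_map[5:11, 7:14] = 2.0
ii = integral_image(score_map)
coarse = coarse_search(ii, step=4)   # loose box aligned to the grid
fine = refine(ii, coarse, radius=3)  # tighter box around the object
print(coarse, fine)
```

In this toy setting the coarse stage evaluates only grid-aligned boxes, so it is cheap but loose; the refinement then searches a small neighbourhood of that box, which is the speed and tightness trade-off the abstract describes.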
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Qu, Y., Wu, D., Cheng, Y., Chen, C. (2010). Two-Stage Localization for Image Labeling. In: Qiu, G., Lam, K.M., Kiya, H., Xue, XY., Kuo, CC.J., Lew, M.S. (eds) Advances in Multimedia Information Processing - PCM 2010. PCM 2010. Lecture Notes in Computer Science, vol 6297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15702-8_52
DOI: https://doi.org/10.1007/978-3-642-15702-8_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15701-1
Online ISBN: 978-3-642-15702-8
eBook Packages: Computer Science (R0)