Abstract
Active learning and crowdsourcing are promising ways to efficiently build up training sets for object recognition, but thus far these techniques have been tested in artificially controlled settings. Typically the vision researcher has already determined the dataset’s scope, the labels “actively” obtained are in fact already known, and/or the crowdsourced collection process is iteratively fine-tuned. We present an approach for live learning of object detectors, in which the system autonomously refines its models by actively requesting crowdsourced annotations on images crawled from the Web. To address the technical issues such a large-scale system entails, we introduce a novel part-based detector amenable to linear classifiers, and show how to identify its most uncertain instances in sub-linear time with a hashing-based solution. We demonstrate the approach with experiments of unprecedented scale and autonomy, and show that it improves on the state of the art for the most challenging objects in the PASCAL VOC benchmark. In addition, we show that our detector competes well with popular nonlinear classifiers that are much more expensive to train.
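The “most uncertain instances” above can be made concrete with the simple-margin criterion of Tong and Koller (2000): for a linear classifier with weights w and bias b, the unlabeled example closest to the hyperplane w·x + b = 0 is the one to query next. The sketch below is a minimal, exhaustive O(N) version of that selection, assuming the simple-margin criterion; `most_uncertain` and its arguments are illustrative names, not the authors' code. It is precisely this full scan over the unlabeled pool that a sub-linear hashing search is meant to avoid.

```python
import numpy as np


def most_uncertain(w: np.ndarray, b: float, pool: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k pool rows closest to the hyperplane w.x + b = 0.

    Simple-margin criterion (assumed here): the smaller |w.x + b| / ||w||,
    the less certain the current linear model is about that example.
    Requires 1 <= k <= len(pool).
    """
    # Distance of every pool example to the decision boundary (exhaustive scan).
    margins = np.abs(pool @ w + b) / np.linalg.norm(w)
    # argpartition isolates the k smallest margins without a full sort.
    idx = np.argpartition(margins, k - 1)[:k]
    # Order the selected examples from most to least uncertain.
    return idx[np.argsort(margins[idx])]
```

For example, with w = (1, 0), b = 0 and pool rows (3, 1), (0.1, 5), (-0.5, 2), (2, -1), the margins are 3, 0.1, 0.5 and 2, so `most_uncertain(w, 0.0, pool, 2)` returns indices [1, 2]. In a live-learning loop, the selected examples would be the ones sent out for annotation.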
Notes
We use Locality-constrained Linear Coding (LLC) by Wang et al. (2010) to obtain the sparse coding, though other algorithms could also be used for this step.
Hyperplane hashes can be used with existing approximate near-neighbor search algorithms; we use the formulation by Charikar (2002), which guarantees the probability with which the nearest neighbor will be returned.
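The first note above refers to Locality-constrained Linear Coding (Wang et al. 2010). Their approximated LLC encoder reconstructs each local descriptor from only its k nearest codewords, with weights constrained to sum to one, which has a closed-form solution via a small k × k linear system. The sketch below follows that published approximation; the function name `llc_encode`, the default k = 5, and the trace-scaled regularizer beta = 1e-4 are assumptions for illustration, and we do not claim they match the settings used in this paper.

```python
import numpy as np


def llc_encode(X: np.ndarray, B: np.ndarray, k: int = 5, beta: float = 1e-4) -> np.ndarray:
    """Approximated LLC codes for descriptors X (N x D) on codebook B (M x D).

    Each row of the result has at most k nonzero entries (at the descriptor's
    k nearest codewords), and those entries sum to one.
    """
    N, M = X.shape[0], B.shape[0]
    codes = np.zeros((N, M))
    # Squared Euclidean distances between every descriptor and every codeword.
    d2 = (X ** 2).sum(1)[:, None] - 2 * X @ B.T + (B ** 2).sum(1)[None, :]
    knn = np.argsort(d2, axis=1)[:, :k]
    for i in range(N):
        z = B[knn[i]] - X[i]                   # local bases shifted to the descriptor
        C = z @ z.T                            # k x k local covariance
        C += np.eye(k) * beta * np.trace(C)    # regularize so C is invertible
        w = np.linalg.solve(C, np.ones(k))     # solve C w = 1
        codes[i, knn[i]] = w / w.sum()         # enforce the sum-to-one constraint
    return codes
```

The resulting per-descriptor codes are typically max-pooled over spatial regions (e.g., `codes.max(axis=0)` for one region) to form a fixed-length vector for a linear classifier, as in Yang et al. (2009); how this paper pools codes within its part-based detector is not reproduced here.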
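One hyperplane hash family from Jain et al. (2010), the two-bit H-Hash, is built from Charikar-style random projections, for which Pr[sign(u·x) = sign(u·y)] = 1 − θ/π when u is Gaussian and θ is the angle between x and y. A database point x is keyed by [sign(u·x), sign(v·x)] and a hyperplane query with normal w by [sign(u·w), sign(−v·w)], so they collide with probability (1 − θ/π)(θ/π) = 1/4 − (θ/π − 1/2)², where θ is now the angle between x and w. This is maximized at θ = π/2, i.e., for points lying on the hyperplane. The single-table sketch below is illustrative only: the class and method names are hypothetical, and we do not claim this is the exact hash variant or table configuration used in the paper.

```python
from collections import defaultdict

import numpy as np


class HyperplaneHashTable:
    """Single-table sketch of the two-bit H-Hash (after Jain et al., 2010)."""

    def __init__(self, dim: int, n_pairs: int = 2, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Gaussian directions u, v for each bit pair (random hyperplanes).
        self.U = rng.standard_normal((n_pairs, dim))
        self.V = rng.standard_normal((n_pairs, dim))
        self.buckets = defaultdict(list)

    @staticmethod
    def _key(a: np.ndarray, b: np.ndarray) -> bytes:
        # Concatenate the sign bits of both projections into a hashable key.
        return (a > 0).tobytes() + (b > 0).tobytes()

    def insert(self, X: np.ndarray) -> None:
        # Index the rows of X (the unlabeled pool); call once on the whole pool,
        # since row indices are used as identifiers.
        for i, x in enumerate(X):
            self.buckets[self._key(self.U @ x, self.V @ x)].append(i)

    def candidates(self, w: np.ndarray) -> list:
        # The query-side key flips the sign of the v projections.
        return self.buckets.get(self._key(self.U @ w, -(self.V @ w)), [])

    def query(self, w: np.ndarray, X: np.ndarray, k: int = 1) -> list:
        # Re-rank only the colliding candidates by their exact margin |w.x|.
        return sorted(self.candidates(w), key=lambda i: abs(X[i] @ w))[:k]
```

With one bit pair, a point on the hyperplane lands in the query's bucket with probability 1/4, while a point parallel to w never does. Concatenating more pairs per key makes buckets more selective, and using several independent tables, as in standard LSH, recovers recall; the query then scores only the retrieved candidates instead of the whole pool.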
References
Boureau, Y.-L., Bach, F., LeCun, Y., & Ponce, J. (2010). Learning mid-level features for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Charikar, M. (2002). Similarity estimation techniques from rounding algorithms. In Proceedings of the ACM Symposium on Theory of Computing (STOC).
Chum, O., & Zisserman, A. (2007). An exemplar model for learning object classes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Dean, T., Ruzon, M., Segal, M., Shlens, J., Vijayanarasimhan, S., & Yagnik, J. (2013). Fast, accurate detection of 100,000 object classes on a single machine. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Everingham, M., Van Gool, L., Williams, C., Winn, J., & Zisserman, A. (2010). The PASCAL Visual Object Classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Felzenszwalb, P., Girshick, R., McAllester, D., & Ramanan, D. (2009). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 99(1), 5555.
Fergus, R., Fei-Fei, L., Perona, P., & Zisserman, A. (2005). Learning object categories from Google’s image search. In Proceedings of the International Conference on Computer Vision (ICCV).
Jain, P., Vijayanarasimhan, S., & Grauman, K. (2010). Hashing hyperplane queries to near points with applications to large-scale active learning. In Advances in Neural Information Processing Systems (NIPS).
Joachims, T. (2006). Training linear SVMs in linear time. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD).
Joshi, A., Porikli, F., & Papanikolopoulos, N. (2009). Multi-class active learning for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Kapoor, A., Grauman, K., Urtasun, R., & Darrell, T. (2007). Active learning with Gaussian processes for object categorization. In Proceedings of the International Conference on Computer Vision (ICCV).
Lampert, C., Blaschko, M., & Hofmann, T. (2008). Beyond sliding windows: Object localization by efficient subwindow search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Lee, Y. J., & Grauman, K. (2010). Object-graphs for context-aware category discovery. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Li, L.-J., Wang, G., & Fei-Fei, L. (2007). OPTIMOL: Automatic online picture collection via incremental model learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Pirsiavash, H., & Ramanan, D. (2012). Steerable part models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Qi, G., Hua, X., Rui, Y., Tang, J., & Zhang, H. (2008). Two-dimensional active learning for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Russell, B., Torralba, A., Murphy, K., & Freeman, W. (2007). LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision, 77, 157–173.
Siddiquie, B., & Gupta, A. (2010). Modeling context for multi-class active learning: Beyond active noun tagging. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Song, H., Zickler, S., Althoff, T., Girshick, R., Fritz, M., Geyer, C., Felzenszwalb, P., & Darrell, T. (2012). Sparselet models for efficient multiclass object detection. In Proceedings of the European Conference on Computer Vision (ECCV).
Sorokin, A., & Forsyth, D. (2008). Utility data annotation with Amazon Mechanical Turk. In Workshop on Internet Vision.
Tong, S., & Koller, D. (2000). Support vector machine active learning with applications to text classification. In Proceedings of the International Conference on Machine Learning (ICML).
Torralba, A., Murphy, K., & Freeman, W. (2007). Sharing visual features for multiclass and multiview object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5), 854–869.
Uijlings, J., Smeulders, A., & Scha, R. (2009). What is the spatial extent of an object? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Vedaldi, A., Gulshan, V., Varma, M., & Zisserman, A. (2009). Multiple kernels for object detection. In Proceedings of the International Conference on Computer Vision (ICCV).
Vijayanarasimhan, S., & Grauman, K. (2008). Multiple-instance learning for weakly supervised object categorization: Keywords to visual categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Vijayanarasimhan, S., & Grauman, K. (2011). Training object detectors with crawled data and crowds: Large-scale live active learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Vijayanarasimhan, S., & Grauman, K. (2008). Multi-level active prediction of useful image annotations for recognition. In Advances in Neural Information Processing Systems (NIPS).
Vijayanarasimhan, S., & Kapoor, A. (2010). Visual recognition and detection under bounded computational resources. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Vijayanarasimhan, S., Jain, P., & Grauman, K. (2014). Hashing hyperplane queries to near points with applications to large-scale active learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(2), 276–288.
Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
von Ahn, L., & Dabbish, L. (2004). Labeling images with a computer game. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI).
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., & Gong, Y. (2010). Locality-constrained linear coding for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Welinder, P., & Perona, P. (2010). Rating annotators and obtaining cost-effective labels: Online crowdsourcing. In Workshop on Advancing Computer Vision with Humans in the Loop (ACVHL).
Yang, J., Yu, K., Gong, Y., & Huang, T. (2009). Linear spatial pyramid matching using sparse coding for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Acknowledgments
The authors thank the anonymous reviewers for their helpful comments. This research is supported in part by NSF CAREER IIS-0747356 and DARPA Mind’s Eye.
Additional information
Communicated by Martial Hebert.
About this article
Cite this article
Vijayanarasimhan, S., Grauman, K. Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds. Int J Comput Vis 108, 97–114 (2014). https://doi.org/10.1007/s11263-014-0721-9
DOI: https://doi.org/10.1007/s11263-014-0721-9