Abstract
Raw visual data used to train classifiers is abundant and easy to gather, but lacks semantic labels that describe visual concepts of interest. These labels are necessary for supervised learning and can require significant human effort to collect. We discuss four labeling objectives that play an important role in the design of frameworks aimed at collecting label information for large training sets while maintaining low human effort: discovery, efficiency, exploitation, and accuracy. We introduce a framework that explicitly models and balances these four labeling objectives with the use of (1) hierarchical clustering, (2) a novel interestingness measure that defines structural change within the hierarchy, and (3) an iterative group-based labeling process that exploits relationships between labeled and unlabeled data. Results on benchmark data show that our framework collects labeled training data more efficiently than existing labeling techniques and trains higher-performing visual classifiers. Further, we show that the resulting framework is fast and significantly reduces human interaction time when labeling real-world multi-concept imagery depicting outdoor environments.
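The abstract names hierarchical clustering and group-based labeling without giving details. The following is a minimal Python sketch, under stated assumptions, of how those two ideas can fit together: Ward's agglomerative criterion (Ward 1963) builds clusters bottom-up, and one simulated interaction assigns a single label to every member of a cluster. It does not reproduce the paper's interestingness measure or its iterative choice of which cluster to present. The fixed cut at `k` clusters, the function names `ward_clusters` and `group_label`, and the `ask_human` callback (a stand-in for the annotator) are all hypothetical.

```python
from itertools import combinations


def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters.

    Returns a list of clusters, each a sorted list of point indices.
    (Hypothetical helper: a fixed k is an assumption for illustration.)
    """
    clusters = {i: [i] for i in range(len(points))}

    def centroid(idx):
        dim = len(points[0])
        return [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged:
        # |A||B| / (|A| + |B|) * ||mean(A) - mean(B)||^2
        ca, cb = centroid(a), centroid(b)
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq_dist

    next_id = len(points)
    while len(clusters) > k:
        # Greedily merge the pair of clusters with the smallest Ward cost.
        a, b = min(combinations(clusters, 2),
                   key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())


def group_label(groups, ask_human):
    """Label every member of each group with one (simulated) interaction.

    ask_human is a hypothetical stand-in for the annotator: it receives a
    group of indices and returns one concept label for the whole group.
    """
    labels, interactions = {}, 0
    for group in groups:
        concept = ask_human(group)
        interactions += 1
        for i in group:
            labels[i] = concept
    return labels, interactions


# Usage: two well-separated groups of 2-D feature vectors.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
truth = ["grass", "grass", "grass", "road", "road", "road"]
groups = ward_clusters(points, k=2)
labels, n = group_label(groups, lambda g: truth[g[0]])
print(groups)  # [[0, 1, 2], [3, 4, 5]]
print(n)       # 2
```

Here six examples receive labels from two interactions. A mixed cluster would propagate a wrong label to some of its members, so savings of this kind trade off against the accuracy objective listed in the abstract.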

Notes
This work focuses on the classification task, not detection in a multi-concept scene. While some multi-concept datasets are used for evaluation, each image is first decomposed via segmentation or region proposal to generate a set of single-concept training examples.
This is equivalent to 21 labeling interactions.
The largest dataset used in the original BBAL experiments was 5000 images (Vijayanarasimhan et al. 2010).
References
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., & Susstrunk, S. (2012). SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2274–2282.
Biswas, A., & Jacobs, D. (2012). Active image clustering: Seeking constraints from humans to complement algorithms. In Proceedings of computer vision and pattern recognition (pp. 2152–2159). IEEE.
Chaaraoui, A. A., Climent-Pérez, P., & Flórez-Revuelta, F. (2012). A review on vision techniques applied to human behaviour analysis for ambient-assisted living. Expert Systems with Applications, 39(12), 10873–10888.
Chang, J. C., Kittur, A., & Hahn, N. (2016). Alloy: Clustering with crowds and computation. In Proceedings of the CHI conference on human factors in computing systems (pp. 3180–3191). ACM.
Chatterjee, A., Rakshit, A., & Singh, N. N. (2012). Vision based autonomous robot navigation: Algorithms and implementations (Vol. 455). Berlin: Springer.
Chen, J., Cui, Y., Ye, G., Liu, D., & Chang, S. F. (2014). Event-driven semantic concept discovery by exploiting weakly tagged internet images. In Proceedings of international conference on multimedia retrieval (p. 1). ACM.
Chilton, L. B., Little, G., Edge, D., Weld, D. S., & Landay, J. A. (2013). Cascade: Crowdsourcing taxonomy creation. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1999–2008). ACM.
Dai, D., Prasad, M., Leistner, C., & Van Gool, L. (2012). Ensemble partitioning for unsupervised image categorization. In Proceedings of European conference on computer vision (pp. 483–496). Springer.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of the conference on computer vision and pattern recognition (Vol. 1, pp. 886–893). IEEE.
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In Proceedings of computer vision and pattern recognition. IEEE.
Deng, J., Russakovsky, O., Krause, J., Bernstein, M. S., Berg, A., & Fei-Fei, L. (2014). Scalable multi-label annotation. In Proceedings of human factors in computing systems (pp. 3099–3102). ACM.
Fei-Fei, L., & Perona, P. (2005). A Bayesian hierarchical model for learning natural scene categories. In Proceedings of computer vision and pattern recognition (Vol. 2, pp. 524–531). IEEE.
Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2), 167–181.
Frénay, B., & Verleysen, M. (2014). Classification in the presence of label noise: a survey. IEEE Transactions on Neural Networks and Learning Systems, 25(5), 845–869.
Galleguillos, C., McFee, B., & Lanckriet, G. (2014). Iterative category discovery via multiple kernel metric learning. International Journal of Computer Vision, 108(1–2), 115–132. doi:10.1007/s11263-013-0679-z.
Gilbert, A., & Bowden, R. (2011). igroup: Weakly supervised image and video grouping. In Proceedings of international conference on computer vision (pp. 2166–2173).
Griffin, G., Holub, A., & Perona, P. (2007). Caltech-256 object category dataset. Technical report, California Institute of Technology.
Holub, A., Perona, P., & Burl, M. C. (2008). Entropy-based active learning for object recognition. In Proceedings of computer vision and pattern recognition workshops (pp. 1–8). IEEE.
Jain, P., & Kapoor, A. (2009). Active learning for large multi-class problems. In Proceedings of computer vision and pattern recognition (pp. 762–769). IEEE.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., et al. (2014). Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093.
Joshi, A. J., Porikli, F., & Papanikolopoulos, N. (2009). Multi-class active learning for image classification. In Proceedings of computer vision and pattern recognition (pp. 2372–2379).
Kapoor, A., Grauman, K., Urtasun, R., & Darrell, T. (2007). Active learning with Gaussian processes for object categorization. In Proceedings of international conference on computer vision (pp. 1–8). IEEE.
Krishna, R., Hata, K., Chen, S., Kravitz, J., Shamma, D. A., Fei-Fei, L., et al. (2016). Embracing error to enable rapid crowdsourcing. In Proceedings of the CHI conference on human factors in computing systems. ACM.
Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images. Technical report, University of Toronto.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).
Lee, Y. J., & Grauman, K. (2011). Learning the easy things first: Self-paced visual category discovery. In Proceedings of computer vision and pattern recognition (pp. 1721–1728). IEEE.
Lee, Y. J., & Grauman, K. (2012). Object-graphs for context-aware visual category discovery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(2), 346–358.
Lennon, C., Bodt, B., Childers, M., Camden, R., Suppé, A., Navarro-Serment, L., et al. (2013). Performance evaluation of a semantic perception classifier. Technical report ARL-TR-6653, Army Research Labs.
Li, X., & Guo, Y. (2013). Adaptive active learning for image classification. In Proceedings of computer vision and pattern recognition. IEEE.
Liu, D., & Chen, T. (2007). Unsupervised image categorization and object localization using topic models and correspondences between images. In Proceedings of international conference on computer vision (pp. 1–7). IEEE.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Munoz, D. (2013). Inference machines: Parsing scenes via iterated predictions. PhD thesis, The Robotics Institute, Carnegie Mellon University.
Nettleton, D., Orriols-Puig, A., & Fornells, A. (2010). A study of the effect of different types of noise on the precision of supervised learning techniques. Artificial Intelligence Review, 33(4), 275–306.
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., & Ng, A. Y. (2011). Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning (Vol. 2, p. 5).
Ojala, T., Pietikäinen, M., & Harwood, D. (1996). A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1), 51–59.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.
Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2008). LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision, 77(1–3), 157–173.
Saenko, K., Kulis, B., Fritz, M., & Darrell, T. (2010). Adapting visual category models to new domains. In Proceedings of the European conference on computer vision (pp. 213–226). Springer.
Settles, B. (2010). Active learning literature survey. Madison: University of Wisconsin.
Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2006). Textonboost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In Proceedings of European conference on computer vision (pp. 1–15). Springer.
Sivic, J., Russell, B., Efros, A., Zisserman, A., & Freeman, W. (2005). Discovering objects and their location in images. In Proceedings of international conference on computer vision (pp. 370–377).
Sorokin, A., & Forsyth, D. (2008). Utility data annotation with Amazon Mechanical Turk. In Computer vision and pattern recognition workshops.
Sun, C., Gan, C., & Nevatia, R. (2015). Automatic concept discovery from parallel text and visual corpora. In Proceedings of the IEEE international conference on computer vision (pp. 2596–2604).
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. In Proceedings of computer vision and pattern recognition. IEEE.
Tamuz, O., Liu, C., Belongie, S., Shamir, O., & Kalai, A. T. (2011). Adaptively learning the crowd kernel. In Proceedings of the international conference on machine learning. IEEE.
Tuytelaars, T., Lampert, C. H., Blaschko, M. B., & Buntine, W. (2010). Unsupervised object discovery: A comparison. International Journal of Computer Vision, 88(2), 284–302.
Vijayanarasimhan, S., & Grauman, K. (2014). Large-scale live active learning: Training object detectors with crawled data and crowds. International Journal of Computer Vision, 108(1–2), 97–114.
Vijayanarasimhan, S., Jain, P., & Grauman, K. (2010). Far-sighted active learning on a budget for image and video recognition. In Proceedings of the conference on computer vision and pattern recognition (pp. 3035–3042). IEEE.
Ward, J. H., Jr. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244.
Wigness, M., Draper, B. A., & Beveridge, J. R. (2014). Selectively guiding visual concept discovery. In Proceedings of the winter conference on applications of computer vision. IEEE.
Wigness, M., Draper, B. A., & Beveridge, J. R. (2015). Efficient label collection for unlabeled image datasets. In Proceedings of computer vision and pattern recognition. IEEE.
Wigness, M., Rogers III, J. G., Navarro-Serment, L. E., Suppe, A., & Draper, B. A. (2016). Reducing adaptation latency for multi-concept visual perception in outdoor environments. In Proceedings of international conference on intelligent robots and systems. IEEE.
Xiong, C., Johnson, D. M., & Corso, J. J. (2012). Spectral active clustering via purification of the k-nearest neighbor graph. In Proceedings of European conference on data mining.
Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., & Oliva, A. (2014). Learning deep features for scene recognition using places database. In Advances in neural information processing systems (pp. 487–495).
Additional information
Communicated by T. E. Boult.
About this article
Cite this article
Wigness, M., Draper, B.A. & Beveridge, J.R. Efficient Label Collection for Image Datasets via Hierarchical Clustering. Int J Comput Vis 126, 59–85 (2018). https://doi.org/10.1007/s11263-017-1039-1
DOI: https://doi.org/10.1007/s11263-017-1039-1