
Efficient Label Collection for Image Datasets via Hierarchical Clustering

  • Published in: International Journal of Computer Vision

Abstract

Raw visual data used to train classifiers is abundant and easy to gather, but lacks semantic labels that describe visual concepts of interest. These labels are necessary for supervised learning and can require significant human effort to collect. We discuss four labeling objectives that play an important role in the design of frameworks aimed at collecting label information for large training sets while maintaining low human effort: discovery, efficiency, exploitation and accuracy. We introduce a framework that explicitly models and balances these four labeling objectives with the use of (1) hierarchical clustering, (2) a novel interestingness measure that defines structural change within the hierarchy, and (3) an iterative group-based labeling process that exploits relationships between labeled and unlabeled data. Results on benchmark data show that our framework collects labeled training data more efficiently than existing labeling techniques and trains higher performing visual classifiers. Further, we show that our resulting framework is fast and significantly reduces human interaction time when labeling real-world multi-concept imagery depicting outdoor environments.
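The abstract's central idea can be illustrated with a minimal sketch: build a bottom-up cluster hierarchy over unlabeled feature vectors, then collect labels with one human interaction per cluster rather than one per image. This is an illustrative toy, not the paper's actual algorithm — the interestingness measure and iterative refinement are omitted, the average-linkage criterion, the toy features, and the majority-vote oracle standing in for the annotator are all assumptions made here for brevity.

```python
# Sketch of group-based labeling over a cluster hierarchy (illustrative
# only; the paper's interestingness measure and iteration are omitted).
import random

random.seed(0)

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerate(points, n_clusters):
    """Average-linkage agglomerative clustering down to n_clusters groups."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two groups.
                d = sum(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge closest pair
        del clusters[j]
    return clusters

# Toy "image features": three well-separated concepts on a line.
points = ([(random.gauss(0, 0.2),) for _ in range(15)]
          + [(random.gauss(5, 0.2),) for _ in range(15)]
          + [(random.gauss(10, 0.2),) for _ in range(15)])
true = [0] * 15 + [1] * 15 + [2] * 15

clusters = agglomerate(points, 3)

# Group-based labeling: one interaction per cluster (3 here, not 45).
collected = [None] * len(points)
interactions = 0
for members in clusters:
    counts = {}
    for m in members:            # oracle stands in for the annotator
        counts[true[m]] = counts.get(true[m], 0) + 1
    label = max(counts, key=counts.get)
    for m in members:
        collected[m] = label
    interactions += 1

accuracy = sum(c == t for c, t in zip(collected, true)) / len(true)
print(interactions, accuracy)
```

On well-separated data like this, three interactions label all 45 examples correctly; the paper's contribution lies in keeping that efficiency/accuracy trade-off favorable on real imagery, where clusters are impure and the hierarchy must be queried selectively.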

[Figures 1–24 appear in the full article.]

Notes

  1. http://www.image-net.org/papers/ImageNet_2010.pdf.

  2. This work focuses on the classification task, not detection in a multi-concept scene. While some multi-concept datasets are used for evaluation, each image is first decomposed via segmentation or region proposal to generate a set of single-concept training examples.

  3. This is equivalent to 21 labeling interactions.

  4. The largest dataset used in the original BBAL experiments was 5000 images (Vijayanarasimhan et al. 2010).

References

  • Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., & Süsstrunk, S. (2012). SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2274–2282.

  • Biswas, A., & Jacobs, D. (2012). Active image clustering: Seeking constraints from humans to complement algorithms. In Proceedings of computer vision and pattern recognition (pp. 2152–2159). IEEE.

  • Chaaraoui, A. A., Climent-Pérez, P., & Flórez-Revuelta, F. (2012). A review on vision techniques applied to human behaviour analysis for ambient-assisted living. Expert Systems with Applications, 39(12), 10873–10888.

  • Chang, J. C., Kittur, A., & Hahn, N. (2016). Alloy: Clustering with crowds and computation. In Proceedings of the CHI conference on human factors in computing systems (pp. 3180–3191). ACM.

  • Chatterjee, A., Rakshit, A., & Singh, N. N. (2012). Vision based autonomous robot navigation: Algorithms and implementations (Vol. 455). Berlin: Springer.

  • Chen, J., Cui, Y., Ye, G., Liu, D., & Chang, S. F. (2014). Event-driven semantic concept discovery by exploiting weakly tagged internet images. In Proceedings of international conference on multimedia retrieval (p. 1). ACM.

  • Chilton, L. B., Little, G., Edge, D., Weld, D. S., & Landay, J. A. (2013). Cascade: Crowdsourcing taxonomy creation. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1999–2008). ACM.

  • Dai, D., Prasad, M., Leistner, C., & Van Gool, L. (2012). Ensemble partitioning for unsupervised image categorization. In Proceedings of European conference on computer vision (pp. 483–496). Springer.

  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of the conference on computer vision and pattern recognition (Vol. 1, pp. 886–893). IEEE.

  • Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In Proceedings of computer vision and pattern recognition. IEEE.

  • Deng, J., Russakovsky, O., Krause, J., Bernstein, M. S., Berg, A., & Fei-Fei, L. (2014). Scalable multi-label annotation. In Proceedings of human factors in computing systems (pp. 3099–3102). ACM.

  • Fei-Fei, L., & Perona, P. (2005). A Bayesian hierarchical model for learning natural scene categories. In Proceedings of computer vision and pattern recognition (Vol. 2, pp. 524–531). IEEE.

  • Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2), 167–181.

  • Frénay, B., & Verleysen, M. (2014). Classification in the presence of label noise: a survey. IEEE Transactions on Neural Networks and Learning Systems, 25(5), 845–869.

  • Galleguillos, C., McFee, B., & Lanckriet, G. (2014). Iterative category discovery via multiple kernel metric learning. International Journal of Computer Vision, 108(1–2), 115–132. doi:10.1007/s11263-013-0679-z.

  • Gilbert, A., & Bowden, R. (2011). iGroup: Weakly supervised image and video grouping. In Proceedings of international conference on computer vision (pp. 2166–2173). IEEE.

  • Griffin, G., Holub, A., & Perona, P. (2007). Caltech-256 object category dataset. Technical report, California Institute of Technology.

  • Holub, A., Perona, P., & Burl, M. C. (2008). Entropy-based active learning for object recognition. In Proceedings of computer vision and pattern recognition workshops (pp. 1–8). IEEE.

  • Jain, P., & Kapoor, A. (2009). Active learning for large multi-class problems. In Proceedings of computer vision and pattern recognition (pp. 762–769). IEEE.

  • Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., et al. (2014). Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093.

  • Joshi, A. J., Porikli, F., & Papanikolopoulos, N. (2009). Multi-class active learning for image classification. In Proceedings of computer vision and pattern recognition (pp. 2372–2379).

  • Kapoor, A., Grauman, K., Urtasun, R., & Darrell, T. (2007). Active learning with gaussian processes for object categorization. In Proceedings of international conference on computer vision (pp. 1–8). IEEE.

  • Krishna, R., Hata, K., Chen, S., Kravitz, J., Shamma, D. A., Fei-Fei, L., et al. (2016). Embracing error to enable rapid crowdsourcing. In Proceedings of the CHI conference on human factors in computing systems. ACM.

  • Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images. Technical report, University of Toronto.

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).

  • Lee, Y. J., & Grauman, K. (2011). Learning the easy things first: Self-paced visual category discovery. In Proceedings of computer vision and pattern recognition (pp. 1721–1728). IEEE.

  • Lee, Y. J., & Grauman, K. (2012). Object-graphs for context-aware visual category discovery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(2), 346–358.

  • Lennon, C., Bodt, B., Childers, M., Camden, R., Suppé, A., Navarro-Serment, L., et al. (2013). Performance evaluation of a semantic perception classifier. Technical report ARL-TR-6653, Army Research Labs.

  • Li, X., & Guo, Y. (2013). Adaptive active learning for image classification. In Proceedings of computer vision and pattern recognition. IEEE.

  • Liu, D., & Chen, T. (2007). Unsupervised image categorization and object localization using topic models and correspondences between images. In Proceedings of international conference on computer vision (pp. 1–7). IEEE.

  • Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.

  • Munoz, D. (2013). Inference machines: Parsing scenes via iterated predictions. PhD thesis, The Robotics Institute, Carnegie Mellon University.

  • Nettleton, D., Orriols-Puig, A., & Fornells, A. (2010). A study of the effect of different types of noise on the precision of supervised learning techniques. Artificial Intelligence Review, 33(4), 275–306.

  • Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., & Ng, A. Y. (2011). Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning (Vol. 2, p. 5).

  • Ojala, T., Pietikäinen, M., & Harwood, D. (1996). A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1), 51–59.

  • Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.

  • Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2008). LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision, 77(1–3), 157–173.

  • Saenko, K., Kulis, B., Fritz, M., & Darrell, T. (2010). Adapting visual category models to new domains. In Proceedings of the European conference on computer vision (pp. 213–226). Springer.

  • Settles, B. (2010). Active learning literature survey. Madison: University of Wisconsin.

  • Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2006). Textonboost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In Proceedings of European conference on computer vision (pp. 1–15). Springer.

  • Sivic, J., Russell, B., Efros, A., Zisserman, A., & Freeman, W. (2005). Discovering objects and their location in images. In Proceedings of international conference on computer vision (pp. 370–377).

  • Sorokin, A., & Forsyth, D. (2008). Utility data annotation with Amazon Mechanical Turk. In Proceedings of computer vision and pattern recognition workshops. IEEE.

  • Sun, C., Gan, C., & Nevatia, R. (2015). Automatic concept discovery from parallel text and visual corpora. In Proceedings of the IEEE international conference on computer vision (pp. 2596–2604).

  • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. In Proceedings of computer vision and pattern recognition. IEEE.

  • Tamuz, O., Liu, C., Belongie, S., Shamir, O., & Kalai, A. T. (2011). Adaptively learning the crowd kernel. In Proceedings of the international conference on machine learning. IEEE.

  • Tuytelaars, T., Lampert, C. H., Blaschko, M. B., & Buntine, W. (2010). Unsupervised object discovery: A comparison. International Journal of Computer Vision, 88(2), 284–302.

  • Vijayanarasimhan, S., & Grauman, K. (2014). Large-scale live active learning: Training object detectors with crawled data and crowds. International Journal of Computer Vision, 108(1–2), 97–114.

  • Vijayanarasimhan, S., Jain, P., & Grauman, K. (2010). Far-sighted active learning on a budget for image and video recognition. In Proceedings of the conference on computer vision and pattern recognition (pp. 3035–3042). IEEE.

  • Ward, J. H., Jr. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244.

  • Wigness, M., Draper, B. A., Beveridge, J. R. (2014). Selectively guiding visual concept discovery. In Proceedings of the winter conference on applications of computer vision. IEEE.

  • Wigness, M., Draper, B. A., & Beveridge, J. R. (2015). Efficient label collection for unlabeled image datasets. In Proceedings of computer vision and pattern recognition. IEEE.

  • Wigness, M., Rogers, J. G., III, Navarro-Serment, L. E., Suppe, A., & Draper, B. A. (2016). Reducing adaptation latency for multi-concept visual perception in outdoor environments. In Proceedings of international conference on intelligent robots and systems. IEEE.

  • Xiong, C., Johnson, D. M., & Corso, J. J. (2012). Spectral active clustering via purification of the \(k\)-nearest neighbor graph. In Proceedings of European conference on data mining.

  • Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., & Oliva, A. (2014). Learning deep features for scene recognition using places database. In: Advances in neural information processing systems (pp. 487–495).

Author information

Corresponding author

Correspondence to Maggie Wigness.

Additional information

Communicated by T. E. Boult.

About this article

Cite this article

Wigness, M., Draper, B.A. & Beveridge, J.R. Efficient Label Collection for Image Datasets via Hierarchical Clustering. Int J Comput Vis 126, 59–85 (2018). https://doi.org/10.1007/s11263-017-1039-1
