
Efficient Label Collection for Image Datasets via Hierarchical Clustering

  • Published in: International Journal of Computer Vision

Abstract

Raw visual data used to train classifiers is abundant and easy to gather, but lacks semantic labels that describe visual concepts of interest. These labels are necessary for supervised learning and can require significant human effort to collect. We discuss four labeling objectives that play an important role in the design of frameworks aimed at collecting label information for large training sets while maintaining low human effort: discovery, efficiency, exploitation and accuracy. We introduce a framework that explicitly models and balances these four labeling objectives with the use of (1) hierarchical clustering, (2) a novel interestingness measure that defines structural change within the hierarchy, and (3) an iterative group-based labeling process that exploits relationships between labeled and unlabeled data. Results on benchmark data show that our framework collects labeled training data more efficiently than existing labeling techniques and trains higher performing visual classifiers. Further, we show that our resulting framework is fast and significantly reduces human interaction time when labeling real-world multi-concept imagery depicting outdoor environments.
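The abstract's central idea can be illustrated with a minimal sketch: build a bottom-up cluster hierarchy over unlabeled feature vectors, then collect labels with one human interaction per cluster rather than one per image. This is an illustrative toy, not the paper's actual algorithm — the interestingness measure and iterative refinement are omitted, the average-linkage criterion, the toy features, and the majority-vote oracle standing in for the annotator are all assumptions made here for brevity.

```python
# Sketch of group-based labeling over a cluster hierarchy (illustrative
# only; the paper's interestingness measure and iteration are omitted).
import random

random.seed(0)

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerate(points, n_clusters):
    """Average-linkage agglomerative clustering down to n_clusters groups."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two groups.
                d = sum(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge closest pair
        del clusters[j]
    return clusters

# Toy "image features": three well-separated concepts on a line.
points = ([(random.gauss(0, 0.2),) for _ in range(15)]
          + [(random.gauss(5, 0.2),) for _ in range(15)]
          + [(random.gauss(10, 0.2),) for _ in range(15)])
true = [0] * 15 + [1] * 15 + [2] * 15

clusters = agglomerate(points, 3)

# Group-based labeling: one interaction per cluster (3 here, not 45).
collected = [None] * len(points)
interactions = 0
for members in clusters:
    counts = {}
    for m in members:            # oracle stands in for the annotator
        counts[true[m]] = counts.get(true[m], 0) + 1
    label = max(counts, key=counts.get)
    for m in members:
        collected[m] = label
    interactions += 1

accuracy = sum(c == t for c, t in zip(collected, true)) / len(true)
print(interactions, accuracy)
```

On well-separated data like this, three interactions label all 45 examples correctly; the paper's contribution lies in keeping that efficiency/accuracy trade-off favorable on real imagery, where clusters are impure and the hierarchy must be queried selectively.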

[Figures 1–24 appear in the full article.]

Notes

  1. http://www.image-net.org/papers/ImageNet_2010.pdf.

  2. This work focuses on the classification task, not detection in a multi-concept scene. While some multi-concept datasets are used for evaluation, each image is first decomposed via segmentation or region proposal to generate a set of single-concept training examples.

  3. This is equivalent to 21 labeling interactions.

  4. The largest dataset used in the original BBAL experiments was 5000 images (Vijayanarasimhan et al. 2010).

References

  • Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., & Süsstrunk, S. (2012). SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2274–2282.

  • Biswas, A., & Jacobs, D. (2012). Active image clustering: Seeking constraints from humans to complement algorithms. In Proceedings of computer vision and pattern recognition (pp. 2152–2159). IEEE.

  • Chaaraoui, A. A., Climent-Pérez, P., & Flórez-Revuelta, F. (2012). A review on vision techniques applied to human behaviour analysis for ambient-assisted living. Expert Systems with Applications, 39(12), 10873–10888.

  • Chang, J. C., Kittur, A., & Hahn, N. (2016). Alloy: Clustering with crowds and computation. In Proceedings of the CHI conference on human factors in computing systems (pp. 3180–3191). ACM.

  • Chatterjee, A., Rakshit, A., & Singh, N. N. (2012). Vision based autonomous robot navigation: Algorithms and implementations (Vol. 455). Berlin: Springer.

  • Chen, J., Cui, Y., Ye, G., Liu, D., & Chang, S. F. (2014). Event-driven semantic concept discovery by exploiting weakly tagged internet images. In Proceedings of international conference on multimedia retrieval (p. 1). ACM.

  • Chilton, L. B., Little, G., Edge, D., Weld, D. S., & Landay, J. A. (2013). Cascade: Crowdsourcing taxonomy creation. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1999–2008). ACM.

  • Dai, D., Prasad, M., Leistner, C., & Van Gool, L. (2012). Ensemble partitioning for unsupervised image categorization. In Proceedings of European conference on computer vision (pp. 483–496). Springer.

  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of the conference on computer vision and pattern recognition (Vol. 1, pp. 886–893). IEEE.

  • Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In Proceedings of computer vision and pattern recognition. IEEE.

  • Deng, J., Russakovsky, O., Krause, J., Bernstein, M. S., Berg, A., & Fei-Fei, L. (2014). Scalable multi-label annotation. In Proceedings of human factors in computing systems (pp. 3099–3102). ACM.

  • Fei-Fei, L., & Perona, P. (2005). A Bayesian hierarchical model for learning natural scene categories. In Proceedings of computer vision and pattern recognition (Vol. 2, pp. 524–531). IEEE.

  • Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2), 167–181.

  • Frénay, B., & Verleysen, M. (2014). Classification in the presence of label noise: a survey. IEEE Transactions on Neural Networks and Learning Systems, 25(5), 845–869.

  • Galleguillos, C., McFee, B., & Lanckriet, G. (2014). Iterative category discovery via multiple kernel metric learning. International Journal of Computer Vision, 108(1–2), 115–132. doi:10.1007/s11263-013-0679-z.

  • Gilbert, A., & Bowden, R. (2011). iGroup: Weakly supervised image and video grouping. In Proceedings of international conference on computer vision (pp. 2166–2173). IEEE.

  • Griffin, G., Holub, A., & Perona, P. (2007). Caltech-256 object category dataset. Technical report, California Institute of Technology.

  • Holub, A., Perona, P., & Burl, M. C. (2008). Entropy-based active learning for object recognition. In Proceedings of computer vision and pattern recognition workshops (pp. 1–8). IEEE.

  • Jain, P., & Kapoor, A. (2009). Active learning for large multi-class problems. In Proceedings of computer vision and pattern recognition (pp. 762–769). IEEE.

  • Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., et al. (2014). Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093.

  • Joshi, A. J., Porikli, F., & Papanikolopoulos, N. (2009). Multi-class active learning for image classification. In Proceedings of computer vision and pattern recognition (pp. 2372–2379).

  • Kapoor, A., Grauman, K., Urtasun, R., & Darrell, T. (2007). Active learning with gaussian processes for object categorization. In Proceedings of international conference on computer vision (pp. 1–8). IEEE.

  • Krishna, R., Hata, K., Chen, S., Kravitz, J., Shamma, D. A., Fei-Fei, L., et al. (2016). Embracing error to enable rapid crowdsourcing. In Proceedings of the CHI conference on human factors in computing systems. ACM.

  • Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images. Technical report, University of Toronto.

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).

  • Lee, Y. J., & Grauman, K. (2011). Learning the easy things first: Self-paced visual category discovery. In Proceedings of computer vision and pattern recognition (pp. 1721–1728). IEEE.

  • Lee, Y. J., & Grauman, K. (2012). Object-graphs for context-aware visual category discovery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(2), 346–358.

  • Lennon, C., Bodt, B., Childers, M., Camden, R., Suppé, A., Navarro-Serment, L., et al. (2013). Performance evaluation of a semantic perception classifier. Technical report ARL-TR-6653, Army Research Labs.

  • Li, X., & Guo, Y. (2013). Adaptive active learning for image classification. In Proceedings of computer vision and pattern recognition. IEEE.

  • Liu, D., & Chen, T. (2007). Unsupervised image categorization and object localization using topic models and correspondences between images. In Proceedings of international conference on computer vision (pp. 1–7). IEEE.

  • Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.

  • Munoz, D. (2013). Inference machines: Parsing scenes via iterated predictions. PhD thesis, The Robotics Institute, Carnegie Mellon University.

  • Nettleton, D., Orriols-Puig, A., & Fornells, A. (2010). A study of the effect of different types of noise on the precision of supervised learning techniques. Artificial Intelligence Review, 33(4), 275–306.

  • Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., & Ng, A. Y. (2011). Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning (Vol. 2, p. 5).

  • Ojala, T., Pietikäinen, M., & Harwood, D. (1996). A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1), 51–59.

  • Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.

  • Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2008). LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision, 77(1–3), 157–173.

  • Saenko, K., Kulis, B., Fritz, M., & Darrell, T. (2010). Adapting visual category models to new domains. In Proceedings of the European conference on computer vision (pp. 213–226). Springer.

  • Settles, B. (2010). Active learning literature survey. Madison: University of Wisconsin.

  • Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2006). Textonboost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In Proceedings of European conference on computer vision (pp. 1–15). Springer.

  • Sivic, J., Russell, B., Efros, A., Zisserman, A., & Freeman, W. (2005). Discovering objects and their location in images. In Proceedings of international conference on computer vision (pp. 370–377).

  • Sorokin, A., & Forsyth, D. (2008). Utility data annotation with Amazon Mechanical Turk. In Proceedings of computer vision and pattern recognition workshops. IEEE.

  • Sun, C., Gan, C., & Nevatia, R. (2015). Automatic concept discovery from parallel text and visual corpora. In Proceedings of the IEEE international conference on computer vision (pp. 2596–2604).

  • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. In Proceedings of computer vision and pattern recognition. IEEE.

  • Tamuz, O., Liu, C., Belongie, S., Shamir, O., & Kalai, A. T. (2011). Adaptively learning the crowd kernel. In Proceedings of the international conference on machine learning. IEEE.

  • Tuytelaars, T., Lampert, C. H., Blaschko, M. B., & Buntine, W. (2010). Unsupervised object discovery: A comparison. International Journal of Computer Vision, 88(2), 284–302.

  • Vijayanarasimhan, S., & Grauman, K. (2014). Large-scale live active learning: Training object detectors with crawled data and crowds. International Journal of Computer Vision, 108(1–2), 97–114.

  • Vijayanarasimhan, S., Jain, P., & Grauman, K. (2010). Far-sighted active learning on a budget for image and video recognition. In Proceedings of the conference on computer vision and pattern recognition (pp. 3035–3042). IEEE.

  • Ward, J. H., Jr. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244.

  • Wigness, M., Draper, B. A., Beveridge, J. R. (2014). Selectively guiding visual concept discovery. In Proceedings of the winter conference on applications of computer vision. IEEE.

  • Wigness, M., Draper, B. A., & Beveridge, J. R. (2015). Efficient label collection for unlabeled image datasets. In Proceedings of computer vision and pattern recognition. IEEE.

  • Wigness, M., Rogers, J. G., III, Navarro-Serment, L. E., Suppe, A., & Draper, B. A. (2016). Reducing adaptation latency for multi-concept visual perception in outdoor environments. In Proceedings of international conference on intelligent robots and systems. IEEE.

  • Xiong, C., Johnson, D. M., & Corso, J. J. (2012). Spectral active clustering via purification of the \(k\)-nearest neighbor graph. In Proceedings of European conference on data mining.

  • Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., & Oliva, A. (2014). Learning deep features for scene recognition using places database. In: Advances in neural information processing systems (pp. 487–495).

Author information

Corresponding author

Correspondence to Maggie Wigness.

Additional information

Communicated by T. E. Boult.

About this article

Cite this article

Wigness, M., Draper, B.A. & Beveridge, J.R. Efficient Label Collection for Image Datasets via Hierarchical Clustering. Int J Comput Vis 126, 59–85 (2018). https://doi.org/10.1007/s11263-017-1039-1
