Toward semantic image similarity from crowdsourced clustering

  • Original Article
The Visual Computer

Abstract

Determining the similarity between images is a fundamental step in many applications, such as image categorization, image labeling and image retrieval. Automatic methods for similarity estimation often fall short when the task requires semantic context, raising the need for human judgment. Such judgments can be collected via crowdsourcing techniques, based on tasks posed to web users. However, to estimate image similarities in reasonable time and at reasonable cost, the tasks posed to the crowd must be generated carefully. We observe that distances within local neighborhoods provide valuable information that allows a quick and accurate construction of the global similarity metric. This key observation leads to a solution based on clustering tasks that compare relatively similar images. In each query, crowd members cluster a small set of images into bins. The results yield many relative similarities between images, which are used to construct a global image similarity metric. This metric is progressively refined and serves to generate finer, more local queries in subsequent iterations. We demonstrate the effectiveness of our method on datasets where ground truth is available, and on a collection of images where semantic similarities cannot be quantified. In particular, we show that our method outperforms alternative baseline approaches, and confirm the usefulness of clustering queries and of our progressive refinement process.
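To make the idea of deriving pairwise similarities from clustering answers concrete, here is a minimal Python sketch. It is not the method of the article, whose details are not given in the abstract. It uses plain co-occurrence counting as a stand-in: two images count as similar in proportion to how often crowd members placed them in the same bin when both appeared in a query. The function name `cooccurrence_similarity` and the input layout are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_similarity(answers):
    """Estimate pairwise similarity from clustering answers.

    `answers` is a list of crowd answers (hypothetical layout): each
    answer is a list of bins, and each bin is a list of image ids.
    Returns a dict mapping a sorted pair (a, b) to the fraction of
    answers containing both images in which they shared a bin.
    """
    together = defaultdict(int)  # times a pair was placed in the same bin
    seen = defaultdict(int)      # times a pair appeared in the same query

    for bins in answers:
        # Count pairs that the crowd member grouped together.
        for bin_ in bins:
            for pair in combinations(sorted(bin_), 2):
                together[pair] += 1
        # Count every pair that was shown together in this query.
        images = sorted(img for bin_ in bins for img in bin_)
        for pair in combinations(images, 2):
            seen[pair] += 1

    return {pair: together[pair] / n for pair, n in seen.items()}


# Two answers over the same three images: one splits off "c", one does not.
answers = [
    [["a", "b"], ["c"]],
    [["a", "b", "c"]],
]
similarity = cooccurrence_similarity(answers)
```

In this toy input, "a" and "b" share a bin in both answers, while "c" joins them in only one, so the pair ("a", "b") scores higher than either pair involving "c". A progressive scheme like the one the abstract describes could use such estimates to choose nearby images for the next, more local query, though how the article does this is not stated here.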

Notes

  1. Crowdsourcing is a general name for processes that involve posing many small-scale tasks to the crowd of web users, and piecing together the crowd’s answers to achieve a larger-scale goal, such as constructing a large knowledge base.

Acknowledgments

This research was supported by a Google Focused Research Award, the Israel Science Foundation (ISF, Grant No. 1636/13), the Blavatnik Interdisciplinary Cyber Research Center (ICRC), and the European Research Council under the FP7, ERC Grant MoDaS, Agreement 291071.

Author information

Corresponding author

Correspondence to Yael Amsterdamer.

About this article

Cite this article

Kleiman, Y., Goldberg, G., Amsterdamer, Y. et al. Toward semantic image similarity from crowdsourced clustering. Vis Comput 32, 1045–1055 (2016). https://doi.org/10.1007/s00371-016-1266-4

  • DOI: https://doi.org/10.1007/s00371-016-1266-4
