Improving the quality of K-NN graphs through vector sparsification: application to image databases

  • Regular Paper
International Journal of Multimedia Information Retrieval

Abstract

\(K\)-nearest neighbor (K-NN) graphs are an essential component of many established methods for content-based image retrieval and automated image annotation. The performance of such methods relies heavily on the semantic quality of the graphs, which can be measured as the proportion of neighbors sharing the same class label as their query images. Because of noise in image features, the K-NN graphs produced by existing methods may suffer from low semantic quality. In this article, we propose NNF-Descent for the efficient construction of K-NN graphs based on nearest-neighbor and feature descent, in which selective sparsification of feature vectors is interleaved with neighborhood refinement operations in an effort to improve the semantic quality of the result. A variant of the Laplacian Score is proposed for the identification of noisy features local to individual images, whose values are then set to \(0\) (the global mean value after standardization). We show through extensive experiments on several datasets that NNF-Descent increases the proportion of semantically related images over unrelated images within the neighbor sets, and that the proposed method generalizes well to other types of data that are represented by high-dimensional feature vectors.
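The two quantities the abstract relies on, the semantic quality of a K-NN graph and the zeroing of selected features after standardization, can be illustrated with a minimal Python sketch. This is not NNF-Descent itself: the graph is built by exact brute force under Euclidean distance, and the choice of which features count as noisy is supplied by hand rather than by the Laplacian Score variant the article proposes. The function names (`standardize`, `knn_graph`, `semantic_quality`, `sparsify`) and the toy data are assumptions made for illustration only.

```python
import math
from statistics import mean, pstdev


def standardize(vectors):
    """Column-wise z-score, so every feature has global mean 0."""
    columns = list(zip(*vectors))
    means = [mean(col) for col in columns]
    # A constant feature has standard deviation 0; divide by 1 instead.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in vectors
    ]


def knn_graph(vectors, k):
    """Exact brute-force K-NN lists under Euclidean distance (O(n^2))."""
    graph = []
    for i, v in enumerate(vectors):
        others = [(math.dist(v, w), j) for j, w in enumerate(vectors) if j != i]
        graph.append([j for _, j in sorted(others)[:k]])
    return graph


def semantic_quality(graph, labels):
    """Proportion of neighbor entries sharing the label of their query."""
    total = sum(len(neighbors) for neighbors in graph)
    hits = sum(
        labels[j] == labels[i]
        for i, neighbors in enumerate(graph)
        for j in neighbors
    )
    return hits / total


def sparsify(vector, noisy_features):
    """Set the given feature indices of one standardized vector to 0."""
    return [0.0 if f in noisy_features else x for f, x in enumerate(vector)]


if __name__ == "__main__":
    # Hypothetical toy data, treated as already standardized: feature 0
    # separates the classes, feature 1 is noise in items 1 and 2 only.
    vectors = [[-1.0, 0.0], [-1.0, 3.0], [1.0, 3.0], [1.0, 0.0]]
    labels = ["a", "a", "b", "b"]
    print(semantic_quality(knn_graph(vectors, 1), labels))

    # Hand-picked per-item noisy features (an assumption, not the paper's rule).
    noisy = {1: {1}, 2: {1}}
    cleaned = [sparsify(v, noisy.get(i, set())) for i, v in enumerate(vectors)]
    print(semantic_quality(knn_graph(cleaned, 1), labels))
```

The sparsification here is local: each item carries its own set of zeroed features, which differs from global feature selection, where a feature would be dropped for every item. Setting a standardized value to 0 replaces it with the feature's global mean, so the zeroed coordinate no longer pulls the item toward or away from any particular neighbor.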

Acknowledgments

Michael Houle acknowledges the financial support of JSPS Kakenhi Kiban (C) Research Grant 24500135 and the JST ERATO Kawarabayashi Large Graph Project. Vincent Oria acknowledges the financial support of NSF under Grant 1241976.

Author information

Corresponding author

Correspondence to Jichao Sun.

About this article

Cite this article

Houle, M.E., Ma, X., Oria, V. et al. Improving the quality of K-NN graphs through vector sparsification: application to image databases. Int J Multimed Info Retr 3, 259–274 (2014). https://doi.org/10.1007/s13735-014-0067-7

  • DOI: https://doi.org/10.1007/s13735-014-0067-7
