Improving the quality of K-NN graphs through vector sparsification: application to image databases

Houle, Michael E.; Ma, Xiguo; Oria, Vincent; Sun, Jichao

doi:10.1007/s13735-014-0067-7

Regular Paper
Published: 21 September 2014

Volume 3, pages 259–274, (2014)

International Journal of Multimedia Information Retrieval

Michael E. Houle¹, Xiguo Ma², Vincent Oria² & Jichao Sun²


Abstract

\(K\)-nearest neighbor (\(K\)-NN) graphs are an essential component of many established methods for content-based image retrieval and automated image annotation. The performance of such methods relies heavily on the semantic quality of the graphs, which can be measured as the proportion of neighbors sharing the same class label as their query images. Due to the noise in image features, the \(K\)-NN graphs produced by existing methods may suffer from low semantic quality. In this article, we propose NNF-Descent for the efficient construction of \(K\)-NN graphs based on nearest-neighbor and feature descent, in which selective sparsification of feature vectors is interleaved with neighborhood refinement operations in an effort to improve the semantic quality of the result. A variant of the Laplacian Score is proposed for the identification of noisy features local to individual images, whose values are then set to \(0\) (the global mean value after standardization). We show through extensive experiments on several datasets that NNF-Descent is able to increase the proportion of semantically related images over unrelated images within the neighbor sets, and that the proposed method generalizes well to other types of data that are represented by high-dimensional feature vectors.
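The two quantities named in the abstract, the semantic quality of a graph and the sparsification of standardized feature vectors, can be illustrated with a minimal Python sketch. It is not the authors' NNF-Descent implementation: the graph is built by brute force rather than by interleaved neighborhood refinement, and the choice of which entries are noisy (made in the article by a variant of the Laplacian Score) is not reproduced here. The function names and the boolean mask `noisy` are hypothetical and introduced only for illustration.

```python
import numpy as np


def standardize(X):
    """Z-score each feature column so that its global mean becomes 0."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # constant columns: avoid division by zero
    return (X - mu) / sigma


def knn_graph(X, k):
    """Brute-force K-NN lists under Euclidean distance (assumption:
    the article's method builds the graph iteratively instead)."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def semantic_quality(neighbors, labels):
    """Proportion of neighbors sharing the class label of their query."""
    return float((labels[neighbors] == labels[:, None]).mean())


def sparsify(X, noisy):
    """Return a copy of X with the flagged entries set to 0, which is
    the global mean once X has been standardized. `noisy` is a
    hypothetical boolean mask with the same shape as X."""
    Xs = X.copy()
    Xs[noisy] = 0.0
    return Xs
```

Under these assumptions, one would standardize the feature matrix, build a graph, evaluate `semantic_quality` against known class labels, then apply `sparsify` with some per-image mask and rebuild the graph to compare the two scores.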

Acknowledgments

Michael Houle acknowledges the financial support of JSPS Kakenhi Kiban (C) Research Grant 24500135 and the JST ERATO Kawarabayashi Large Graph Project. Vincent Oria acknowledges the financial support of NSF under Grant 1241976.

Author information

Authors and Affiliations

National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, 101-8430, Japan
Michael E. Houle
New Jersey Institute of Technology, University Heights, Newark, NJ, 07102, USA
Xiguo Ma, Vincent Oria & Jichao Sun

Corresponding author

Correspondence to Jichao Sun.

About this article

Cite this article

Houle, M.E., Ma, X., Oria, V. et al. Improving the quality of K-NN graphs through vector sparsification: application to image databases. Int J Multimed Info Retr 3, 259–274 (2014). https://doi.org/10.1007/s13735-014-0067-7

Received: 18 July 2014
Revised: 26 August 2014
Accepted: 29 August 2014
Published: 21 September 2014
Issue Date: November 2014
DOI: https://doi.org/10.1007/s13735-014-0067-7
