Abstract
\(K\)-nearest neighbor (K-NN) graphs are an essential component of many established methods for content-based image retrieval and automated image annotation. The performance of such methods relies heavily on the semantic quality of the graphs, which can be measured as the proportion of neighbors sharing the same class label as their query images. Due to noise in image features, the K-NN graphs produced by existing methods may suffer from low semantic quality. In this article, we propose NNF-Descent for the efficient construction of K-NN graphs based on nearest-neighbor and feature descent, in which selective sparsification of feature vectors is interleaved with neighborhood refinement operations in an effort to improve the semantic quality of the result. A variant of the Laplacian Score is proposed for the identification of noisy features local to individual images, whose values are then set to \(0\) (the global mean value after standardization). We show through extensive experiments on several datasets that NNF-Descent is able to increase the proportion of semantically related images over unrelated images within the neighbor sets, and that the proposed method generalizes well to other types of data that are represented by high-dimensional feature vectors.
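The quality measure and the sparsification step described in the abstract can be illustrated with a minimal sketch. It is not the NNF-Descent algorithm: exact brute-force search stands in for the approximate neighborhood refinement, and the mask of noisy features is taken as given rather than derived from the Laplacian Score variant. All function names (`knn_graph`, `graph_quality`, `standardize`, `sparsify`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def knn_graph(X, k):
    """Brute-force K-NN graph under Euclidean distance (an assumption;
    the article's method refines neighbor lists approximately instead).
    Row i holds the indices of the k nearest neighbors of point i."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def graph_quality(neighbors, labels):
    """Proportion of neighbor-list entries whose class label matches
    the label of the query point."""
    labels = np.asarray(labels)
    return float(np.mean(labels[neighbors] == labels[:, None]))

def standardize(X):
    """Shift each feature to zero mean and scale it to unit variance,
    so that 0 is the global mean value of every feature."""
    X = np.asarray(X, dtype=float)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant features unscaled
    return (X - X.mean(axis=0)) / sigma

def sparsify(Z, noisy_mask):
    """Set the entries flagged in noisy_mask (one flag per object and
    feature) to 0. How the flags are chosen is outside this sketch."""
    out = np.array(Z, dtype=float, copy=True)
    out[np.asarray(noisy_mask, dtype=bool)] = 0.0
    return out

# Toy data: two well-separated classes of two points each.
X = [[0, 0], [0, 1], [10, 10], [10, 11]]
y = [0, 0, 1, 1]
quality_k1 = graph_quality(knn_graph(X, 1), y)
quality_k2 = graph_quality(knn_graph(X, 2), y)
```

On this toy data every point's single nearest neighbor belongs to its own class, while a second neighbor must come from the other class, so the measured quality falls as \(K\) exceeds the class size.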
Acknowledgments
Michael Houle acknowledges the financial support of JSPS Kakenhi Kiban (C) Research Grant 24500135 and the JST ERATO Kawarabayashi Large Graph Project. Vincent Oria acknowledges the financial support of NSF under Grant 1241976.
About this article
Cite this article
Houle, M.E., Ma, X., Oria, V. et al. Improving the quality of K-NN graphs through vector sparsification: application to image databases. Int J Multimed Info Retr 3, 259–274 (2014). https://doi.org/10.1007/s13735-014-0067-7
DOI: https://doi.org/10.1007/s13735-014-0067-7