Abstract
We present an alternative technique for similarity estimation under locality sensitive hashing (LSH) schemes with discrete output. By utilising control variates and extra information, we are able to achieve better theoretical variance reduction than maximum likelihood estimation with extra information. We show that our method obtains equivalent results, but that slight modifications can provide better empirical results and stability at lower dimensions. Finally, we compare the performance of the various methods on the MNIST and Gisette datasets, and show that our model achieves better accuracy and stability.
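As a rough illustration of the general idea (not the authors' exact estimator), the sketch below applies textbook control variates to one-bit sign random projections, a standard LSH scheme with discrete output. It assumes a hypothetical extra vector `e` whose angles to `u` and `v` are known, so the collision indicators of `(u, e)` and `(v, e)` have known means and can serve as control variates. All function and variable names are illustrative assumptions.

```python
import math
import random


def angle(a, b):
    """Angle between two vectors, in radians."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))


def sign_hash(r, x):
    """One-bit sign random projection of x along direction r."""
    return 1 if sum(ri * xi for ri, xi in zip(r, x)) >= 0 else 0


def estimate_collision_prob(u, v, e, k, rng):
    """Return (plain, control-variate) estimates of P[h(u) == h(v)].

    Assumption: the angles between e and u, and between e and v, are known,
    so the two control variates below have known means.
    """
    d = len(u)
    mu1 = 1 - angle(u, e) / math.pi  # known mean of 1[h(u) == h(e)]
    mu2 = 1 - angle(v, e) / math.pi  # known mean of 1[h(v) == h(e)]
    ys, x1s, x2s = [], [], []
    for _ in range(k):
        r = [rng.gauss(0.0, 1.0) for _ in range(d)]
        hu, hv, he = sign_hash(r, u), sign_hash(r, v), sign_hash(r, e)
        ys.append(float(hu == hv))
        x1s.append(float(hu == he))
        x2s.append(float(hv == he))

    my, m1, m2 = sum(ys) / k, sum(x1s) / k, sum(x2s) / k

    def cov(a, ma, b, mb):
        return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (k - 1)

    s11, s22 = cov(x1s, m1, x1s, m1), cov(x2s, m2, x2s, m2)
    s12 = cov(x1s, m1, x2s, m2)
    s1y, s2y = cov(x1s, m1, ys, my), cov(x2s, m2, ys, my)

    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        # Degenerate control variates: fall back to the plain estimate.
        return my, my
    # Empirical coefficients c = Cov(X)^{-1} Cov(X, Y), solved for the 2x2 case.
    c1 = (s22 * s1y - s12 * s2y) / det
    c2 = (s11 * s2y - s12 * s1y) / det
    cv = my - c1 * (m1 - mu1) - c2 * (m2 - mu2)
    return my, cv


if __name__ == "__main__":
    rng = random.Random(0)
    u, v, e = ([rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(3))
    plain, cv = estimate_collision_prob(u, v, e, 2000, rng)
    print("true collision probability:", 1 - angle(u, v) / math.pi)
    print("plain estimate:", plain)
    print("control-variate estimate:", cv)
```

Under sign random projections the collision probability equals 1 - θ/π, where θ is the angle between the two vectors, so either estimate can be converted into an angle estimate. The coefficients here are fitted from the same samples, which is the usual practical choice. How much the variance drops depends on how strongly the control variates correlate with the collision indicator of `(u, v)`.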
Acknowledgements
This work is funded by the Singapore Ministry of Education Academic Research Fund Tier 2 Grant MOE2018-T2-2-013, and is supported by the Singapore University of Technology and Design’s Undergraduate Research Opportunities Programme.
The authors also thank the anonymous reviewers for their comments and suggestions for improvement, which have helped to enhance the quality of the paper.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Chew, J., Kang, K. (2021). Control Variates for Similarity Search. In: Ma, H., et al. Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol 13019. Springer, Cham. https://doi.org/10.1007/978-3-030-88004-0_38
DOI: https://doi.org/10.1007/978-3-030-88004-0_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88003-3
Online ISBN: 978-3-030-88004-0
eBook Packages: Computer Science, Computer Science (R0)