
Large-scale structural learning and predicting via hashing approximation

  • Original Article
  • Published in Neural Computing and Applications

Abstract

By combining structural information with the nonparallel support vector machine, the structural nonparallel support vector machine (SNPSVM) can fully exploit prior knowledge to directly improve the algorithm's generalization capacity. However, the scalability issue of how to train SNPSVM efficiently on data with huge dimensions has not been studied. In this paper, we integrate linear SNPSVM with the b-bit minwise hashing scheme to speed up the training phase for large-scale, high-dimensional statistical learning, and we then address the problem of speeding up its prediction phase via locality-sensitive hashing. For one-against-one multi-class classification problems, a two-stage strategy is put forward: in the first stage, a series of hash-based classifiers is built to approximate the exact results and filter the hypothesis space; in the second stage, the classification is refined by solving a multi-class SNPSVM on the remaining classes. The proposed method can handle large-scale classification problems with a huge number of features. Experimental results on two large-scale datasets (news20 and webspam) demonstrate the efficiency of structural learning via b-bit minwise hashing. Experimental results on the ImageNet-BOF dataset and several large-scale UCI datasets show that the proposed hash-based prediction can be more than two orders of magnitude faster than the exact classifier, with only minor losses in quality.
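The training-phase idea summarized above, replacing each huge binary feature vector with a short b-bit minwise signature that a linear classifier can then consume, can be sketched as follows. This is a minimal illustration only: the function names, the linear universal-hash form standing in for random permutations, and all parameter values are our own assumptions, not the paper's implementation.

```python
import random

def minwise_signature(feature_set, num_perms=4, b=2, prime=(1 << 31) - 1, seed=0):
    """b-bit minwise signature of a set of (binary) feature indices.

    For each of `num_perms` random hash functions (a linear universal hash
    approximates a random permutation), take the minimum hashed value over
    the set and keep only its lowest b bits.
    """
    rng = random.Random(seed)
    sig = []
    for _ in range(num_perms):
        a, c = rng.randrange(1, prime), rng.randrange(prime)
        min_hash = min((a * x + c) % prime for x in feature_set)
        sig.append(min_hash & ((1 << b) - 1))  # keep only the lowest b bits
    return sig

def expand(sig, b=2):
    """One-hot expand each b-bit value into a 2^b block, yielding a short,
    sparse vector that a linear SVM-type solver can train on directly."""
    vec = [0] * (len(sig) * (1 << b))
    for i, v in enumerate(sig):
        vec[i * (1 << b) + v] = 1
    return vec
```

With realistic settings (e.g., hundreds of hash functions and b in {1, 2, 4}), the expanded vector has dimension num_perms * 2^b regardless of the original feature dimension, which is what makes training on data such as webspam with millions of features tractable.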





Acknowledgements

This work has been partially supported by grants from the National Natural Science Foundation of China (Nos. 61472390, 71731009, 71331005, 91546201) and the Beijing Natural Science Foundation (No. 1162005).

Author information


Corresponding author

Correspondence to Yingjie Tian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


About this article


Cite this article

Chen, D., Tian, Y. Large-scale structural learning and predicting via hashing approximation. Neural Comput & Applic 31, 2889–2903 (2019). https://doi.org/10.1007/s00521-017-3238-7

