Abstract
By combining structural information with the nonparallel support vector machine, the structural nonparallel support vector machine (SNPSVM) can fully exploit prior knowledge to directly improve the algorithm's generalization capacity. However, the scalability issue of how to train SNPSVM efficiently on data with huge dimensions has not been studied. In this paper, we integrate linear SNPSVM with the b-bit minwise hashing scheme to speed up the training phase for large-scale, high-dimensional statistical learning, and we then address the problem of speeding up its prediction phase via locality-sensitive hashing. For one-against-one multi-class classification problems, a two-stage strategy is put forward: in the first stage, a series of hash-based classifiers is built to approximate the exact results and filter the hypothesis space; in the second stage, the classification is refined by solving a multi-class SNPSVM on the remaining classes. The proposed method can handle large-scale classification problems with a huge number of features. Experimental results on two large-scale datasets (news20 and webspam) demonstrate the efficiency of structural learning via b-bit minwise hashing, and results on the ImageNet-BOF dataset and several large-scale UCI datasets show that the proposed hash-based prediction can be more than two orders of magnitude faster than the exact classifier with only minor losses in quality.
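To make the training-phase idea concrete, the core of the b-bit minwise hashing scheme (Li and König) is to map each high-dimensional binary feature set to a short k × 2^b one-hot expansion that any linear learner can consume. The following is a minimal Python sketch, not the authors' implementation: the function name, the universal-hash family used to simulate random permutations, and the parameter defaults are all illustrative assumptions.

```python
import numpy as np

def bbit_minwise_features(nonzero_indices, k=8, b=2, seed=0):
    """Map a binary feature set to a k * 2^b one-hot expansion via
    b-bit minwise hashing, suitable as input to a linear classifier.

    nonzero_indices : iterable of ints, the active (nonzero) features.
    k               : number of independent hash functions (permutations).
    b               : number of low-order bits kept per minwise hash.
    """
    rng = np.random.default_rng(seed)
    prime = 2147483647  # Mersenne prime 2^31 - 1 for the hash family
    a = rng.integers(1, prime, size=k)
    c = rng.integers(0, prime, size=k)
    idx = np.asarray(list(nonzero_indices), dtype=np.int64)
    # Universal hashing approximates k random permutations; the minimum
    # hash value over the set is the classical minwise signature.
    minhash = ((a[:, None] * idx[None, :] + c[:, None]) % prime).min(axis=1)
    lowbits = minhash & ((1 << b) - 1)  # keep only the lowest b bits
    # Expand each b-bit value into a one-hot block of length 2^b, so the
    # k blocks concatenate into a sparse vector a linear SVM can use.
    out = np.zeros(k * (1 << b), dtype=np.int8)
    out[np.arange(k) * (1 << b) + lowbits] = 1
    return out

x = bbit_minwise_features({3, 17, 42, 99}, k=8, b=2)
# x has length k * 2^b = 32, with exactly one active bit per hash block.
```

Because the expansion has exactly k nonzeros regardless of the original dimensionality, the linear SNPSVM trains on a vector of length k · 2^b instead of the original feature space, which is the source of the training speedup described above.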
Acknowledgements
This work has been partially supported by grants from the National Natural Science Foundation of China (Nos. 61472390, 71731009, 71331005, 91546201) and the Beijing Natural Science Foundation (No. 1162005).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Chen, D., Tian, Y. Large-scale structural learning and predicting via hashing approximation. Neural Comput & Applic 31, 2889–2903 (2019). https://doi.org/10.1007/s00521-017-3238-7