Abstract
Support vector data description (SVDD) is widely used in novelty detection. Because the SVDD decision function is expressed through the support vectors, which may contain sensitive information, the support vectors are disclosed whenever SVDD is used to classify unknown samples, and privacy concerns arise. In addition, SVDD does not scale well to large datasets, because its decision complexity is linear in the size of the training dataset (more precisely, in the number of support vectors). Our work is distinguished in two respects. First, by decomposing the kernel mapping space into three subspaces and exploring the pre-image of the center of the SVDD sphere in the original space, we derive a fast decision approach for SVDD, called FDA-SVDD, with three implementation versions: FDA-SVDD-I, FDA-SVDD-II and FDA-SVDD-III. The decision complexity of the proposed method is reduced to \(O(1)\). Second, because the decision function of FDA-SVDD refers only to the pre-image of the sphere center, the privacy of the support vectors is preserved. The proposed FDA-SVDD is therefore particularly attractive for privacy-preserving novelty detection. Empirical analysis on UCI and USPS datasets demonstrates the effectiveness of the proposed approach and verifies the derived theoretical results.
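To make the two concerns above concrete, the following is a minimal sketch of the standard SVDD decision rule with a Gaussian kernel, as it is usually stated in the SVDD literature; it is not the FDA-SVDD method. The support vectors, the multipliers `alphas`, the kernel width `sigma`, and the radius are hypothetical toy values chosen for illustration, not the output of a trained model.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svdd_sq_distance(z, support_vectors, alphas, sigma=1.0):
    """Squared feature-space distance from phi(z) to the sphere center.

    With center a = sum_i alpha_i * phi(x_i):
    ||phi(z) - a||^2 = K(z, z) - 2 * sum_i alpha_i * K(x_i, z)
                       + sum_{i,j} alpha_i * alpha_j * K(x_i, x_j)
    """
    k_zz = gaussian_kernel(z, z, sigma)  # equals 1 for the Gaussian kernel
    # One kernel evaluation per support vector: cost is linear in their number.
    cross = sum(a * gaussian_kernel(x, z, sigma)
                for a, x in zip(alphas, support_vectors))
    # This term does not depend on z and could be precomputed after training.
    const = sum(ai * aj * gaussian_kernel(xi, xj, sigma)
                for ai, xi in zip(alphas, support_vectors)
                for aj, xj in zip(alphas, support_vectors))
    return k_zz - 2.0 * cross + const


def is_novel(z, support_vectors, alphas, radius_sq, sigma=1.0):
    """Flag z as novel when it falls outside the sphere of squared radius radius_sq."""
    return svdd_sq_distance(z, support_vectors, alphas, sigma) > radius_sq


# Hypothetical toy model (assumed values, not a trained SVDD).
support_vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
alphas = [1.0 / 3.0] * 3  # multipliers sum to 1, as in the SVDD dual
# Assumption for the toy model: radius set to the farthest support vector.
radius_sq = max(svdd_sq_distance(x, support_vectors, alphas)
                for x in support_vectors)

near_is_novel = is_novel((0.3, 0.3), support_vectors, alphas, radius_sq)
far_is_novel = is_novel((5.0, 5.0), support_vectors, alphas, radius_sq)
```

Every call to `is_novel` needs the raw support vectors and evaluates one kernel per support vector, which is the disclosure and scaling issue the abstract describes. A decision rule that refers only to a single pre-image of the center would avoid both.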
Notes
The USPS dataset can be downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.
Acknowledgments
This work was supported in part by the Hong Kong Polytechnic University under Grant G-UA68; by the National Natural Science Foundation of China under Grants 61170122, 61170029, 61272210, 61202311 and 61370173; by the Natural Science Foundation of Jiangsu Province under Grants BK2011003 and BK2011417; by the Jiangsu 333 Expert Engineering Grant BRA2011142; by the 2011 and 2012 Postgraduate Student’s Creative Research Fund of Jiangsu Province; by the Natural Science Foundation of Zhejiang Province under Grants LY13F020011, LY14F010010, LY14F020009 and R1090244; and by the Independent Design Project of Zhejiang Province Key Technological Innovation Team under Grant 2011R09014-05.
Additional information
Communicated by V. Loia.
Appendix
Proof of Theorem 4:
According to Eq. (16) and Eq. (19), we have
Then,
Substituting Eq. (24) into Eq. (25) gives
Clearly, Theorem 4 holds.
Cite this article
Hu, W., Wang, S., Chung, Fl. et al. Privacy preserving and fast decision for novelty detection using support vector data description. Soft Comput 19, 1171–1186 (2015). https://doi.org/10.1007/s00500-014-1331-8
DOI: https://doi.org/10.1007/s00500-014-1331-8