Abstract
A support vector machine is a means for computing a binary classifier from a set of observations. Here we assume that the observations are n feature vectors, each of length m, together with n binary labels, one for each observed feature vector. The feature vectors can be combined into an \(n\times m\) feature matrix. The classifier is computed via an optimization problem that depends on the feature matrix. The solution of this optimization problem is a vector of dimension m from which a classifier with good generalization properties can be computed directly. Here we show that the feature matrix can be replaced by a compressed feature matrix that comprises n feature vectors of length \(\ell < m\). The solution of the optimization problem for the compressed feature matrix has dimension only \(\ell\) and can be computed faster since the optimization problem is smaller. Still, the solution of the compressed problem needs to be related to the solution of the original problem. We present a simple scheme that reconstructs the original solution from a solution of the compressed problem up to a small error. For the reconstruction guarantees we assume that the solution of the original problem is sparse. We show that sparse solutions can be promoted by a feature selection approach.
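The following is a minimal sketch of the compress–solve–reconstruct pipeline outlined in the abstract. It is an illustration under stated assumptions, not the authors' exact scheme: it assumes a Gaussian random projection R for compression and recovers the sparse original solution from the compressed solution with orthogonal matching pursuit; all names, sizes, and parameters below are placeholders.

# Sketch: compress the feature matrix, solve a smaller linear SVM,
# then reconstruct a sparse m-dimensional solution from the ell-dimensional one.
# Assumptions (not from the paper): Gaussian projection for compression,
# orthogonal matching pursuit for the sparse reconstruction step.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n, m, ell, k = 500, 1000, 200, 10   # n observations, m features, compressed length ell, sparsity k

# Synthetic data with a k-sparse ground-truth separator w_true.
w_true = np.zeros(m)
support = rng.choice(m, size=k, replace=False)
w_true[support] = rng.standard_normal(k)
X = rng.standard_normal((n, m))          # n x m feature matrix
y = np.sign(X @ w_true)                  # binary labels in {-1, +1}

# Compression: replace the n x m feature matrix by the n x ell matrix X R^T.
R = rng.standard_normal((ell, m)) / np.sqrt(ell)
X_compressed = X @ R.T

# Solve the (smaller) SVM problem on the compressed features.
svm = LinearSVC(C=1.0, max_iter=10000)
svm.fit(X_compressed, y)
z = svm.coef_.ravel()                    # compressed solution, dimension ell

# Reconstruction: assuming z is close to R w for a sparse w,
# recover w by sparse regression of z on the columns of R.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k)
omp.fit(R, z)
w_hat = omp.coef_                        # reconstructed solution, dimension m

# Check how well the reconstructed separator classifies the data.
agreement = np.mean(np.sign(X @ w_hat) == y)
print(f"label agreement of reconstructed classifier: {agreement:.3f}")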
Acknowledgments
This work has been carried out within the project CG Learning. The project CG Learning acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number 255827.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Giesen, J., Laue, S., Mueller, J.K. (2016). Reconstructing a Sparse Solution from a Compressed Support Vector Machine. In: Kotsireas, I., Rump, S., Yap, C. (eds) Mathematical Aspects of Computer and Information Sciences. MACIS 2015. Lecture Notes in Computer Science(), vol 9582. Springer, Cham. https://doi.org/10.1007/978-3-319-32859-1_26
DOI: https://doi.org/10.1007/978-3-319-32859-1_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32858-4
Online ISBN: 978-3-319-32859-1
eBook Packages: Computer Science; Computer Science (R0)