Abstract
This paper addresses the important problem of training set selection for support vector machines (SVMs). Selection is a critical step for large and noisy data sets because of the high time and memory complexity of SVM training. Several methods have been proposed so far, most of them based on analyzing the data geometry in either the input or the kernel space. Here, we propose a new dynamically adaptive genetic algorithm (DAGA) to select valuable training sets. We demonstrate that DAGA not only selects the training data quickly, but also dynamically determines the desired training set size without any prior information. We analyze the impact of the support vectors ratio, defined as the percentage of support vectors in the training set, on the performance of DAGA. We also investigate and discuss the possibility of incorporating reduced SVMs into the proposed algorithm. An extensive experimental study shows that DAGA offers fast and effective training set optimization that is independent of the size of the entire training set.
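The support vectors ratio defined above can be illustrated with a minimal sketch. The function name `support_vector_ratio` and the index-based representation of the training set are assumptions made for this illustration only; they are not taken from the paper, and the sketch does not reproduce DAGA itself.

```python
def support_vector_ratio(train_indices, support_vector_indices):
    """Return the percentage of training-set vectors that are support vectors.

    Both arguments are iterables of sample indices (a hypothetical
    representation chosen for this sketch, not the paper's data structure).
    """
    train = set(train_indices)
    if not train:
        raise ValueError("training set must not be empty")
    sv = set(support_vector_indices)
    if not sv <= train:
        raise ValueError("support vectors must come from the training set")
    return 100.0 * len(sv) / len(train)


# Hypothetical example: a refined training set of 8 vectors,
# 6 of which were selected as support vectors after SVM training.
ratio = support_vector_ratio(range(8), [0, 2, 3, 5, 6, 7])
print(f"support vectors ratio: {ratio:.1f}%")
```

A high ratio suggests that most of the selected vectors contribute to the decision boundary; how this value is used to adapt the training set size is described in the paper, not in this sketch.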
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Kawulok, M., Nalepa, J. (2014). Dynamically Adaptive Genetic Algorithm to Select Training Data for SVMs. In: Bazzan, A., Pichara, K. (eds) Advances in Artificial Intelligence -- IBERAMIA 2014. Lecture Notes in Computer Science, vol 8864. Springer, Cham. https://doi.org/10.1007/978-3-319-12027-0_20
DOI: https://doi.org/10.1007/978-3-319-12027-0_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12026-3
Online ISBN: 978-3-319-12027-0
eBook Packages: Computer Science, Computer Science (R0)