
Clustering Aided Support Vector Machines

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10358)

Abstract

Support Vector Machines (SVMs) have proven to be an effective approach to learning a classifier from complex datasets. However, highly nonhomogeneous data distributions can pose a challenge for SVMs when the underlying dataset comprises clusters of instances with varying mixtures of class labels. To address this challenge we propose a novel approach, called a cluster-supported Support Vector Machine, in which information derived from clustering can be incorporated directly into the SVM learning process. We provide a theoretical derivation to show that when the total empirical loss is expressed in terms of the combined quadratic empirical loss from each cluster, we can still find a formulation of the optimisation problem that is a convex quadratic programming problem. We discuss the scenarios where this type of model would be beneficial, and present empirical evidence that demonstrates the improved accuracy of our combined model.
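
For concreteness, the following is a minimal sketch of one way the cluster-wise quadratic empirical loss described above could enter a soft-margin SVM objective. The cluster index sets \(\mathcal{C}_k\), the single trade-off parameter \(C\), and the exact grouping of the slack terms are illustrative assumptions and need not match the formulation derived in the paper.

% Illustrative only: standard soft-margin SVM slack variables, grouped by
% cluster \mathcal{C}_k and penalised quadratically per cluster (assumed form).
\begin{aligned}
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad
  & \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
    + C \sum_{k=1}^{K} \Big( \sum_{i \in \mathcal{C}_k} \xi_i \Big)^{2} \\
\text{s.t.} \quad
  & y_i \big( \mathbf{w}^{\top} \mathbf{x}_i + b \big) \ge 1 - \xi_i,
    \qquad \xi_i \ge 0, \quad i = 1, \dots, n.
\end{aligned}

Each squared cluster sum is a convex quadratic function of the slack variables and the constraints are linear, so an objective of this shape remains a convex quadratic programming problem, consistent with the property stated in the abstract.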



Author information


Correspondence to Rahul Soni, Sutharshan Rajasegarar, James Bailey or Christopher Leckie.



Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ristanoski, G., Soni, R., Rajasegarar, S., Bailey, J., Leckie, C. (2017). Clustering Aided Support Vector Machines. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2017. Lecture Notes in Computer Science, vol. 10358. Springer, Cham. https://doi.org/10.1007/978-3-319-62416-7_23


  • DOI: https://doi.org/10.1007/978-3-319-62416-7_23


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62415-0

  • Online ISBN: 978-3-319-62416-7

  • eBook Packages: Computer Science (R0)
