Parallel Algorithm of Local Support Vector Regression for Large Datasets

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10646)

Abstract

We propose kSVR, a new parallel algorithm of local support vector regression (local SVR) for dealing effectively with large datasets. The kSVR learning strategy performs the regression task in two main steps: the first partitions the training data into k clusters, and the second learns an SVR model from each cluster, in parallel on a multi-core computer, to predict the data locally. The kSVR algorithm is faster than the standard SVR for non-linear regression on large datasets while maintaining high prediction accuracy. Numerical test results on datasets from the UCI repository show that the proposed kSVR is efficient compared with the standard SVR.
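To make the two-step strategy concrete, here is a minimal sketch in Python of the learning scheme described above, assuming scikit-learn's KMeans and SVR together with joblib for multi-core parallelism. This is an illustration only, not the authors' implementation; the class name KSVR and the hyperparameters k, C, and gamma are placeholders introduced for the example.

```python
# Minimal sketch of the kSVR strategy: (1) partition the training data
# into k clusters with k-means, (2) train one non-linear SVR per cluster
# in parallel across cores, (3) answer each query with the SVR of its
# nearest cluster. Assumes X is a 2-D NumPy array and every cluster is
# non-empty; hyperparameters are illustrative, not from the paper.
import numpy as np
from joblib import Parallel, delayed
from sklearn.cluster import KMeans
from sklearn.svm import SVR

class KSVR:
    def __init__(self, k=10, C=1.0, gamma="scale", n_jobs=-1):
        self.k, self.C, self.gamma, self.n_jobs = k, C, gamma, n_jobs

    def fit(self, X, y):
        # Step 1: partition the training set into k clusters.
        self.kmeans = KMeans(n_clusters=self.k, n_init=10).fit(X)
        labels = self.kmeans.labels_

        # Step 2: learn a local SVR on each cluster, one task per core.
        def train_local(c):
            idx = np.where(labels == c)[0]
            return SVR(C=self.C, gamma=self.gamma).fit(X[idx], y[idx])

        self.models = Parallel(n_jobs=self.n_jobs)(
            delayed(train_local)(c) for c in range(self.k))
        return self

    def predict(self, X):
        # Route each query point to the SVR of its nearest centroid.
        clusters = self.kmeans.predict(X)
        return np.array([self.models[c].predict(x.reshape(1, -1))[0]
                         for x, c in zip(X, clusters)])
```

Hypothetical usage: model = KSVR(k=10).fit(X_train, y_train), then y_pred = model.predict(X_test).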

Notes

  1. Note that the stated complexity of the kSVR approach does not include the k-means clustering used to partition the full dataset; this step takes negligible time compared with solving the quadratic programming problems for the local SVR models.
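As a back-of-the-envelope justification of this remark (our reasoning, not the paper's): assume that training a non-linear SVR on n points costs on the order of n² operations. Training k local models on clusters of roughly n/k points each then costs

```latex
% Assumes quadratic training cost for the non-linear SVR solver.
\[
  k \cdot O\!\left(\left(\frac{n}{k}\right)^{2}\right)
  = O\!\left(\frac{n^{2}}{k}\right),
\]
```

a factor-k reduction even before the k local models are trained concurrently on separate cores, whereas one k-means pass over n points in d dimensions costs only O(nkd), which is linear in n and hence negligible for large datasets.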

Author information

Correspondence to Thanh-Nghi Do.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Bui, L.D., Tran-Nguyen, M.T., Kim, Y.G., Do, T.N. (2017). Parallel Algorithm of Local Support Vector Regression for Large Datasets. In: Dang, T., Wagner, R., Küng, J., Thoai, N., Takizawa, M., Neuhold, E. (eds.) Future Data and Security Engineering. FDSE 2017. Lecture Notes in Computer Science, vol. 10646. Springer, Cham. https://doi.org/10.1007/978-3-319-70004-5_10

  • DOI: https://doi.org/10.1007/978-3-319-70004-5_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70003-8

  • Online ISBN: 978-3-319-70004-5

  • eBook Packages: Computer Science, Computer Science (R0)
