Abstract
Mining distributed data streams is increasingly in demand across application domains such as web traffic analysis and financial transactions. In distributed environments, it is impractical to transmit all data to a single node in order to build a global model. A more reasonable strategy is to extract the essential parts of the local models at the subsidiary nodes and integrate them into the global model. In this paper we propose an approach, SVDDS, that performs this model integration in distributed environments. It is based on SVM theory and trades off the risk of the global model against the total transmission load. Our analysis and experiments show that SVDDS noticeably lowers the total transmission load while the global accuracy drops comparatively little.
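The general idea of sending only the essential part of each local model can be illustrated with a minimal sketch. This is not the SVDDS algorithm itself, whose details are not given in the abstract. It assumes the simpler, well-known scheme in which each node trains a local SVM and forwards only its support vectors, and the central node trains the global model on their union. The use of scikit-learn, the synthetic data, the three-node split, and the names `local_support_vectors` and `integrate` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC


def local_support_vectors(X, y):
    """Train an SVM on one node and return only its support vectors."""
    model = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
    idx = model.support_  # indices of the support vectors within X
    return X[idx], y[idx]


def integrate(nodes):
    """Build a global SVM from the support vectors sent by each node.

    Returns the global model and the number of transmitted points.
    """
    parts = [local_support_vectors(X, y) for X, y in nodes]
    X_sent = np.vstack([p[0] for p in parts])
    y_sent = np.concatenate([p[1] for p in parts])
    global_model = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_sent, y_sent)
    return global_model, len(X_sent)


# Synthetic two-class data, split evenly across three hypothetical nodes.
X, y = make_blobs(n_samples=600, centers=[[-3, -3], [3, 3]],
                  cluster_std=1.5, random_state=0)
nodes = list(zip(np.array_split(X, 3), np.array_split(y, 3)))

global_model, sent = integrate(nodes)
central_model = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# Compare transmission load and accuracy against the all-data baseline.
print(f"transmitted {sent} of {len(X)} points")
print(f"global accuracy:      {global_model.score(X, y):.3f}")
print(f"centralized accuracy: {central_model.score(X, y):.3f}")
```

The printed counts show the kind of trade-off the abstract describes: the fraction of points transmitted against the accuracy lost relative to a model trained on all data at one node.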
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, P., Mao, G. (2010). Describing Data with the Support Vector Shell in Distributed Environments. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2010. Lecture Notes in Computer Science, vol 6171. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14400-4_10
DOI: https://doi.org/10.1007/978-3-642-14400-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14399-1
Online ISBN: 978-3-642-14400-4
eBook Packages: Computer Science, Computer Science (R0)