Abstract
Recent years have seen increasing interest in analytical methods that learn patterns from large-scale data distributed over Peer-to-Peer (P2P) networks and support applications. Mining patterns in such a distributed and dynamic environment is a challenging task because centralizing the data is not feasible. In this paper, we propose a distributed classification technique based on relevance vector machines (RVM) and local model exchange among neighboring peers in a P2P network. In such networks, an efficient distributed classification algorithm is evaluated by the size of the resulting local models (communication efficiency) and their prediction accuracy. An RVM uses dramatically fewer kernel functions than a state-of-the-art support vector machine (SVM) while demonstrating comparable generalization performance. This makes the RVM a suitable choice for learning compact and accurate local models at each peer in a P2P network. Our model propagation approach exchanges the resulting models with peers in a local neighborhood to produce a more accurate network-wide global model while keeping the communication cost low throughout the network. Through extensive experimental evaluations, we demonstrate that, by using more relevant and compact models, our approach outperforms the baseline model propagation approaches in terms of both accuracy and communication cost.
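The trade-off described in the abstract, compact local models exchanged within a neighborhood, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the class names (KernelModel, Peer), the RBF kernel, the majority-vote combination rule, and the use of "number of basis vectors sent" as the communication cost are all assumptions made for illustration. No RVM training is performed; the sparse models are specified by hand to stand in for trained ones.

```python
import math

# Illustrative sketch only. All names, the RBF kernel, the voting rule and the
# cost measure are assumptions; the models are hand-specified, not trained RVMs.


class KernelModel:
    """Sparse kernel expansion f(x) = bias + sum_i w_i * k(x, v_i)."""

    def __init__(self, vectors, weights, bias=0.0, gamma=1.0):
        self.vectors = vectors  # retained basis vectors (e.g. relevance vectors)
        self.weights = weights
        self.bias = bias
        self.gamma = gamma

    def decision(self, x):
        total = self.bias
        for v, w in zip(self.vectors, self.weights):
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
            total += w * math.exp(-self.gamma * sq_dist)  # RBF kernel
        return total

    def size(self):
        # Model size measured as the number of basis vectors it carries.
        return len(self.vectors)


class Peer:
    def __init__(self, name, local_model):
        self.name = name
        self.local_model = local_model
        self.neighbors = []  # direct neighbors in the overlay network
        self.received = []   # models received from neighbors

    def propagate(self):
        """Send the local model to each direct neighbor; return vectors sent."""
        cost = 0
        for neighbor in self.neighbors:
            neighbor.received.append(self.local_model)
            cost += self.local_model.size()
        return cost

    def predict(self, x):
        """Majority vote over the local model and the received models."""
        models = [self.local_model] + self.received
        votes = sum(1 if m.decision(x) >= 0 else -1 for m in models)
        return 1 if votes >= 0 else -1


def demo():
    # Three peers on a line: a - b - c.
    a = Peer("a", KernelModel([(0.0, 0.0)], [1.0], bias=-0.5))
    b = Peer("b", KernelModel([(0.0, 0.0), (3.0, 3.0)], [1.0, -1.0]))
    c = Peer("c", KernelModel([(3.0, 3.0)], [-1.0], bias=0.5))
    a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]

    total_cost = sum(p.propagate() for p in (a, b, c))
    predictions = [b.predict((0.0, 0.0)), b.predict((3.0, 3.0))]
    return total_cost, predictions
```

In this sketch the cost of one propagation round grows with the number of basis vectors in each local model multiplied by the number of neighbors it is sent to, which is why a sparser model directly lowers the traffic, under the assumed cost measure.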
Acknowledgements
This work is funded by the Seventh Framework Programme of the European Commission through the project REDUCTION (No. 288254), www.reduction-project.eu.
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Khan, M.U., Nanopoulos, A., Schmidt-Thieme, L. (2015). P2P RVM for Distributed Classification. In: Lausen, B., Krolak-Schwerdt, S., Böhmer, M. (eds) Data Science, Learning by Latent Structures, and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44983-7_13
DOI: https://doi.org/10.1007/978-3-662-44983-7_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44982-0
Online ISBN: 978-3-662-44983-7
eBook Packages: Mathematics and Statistics (R0)