DOI: 10.1145/2833258.2833304
research-article

A Communication-Efficient Distributed Algorithm for Large-scale Classification within P2P Networks

Published: 03 December 2015

Abstract

This paper proposes a supervised, fully distributed classification algorithm that is accurate and scalable for large networks. The algorithm is asynchronous and lightweight, learns online, and responds quickly; these characteristics are what make it scalable. Unlike other approaches, which build many local classifiers (one at every site), our method forms a single global classifier. Fine-grained components of the classifier are distributed across the network using a Distributed Hash Table (DHT), which provides efficient linking to these components and keeps the system fully distributed. Our simulation results show that the proposed method is more communication-efficient than several other distributed algorithms, and that its accuracy is comparable to that of available state-of-the-art machine learning techniques.
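
The abstract does not say what the classifier components are or how keys are hashed, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes (hypothetically) that each component is a per-(feature index, value) table of class counts, that components are placed on a Chord-style ring by hashing their key, and that classification sums the counts fetched from the owning nodes. The names `ToyRing`, `ring_id`, `train`, and `classify` are illustrative only.

```python
import hashlib
from bisect import bisect_left
from collections import Counter, defaultdict


def ring_id(key: str) -> int:
    """Map a string to a 32-bit position on the ring (assumed hash scheme)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class ToyRing:
    """In-process stand-in for a DHT: one global classifier, split by key."""

    def __init__(self, node_names):
        # Nodes sorted by ring position, as in consistent hashing.
        self.points = sorted((ring_id(name), name) for name in node_names)
        self.ids = [point[0] for point in self.points]
        # Per node: component key -> class-label counts (hypothetical layout).
        self.store = {name: defaultdict(Counter) for name in node_names}

    def owner(self, key: str) -> str:
        """Return the first node at or after the key's position, wrapping around."""
        i = bisect_left(self.ids, ring_id(key)) % len(self.points)
        return self.points[i][1]

    def train(self, features, label):
        """Online update: each component is touched only on its owning node."""
        for idx, value in enumerate(features):
            key = f"{idx}:{value}"
            self.store[self.owner(key)][key][label] += 1

    def classify(self, features):
        """Sum class counts from the owning nodes; None if nothing matches."""
        votes = Counter()
        for idx, value in enumerate(features):
            key = f"{idx}:{value}"
            votes.update(self.store[self.owner(key)].get(key, Counter()))
        return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    ring = ToyRing(["n0", "n1", "n2", "n3"])
    ring.train([1, 0], "a")
    ring.train([1, 1], "a")
    ring.train([0, 1], "b")
    print(ring.classify([1, 0]))  # prints "a"
    print(ring.classify([0, 1]))  # prints "b"
```

In this sketch every component lives on exactly one node, so training and classification each contact only the owners of the keys involved. That is one plausible reading of how a single global classifier can stay fully distributed; a real DHT would route lookups over the network rather than index a local dictionary.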

Cited By

  • (2018) Distributed classification for image spam detection. Multimedia Tools and Applications 77:11 (13249-13278). DOI: 10.1007/s11042-017-4944-y. Online publication date: 1-Jun-2018
  • (2016) Distributed pattern recognition within DHT-based networks for imbalanced datasets. 2016 First IEEE International Conference on Computer Communication and the Internet (ICCCI) (10-13). DOI: 10.1109/CCI.2016.7778867. Online publication date: Oct-2016

Published In

SoICT '15: Proceedings of the 6th International Symposium on Information and Communication Technology
December 2015
372 pages
ISBN:9781450338431
DOI:10.1145/2833258

In-Cooperation

  • SOICT: School of Information and Communication Technology - HUST
  • NAFOSTED: The National Foundation for Science and Technology Development

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Big Data
  2. P2P-based Cloud
  3. distributed classification
  4. large-scale data mining
  5. peer-to-peer

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SoICT 2015

Acceptance Rates

SoICT '15 Paper Acceptance Rate: 49 of 106 submissions, 46%
Overall Acceptance Rate: 147 of 318 submissions, 46%
