A local asynchronous distributed privacy preserving feature selection algorithm for large peer-to-peer networks

  • Regular Paper
  • Knowledge and Information Systems

Abstract

In this paper we develop a local, distributed, privacy-preserving algorithm for feature selection in a large peer-to-peer environment. Feature selection is often used in machine learning to compact data and make learning more efficient by mitigating the curse of dimensionality. Many solutions exist for feature selection when the data reside at a central location. However, the task becomes extremely challenging when the data are distributed across a large number of peers or machines. Centralizing the entire dataset, or even portions of it, can be very costly and impractical because of the large number of data sources, the asynchronous nature of peer-to-peer networks, the dynamic nature of the data and the network, and privacy concerns. The solution proposed in this paper performs feature selection asynchronously and with low communication overhead, and each peer can specify its own privacy constraints. The algorithm relies on local interactions among participating nodes. We present results on a real-world dataset to test the performance of the proposed algorithm.

Author information

Corresponding author

Correspondence to Kamalika Das.

About this article

Cite this article

Das, K., Bhaduri, K. & Kargupta, H. A local asynchronous distributed privacy preserving feature selection algorithm for large peer-to-peer networks. Knowl Inf Syst 24, 341–367 (2010). https://doi.org/10.1007/s10115-009-0274-3

  • DOI: https://doi.org/10.1007/s10115-009-0274-3
