A local asynchronous distributed privacy preserving feature selection algorithm for large peer-to-peer networks

Das, Kamalika; Bhaduri, Kanishka; Kargupta, Hillol

doi:10.1007/s10115-009-0274-3

Regular Paper
Published: 25 November 2009

Knowledge and Information Systems, Volume 24, pages 341–367 (2010)

Kamalika Das¹, Kanishka Bhaduri² & Hillol Kargupta³,⁴

Abstract

In this paper we develop a local, distributed, privacy-preserving algorithm for feature selection in a large peer-to-peer environment. Feature selection is often used in machine learning for data compaction and efficient learning by mitigating the curse of dimensionality. Many solutions exist for feature selection when the data are located at a central location. However, the task becomes extremely challenging when the data are distributed across a large number of peers or machines. Centralizing the entire dataset, or portions of it, can be very costly and impractical because of the large number of data sources, the asynchronous nature of peer-to-peer networks, the dynamic nature of the data and network, and privacy concerns. The solution proposed in this paper performs feature selection asynchronously with low communication overhead, and each peer can specify its own privacy constraints. The algorithm works through local interactions among participating nodes. We present results on a real-world dataset to test the performance of the proposed algorithm.

Author information

Authors and Affiliations

Stinger Ghaffarian Technologies Inc., IDU Group, NASA Ames Research Center, Moffett Field, CA, 94035, USA
Kamalika Das
Mission Critical Technologies Inc., IDU Group, NASA Ames Research Center, Moffett Field, CA, 94035, USA
Kanishka Bhaduri
Department of CSEE, University of Maryland, Baltimore County, Baltimore, MD, 21250, USA
Hillol Kargupta
AGNIK LLC, Baltimore, MD, USA
Hillol Kargupta

Corresponding author

Correspondence to Kamalika Das.

About this article

Cite this article

Das, K., Bhaduri, K. & Kargupta, H. A local asynchronous distributed privacy preserving feature selection algorithm for large peer-to-peer networks. Knowl Inf Syst 24, 341–367 (2010). https://doi.org/10.1007/s10115-009-0274-3

Received: 15 August 2008
Revised: 17 July 2009
Accepted: 10 October 2009
Published: 25 November 2009
Issue Date: September 2010
DOI: https://doi.org/10.1007/s10115-009-0274-3
