A distributed approach to enabling privacy-preserving model-based classifier training

Luo, Hangzai; Fan, Jianping; Lin, Xiaodong; Zhou, Aoying; Bertino, Elisa

doi:10.1007/s10115-008-0167-x

Regular Paper
Published: 26 September 2008

Volume 20, pages 157–185, (2009)

Knowledge and Information Systems

Hangzai Luo¹,
Jianping Fan²,
Xiaodong Lin³,
Aoying Zhou¹ &
Elisa Bertino⁴

Abstract

This paper proposes a novel approach for privacy-preserving distributed model-based classifier training. Our approach is an important step towards supporting customizable privacy modeling and protection. It consists of three major steps. First, each data site independently learns a weak concept model (i.e., a local classifier) for a given data pattern or concept by using its own training samples. An adaptive EM algorithm is proposed to select the model structure and estimate the model parameters simultaneously. The second step trains a combined classifier by integrating the weak concept models shared by multiple data sites. To reduce data transmission costs and potential privacy breaches, only the weak concept models are sent to the central site, and synthetic samples are generated directly from these shared models at the central site. Both the shared weak concept models and the synthetic samples are then used to learn a reliable and complete global concept model. A computational approach is developed to automatically achieve a good trade-off between the privacy disclosure risk, the sharing benefit and the data utility. The third step validates the combined classifier by distributing the global concept model to all data sites in the collaboration network while limiting potential privacy breaches. Our approach has been validated through extensive experiments carried out on four UCI machine learning data sets and two image data sets.
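The three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: as a simplifying assumption, each class at each site is modeled by a single one-dimensional Gaussian instead of a mixture fitted with the adaptive EM algorithm, and the privacy/utility trade-off computation is omitted. All function names and the toy data (`site_a`, `site_b`) are hypothetical and exist only for illustration.

```python
import math
import random
import statistics

def fit_model(samples_by_class):
    """Step 1 (per site): fit one Gaussian per class on private samples.
    Assumption: a single Gaussian stands in for the paper's mixture model.
    Only (mean, std) parameters are returned; raw samples stay local."""
    return {label: (statistics.fmean(xs), statistics.pstdev(xs))
            for label, xs in samples_by_class.items()}

def synthesize(local_models, per_model=500, seed=0):
    """Step 2a (central site): draw synthetic samples from each shared model."""
    rng = random.Random(seed)
    synthetic = {}
    for model in local_models:
        for label, (mu, sigma) in model.items():
            draws = [rng.gauss(mu, sigma) for _ in range(per_model)]
            synthetic.setdefault(label, []).extend(draws)
    return synthetic

def fit_global_model(local_models, per_model=500, seed=0):
    """Step 2b (central site): learn the global model from synthetic samples."""
    return fit_model(synthesize(local_models, per_model, seed))

def log_density(x, mu, sigma):
    """Log of the Gaussian density N(x; mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def classify(model, x):
    """Assign x to the class whose Gaussian gives the highest likelihood."""
    return max(model, key=lambda label: log_density(x, *model[label]))

def validate_at_site(model, samples_by_class):
    """Step 3 (per site): score the global model locally; only accuracy leaves."""
    pairs = [(x, label) for label, xs in samples_by_class.items() for x in xs]
    return sum(classify(model, x) == label for x, label in pairs) / len(pairs)

# Hypothetical private data held by two sites (never sent to the central site).
site_a = {"neg": [0.8, 1.0, 1.2, 0.9, 1.1], "pos": [4.8, 5.0, 5.2, 4.9, 5.1]}
site_b = {"neg": [1.4, 1.6, 1.5, 1.3, 1.7], "pos": [5.4, 5.6, 5.5, 5.3, 5.7]}

shared_models = [fit_model(site_a), fit_model(site_b)]  # only parameters are shared
global_model = fit_global_model(shared_models)
accuracy_a = validate_at_site(global_model, site_a)
accuracy_b = validate_at_site(global_model, site_b)
```

In this sketch the central site sees only the per-class means and standard deviations, and each site reports back a single accuracy figure. How much those shared parameters themselves disclose is exactly the risk the abstract says must be balanced against sharing benefit and data utility; the sketch does not attempt to quantify it.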

Author information

Authors and Affiliations

Shanghai Key Lab of Trustworthy Computing, East China Normal University, Shanghai, China
Hangzai Luo & Aoying Zhou
Department of Computer Science, University of North Carolina, Charlotte, NC, 28223, USA
Jianping Fan
Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH, 45221, USA
Xiaodong Lin
Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
Elisa Bertino

Corresponding author

Correspondence to Jianping Fan.

Additional information

This project is supported by the National Science Foundation under 0208539-IIS and 0601542-IIS, by grants from the AO Foundation and CERIAS, by the Shanghai Pujiang Program under 08PJ1404600, by the National Natural Science Foundation of China under 60496325, and by the National Hi-tech R&D Program of China under 2006AA010111.

About this article

Cite this article

Luo, H., Fan, J., Lin, X. et al. A distributed approach to enabling privacy-preserving model-based classifier training. Knowl Inf Syst 20, 157–185 (2009). https://doi.org/10.1007/s10115-008-0167-x

Received: 28 August 2007
Revised: 17 May 2008
Accepted: 04 August 2008
Published: 26 September 2008
Issue Date: August 2009
DOI: https://doi.org/10.1007/s10115-008-0167-x
