A survey: deriving private information from perturbed data

Abstract

Privacy-preserving data mining has attracted the attention of many researchers, and a variety of data perturbation methods have been proposed to protect individual privacy. These methods appear successful at providing both privacy and accuracy. On the one hand, different perturbation techniques are used to preserve privacy; on the other hand, various data reconstruction approaches have been proposed to derive private information from perturbed data. Consequently, many studies have examined data reconstruction methods and the resilience of data perturbation schemes against them. In this survey, we focus on data reconstruction methods because of their importance in privacy-preserving data mining. We review these methods in detail, along with the data perturbation schemes they attack, and we combine this review with the evaluation metrics and data sets used in current attack studies. Finally, we pose some open questions to foster a better understanding of these approaches and to guide future research.
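
To make the setting concrete, the sketch below (our own minimal illustration, not an algorithm taken from the surveyed papers) perturbs a toy data set with additive Gaussian noise and then applies a spectral-filtering-style reconstruction: the attacker keeps the principal components of the released data whose eigenvalues clearly exceed the noise level and projects the released data onto that subspace. The synthetic data, the variable names, and the eigenvalue threshold are assumptions made only for this example.

```python
# Minimal illustrative sketch (not from the survey): additive-noise
# perturbation followed by a spectral-filtering-style reconstruction.
# The synthetic data, variable names, and threshold are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy "original" data X: n records with d correlated attributes.
n, d = 1000, 5
latent = rng.normal(size=(n, 2))                  # two hidden factors
X = latent @ rng.normal(size=(2, d)) + 0.1 * rng.normal(size=(n, d))

# The data owner releases Y = X + R, with R i.i.d. Gaussian noise.
sigma = 1.0                                       # assumed known/estimable by the attacker
Y = X + rng.normal(scale=sigma, size=(n, d))

# Attacker side: eigen-decompose the sample covariance of Y and keep
# only directions whose eigenvalues clearly exceed the noise level
# (a Marchenko-Pastur-style edge serves as a crude threshold here).
Yc = Y - Y.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Yc, rowvar=False))
noise_edge = sigma**2 * (1.0 + np.sqrt(d / n)) ** 2
signal = eigvecs[:, eigvals > noise_edge]

# Project the released data onto the retained subspace to estimate X.
X_hat = Yc @ signal @ signal.T + Y.mean(axis=0)

print("RMSE of released data :", np.sqrt(np.mean((Y - X) ** 2)))
print("RMSE of reconstruction:", np.sqrt(np.mean((X_hat - X) ** 2)))
```

Because the correlated structure of the original data lies in a low-dimensional subspace while the added noise spreads evenly over all directions, the projected estimate is typically much closer to the original data than the released version; this is the kind of privacy leakage that the surveyed reconstruction attacks exploit.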

Acknowledgments

This work was supported by Grant 113E262 from TUBITAK.

Author information

Corresponding author

Correspondence to Huseyin Polat.

Cite this article

Okkalioglu, B.D., Okkalioglu, M., Koc, M. et al. A survey: deriving private information from perturbed data. Artif Intell Rev 44, 547–569 (2015). https://doi.org/10.1007/s10462-015-9439-5
