Abstract
Privacy-preserving data mining has attracted the attention of a large number of researchers, and many data perturbation methods have been proposed to protect individual privacy. These methods appear to provide both privacy and accuracy. At the same time, various data reconstruction approaches have been proposed to derive private information from perturbed data. Consequently, many studies have examined data reconstruction methods and the resilience of data perturbation schemes against them. In this survey, we focus on data reconstruction methods because of their importance in privacy-preserving data mining. We provide a detailed review of these methods and of the data perturbation schemes attacked by the different reconstruction techniques. We complement this review with a summary of the evaluation metrics and data sets used in current attack techniques. Finally, we pose several open questions to provide a better understanding of these approaches and to guide future research.
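To make the idea of a reconstruction attack concrete, here is a minimal Python/NumPy sketch of one well-known attack on additive noise perturbation. It is a principal component analysis (PCA) based reconstruction in the spirit of Huang, Du and Chen (2005). The sketch assumes i.i.d. Gaussian noise that is independent of the data and strongly correlated original attributes. The helper names `perturb`, `pca_reconstruct` and `rmse`, the synthetic data and all parameter values are hypothetical choices for illustration. This is not the survey's or the original authors' implementation.

```python
import numpy as np


def perturb(X, sigma, rng):
    """Hypothetical helper: additive perturbation, publishing Y = X + R,
    where R has i.i.d. N(0, sigma^2) entries independent of X."""
    return X + rng.normal(0.0, sigma, size=X.shape)


def pca_reconstruct(Y, k):
    """Hypothetical helper: estimate X from Y by keeping only the top-k
    principal components of the perturbed data.

    Because the noise is independent of X, Cov(Y) = Cov(X) + sigma^2 * I
    in expectation, so Cov(Y) and Cov(X) share eigenvectors. If X lies
    close to a k-dimensional subspace, projecting onto it removes most
    of the noise.
    """
    mu = Y.mean(axis=0)
    Yc = Y - mu
    eigvals, eigvecs = np.linalg.eigh(np.cov(Yc, rowvar=False))
    top = np.argsort(eigvals)[::-1][:k]  # indices of the k largest eigenvalues
    Q = eigvecs[:, top]                  # m x k orthonormal basis
    return mu + Yc @ Q @ Q.T             # project, then restore the mean


def rmse(A, B):
    """Root-mean-square difference between two equally shaped arrays."""
    return float(np.sqrt(np.mean((A - B) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, k, sigma = 2000, 10, 2, 1.0
    # Synthetic "private" data: m attributes driven by k hidden factors.
    X = rng.normal(size=(n, k)) @ (5.0 * rng.normal(size=(k, m)))
    Y = perturb(X, sigma, rng)
    X_hat = pca_reconstruct(Y, k)
    print("error of published data:", rmse(X, Y))
    print("error after PCA attack: ", rmse(X, X_hat))
```

With strongly correlated attributes, the reconstruction error should fall well below the noise level, to roughly the share of the noise lying inside the retained subspace (about sqrt(k/m) times sigma in this setup). When the attributes are weakly correlated, the gain shrinks, which is one reason attack resilience depends on the data as well as on the perturbation scheme.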
Acknowledgments
This work is supported by Grant 113E262 from TUBITAK.
Cite this article
Okkalioglu, B.D., Okkalioglu, M., Koc, M. et al. A survey: deriving private information from perturbed data. Artif Intell Rev 44, 547–569 (2015). https://doi.org/10.1007/s10462-015-9439-5