Abstract
Data mining is gaining societal momentum due to the ever-increasing availability of large amounts of human data, easily collected by a variety of sensing technologies. We are therefore faced with unprecedented opportunities and risks: a deeper understanding of human behavior and of how our society works is shadowed by a greater chance of privacy intrusion and of unfair discrimination based on the extracted patterns and profiles. Consider the case in which a set of patterns extracted from the personal data of a population of individuals is released for subsequent use in a decision-making process, such as granting or denying credit. First, the pattern set may reveal sensitive information about individuals in the training population; second, decision rules based on such patterns may lead to unfair discrimination, depending on what is represented in the training cases. Although methods that independently address privacy or discrimination in data mining have been proposed in the literature, we argue that in this context privacy and discrimination risks should be tackled together, and we present a methodology for doing so when publishing frequent pattern mining results. We describe a set of pattern sanitization methods, one for each discrimination measure used in the legal literature, to achieve fair publishing of frequent patterns in combination with two possible privacy transformations: one based on \(k\)-anonymity and one based on differential privacy. Our pattern sanitization methods based on \(k\)-anonymity yield both privacy- and discrimination-protected patterns, while introducing reasonable (controlled) pattern distortion; moreover, they achieve a better trade-off between protection and data quality than the sanitization methods based on differential privacy. Finally, the effectiveness of our proposals is assessed by extensive experiments.
Notes
Article 20, General Data Protection Regulation, unofficial consolidated version provided by the Rapporteur, 22 October 2013. http://www.janalbrecht.eu/fileadmin/material/Dokumente/DPR-Regulation-inofficial-consolidated-LIBE.
Discrimination on the basis of an attribute value happens if a person with that attribute value is treated less favorably than a person with another value.
Discrimination occurs when a higher proportion of people not in the group is able to comply.
\(\alpha \) states an acceptable level of discrimination according to laws and regulations. For example, the U.S. Equal Pay Act United States Congress (1963) states that “a selection rate for any race, sex, or ethnic group which is less than four-fifths of the rate for the group with the highest rate will generally be regarded as evidence of adverse impact”. This amounts to using \(clift\) with \(\alpha =1.25\).
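The four-fifths rule cited above can be made concrete with a small sketch. The snippet below (an illustration with hypothetical counts, not code from the paper) compares the selection rates of a protected and a reference group and flags adverse impact when the reference group's rate exceeds \(\alpha = 1.25\) times the protected group's rate, which is the \(clift\) threshold mentioned in the note.

```python
def selection_rate(granted, total):
    """Fraction of a group's applicants who received the benefit."""
    return granted / total

def adverse_impact(p_granted, p_total, r_granted, r_total, alpha=1.25):
    """Flag adverse impact when the reference group's selection rate
    exceeds alpha times the protected group's rate; alpha = 1.25
    corresponds to the four-fifths (80 %) rule."""
    p_rate = selection_rate(p_granted, p_total)  # protected group
    r_rate = selection_rate(r_granted, r_total)  # reference group
    return r_rate / p_rate > alpha

# Hypothetical example: 30/100 of the protected group obtain credit
# versus 60/100 of the reference group; the ratio 2.0 exceeds 1.25,
# so adverse impact is flagged.
print(adverse_impact(30, 100, 60, 100))  # True
```

With 50/100 versus 60/100 the ratio would be 1.2, below the 1.25 threshold, and no adverse impact would be flagged under this rule.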
References
Aggarwal CC, Yu PS (2008) Privacy preserving data mining: models and algorithms. Springer, Berlin
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pp 487–499
Agrawal R, Srikant R (2000) Privacy preserving data mining. In: SIGMOD 2000. ACM Press, New York, pp 439–450
Atzori M, Bonchi F, Giannotti F, Pedreschi D (2008) Anonymity preserving pattern discovery. VLDB J 17(4):703–727
Australian Legislation (2014) (a) Victorian Current Acts - Equal Opportunity Act - 2010 (amended Sept. 17, 2014); (b) Queensland - Anti-Discrimination Act 1991 (current as at July 1, 2014)
Berendt B, Preibusch S (2014) Better decision support through exploratory discrimination-aware data mining: foundations and empirical evidence. Artif Intell Law 22(2):175–209
Bhaskar R, Laxman S, Smith A, Thakurta A (2010) Discovering frequent patterns in sensitive data. In KDD 2010. ACM Press, New York, pp 503–512
Bonomi L (2013) Mining frequent patterns with differential privacy. PVLDB 6(12):1422–1427
Calders T, Goethals B (2007) Non-derivable itemset mining. Data Min Knowl Discov 14(1):171–206
Calders T, Verwer S (2010) Three naive Bayes approaches for discrimination-free classification. Data Min. Knowl. Discov. 21(2):277–292
Custers B, Calders T, Schermer B, Zarsky TZ (eds) (2013) Discrimination and privacy in the information society—data mining and profiling in large databases. Studies in Applied Philosophy, Epistemology and Rational Ethics 3. Springer, Berlin
Dalenius T (1974) The invasion of privacy problem and statistics production—an overview. Statistik Tidskrift 12:213–225
Domingo-Ferrer J, Torra V (2005) Ordinal, continuous and heterogeneous \(k\)-anonymity through microaggregation. Data Min Knowl Discov 11(2):195–212
Dwork C (2006) Differential privacy. In: ICALP 2006, LNCS 4052. Springer, Berlin, pp 1–12
Dwork C, Hardt M, Pitassi T, Reingold O, Zemel RS (2012) Fairness through awareness. In: ITCS 2012. ACM Press, New York, pp 214–226
European Union Legislation (1995) Directive 95/46/EC
European Union Legislation (2014) (a) Racial Equality Directive, 2000/43/EC; (b) Employment Equality Directive, 2000/78/EC; (c) European Parliament legislative resolution on equal treatment between persons irrespective of religion or belief, disability, age or sexual orientation (A6-0149/2009)
Frank A, Asuncion A (2010) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine http://archive.ics.uci.edu/ml/datasets
Friedman A, Wolff R, Schuster A (2008) Providing \(k\)-anonymity in data mining. VLDB J 17(4):789–804
Friedman A, Schuster A (2010) Data mining with differential privacy. In: KDD 2010. ACM, New York, pp 493–502
Fung BCM, Wang K, Fu AW-C, Yu PS (2010) Introduction to privacy-preserving data publishing: concepts and techniques. Chapman & Hall/CRC, Boca Raton
Gehrke J, Hay M, Lui E, Pass R (2012) Crowd-blending privacy. In: CRYPTO 2012, pp 479–496
Greenwood PE, Nikulin MS (1996) A guide to chi-squared testing. Wiley, New York
Hajian S, Domingo-Ferrer J, Martínez-Ballesté A (2011) Rule protection for indirect discrimination prevention in data mining. In: MDAI 2011, Lecture Notes in Computer Science vol 6820. Springer, Berlin, pp 211–222
Hajian S, Domingo-Ferrer J (2013) A methodology for direct and indirect discrimination prevention in data mining. IEEE Trans Knowl Data Eng 25(7):1445–1459
Hajian S, Monreale A, Pedreschi D, Domingo-Ferrer J, Giannotti F (2012) Injecting discrimination and privacy awareness into pattern discovery. In: 2012 IEEE 12th International Conference on Data Mining Workshops. IEEE Computer Society, pp 360–369
Hajian S, Domingo-Ferrer J (2012) A study on the impact of data anonymization on anti-discrimination. In: 2012 IEEE 12th International Conference on Data Mining Workshops. IEEE Computer Society, pp 352–359
Hajian S, Domingo-Ferrer J, Farràs O (2014) Generalization-based privacy preservation and discrimination prevention in data publishing and mining. Data Min Knowl Discov 28(5–6):1158–1188
Hay M, Rastogi V, Miklau G, Suciu D (2010) Boosting the accuracy of differentially private histograms through consistency. Proc VLDB 3(1):1021–1032
Hundepool A, Domingo-Ferrer J, Franconi L, Giessing S, Schulte-Nordholt E, Spicer K, de Wolf P-P (2012) Statistical disclosure control. Wiley, New York
Kamiran F, Calders T (2011) Data preprocessing techniques for classification without discrimination. Knowl Inf Syst 33(1):1–33
Kamiran F, Calders T, Pechenizkiy M (2010) Discrimination aware decision tree learning. In: Proceedings of IEEE International Conference on Data Mining, pp 869–874
Kamiran F, Karim A, Zhang X (2010) Decision theory for discrimination-aware classification. In: ICDM IEEE, pp 924–929
Kamiran F, Zliobaite I, Calders T (2013) Quantifying explainable discrimination and removing illegal discrimination in automated decision making. Knowl Inf Syst 35(3):613–644
Kamishima T, Akaho S, Asoh H, Sakuma J (2012) Fairness-aware classifier with prejudice remover regularizer. In: ECML/PKDD. Lecture Notes in Computer Science vol 7524. Springer, Berlin pp 35–50
Kantarcioglu M, Jin J, Clifton C (2004) When do data mining results violate privacy? In: KDD. ACM Press, New York, pp 599–604
Lee J, Clifton C (2012) Differential identifiability. In: KDD 2012. ACM Press, New York, pp 1041–1049
Li N, Qardaji WH, Su D, Cao J (2012) PrivBasis: frequent itemset mining with differential privacy. Proc VLDB 5(11):1340–1351
Li N, Li T, Venkatasubramanian S (2007) \(t\)-Closeness: privacy beyond \(k\)-anonymity and \(l\)-diversity. In: IEEE 23rd International Conference on Data Engineering (ICDE) pp 106–115
Li W, Han J, Pei J (2001) CMAR: accurate and efficient classification based on multiple class-association rules. In: Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM), pp 369–376
Luong BT, Ruggieri S, Turini F (2011) k-NN as an implementation of situation testing for discrimination discovery and prevention. In: ACM International Conference on Knowledge Discovery and Data Mining (KDD 2011). ACM Press, New York, pp 502–510
Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) \(l\)-Diversity: privacy beyond \(k\)-anonymity. ACM Trans Knowl Discov Data (TKDD) 1(1), Article 3
McSherry F, Talwar K (2007) Mechanism design via differential privacy. In: Proceedings of the 48th IEEE Symposium on Foundations of Computer Science (FOCS), pp 94–103
Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Proceedings of the 7th International Conference on Database Theory
Pedreschi D, Ruggieri S, Turini F (2008) Discrimination-aware data mining. In: Proceedings of the 14th ACM International Conference on Knowledge Discovery and Data Mining (KDD). ACM Press, New York, pp 560–568
Pedreschi D, Ruggieri S, Turini F (2009) Measuring discrimination in socially-sensitive decision records. In: Proceedings of the SIAM International Conference on Data Mining (SDM). SIAM, pp 581–592
Pedreschi D, Ruggieri S, Turini F (2009) Integrating induction and deduction for finding evidence of discrimination. In: 12th ACM International Conference on Artificial Intelligence and Law (ICAIL). ACM Press, New York, pp 157–166
Pedreschi D, Ruggieri S, Turini F (2013) The discovery of discrimination. In: Custers BHM, Calders T, Schermer BW, Zarsky TZ (eds) Discrimination and privacy in the information society, volume 3 of Studies in Applied Philosophy, Epistemology and Rational Ethics. Springer, Berlin, pp 43–57
Ruggieri S, Pedreschi D, Turini F (2010) Data mining for discrimination discovery. ACM Trans Knowl Discov Data (TKDD) 4(2), Article 9
Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6):1010–1027
Soria-Comas J, Domingo-Ferrer J (2012) Sensitivity-independent differential privacy via prior knowledge refinement. Int J Uncertain Fuzziness Knowl Based Syst 20(6):855–876
Sweeney L (2002) k-Anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5):557–570
United States Congress, US Equal Pay Act (1963) http://archive.eeoc.gov/epa/anniversary/epa-40.html
Zemel RS, Wu Y, Swersky K, Pitassi T, Dwork C (2013) Learning fair representations. ICML 3:325–333
Zeng C, Naughton JF, Cai J-Y (2012) On differentially private frequent itemset mining. PVLDB 6(1):25–36
Zliobaite I, Kamiran F, Calders T (2011) Handling conditional discrimination. In: Proceedings of the 11th IEEE International Conference on Data Mining (ICDM), pp 992–1001
Acknowledgments
The following funding sources are gratefully acknowledged: Government of Catalonia (ICREA Acadèmia Prize to the second author and Grant 2014 SGR 537), Spanish Government (Project TIN2011-27076-C03-01 “CO-PRIVACY”), European Commission (Projects FP7 “DwB”, FP7-SMARTCITIES n. 609042 “PETRA”, FP7 “Inter-Trust” and H2020 “CLARUS”) and Templeton World Charity Foundation (Grant TWCF0095/AB60 “CO-UTILITY”). The authors are with the UNESCO Chair in Data Privacy. The views in this paper are the authors’ own and do not necessarily reflect the views of UNESCO or the Templeton World Charity Foundation.
Responsible editor: Bart Goethals.
Hajian, S., Domingo-Ferrer, J., Monreale, A. et al. Discrimination- and privacy-aware patterns. Data Min Knowl Disc 29, 1733–1782 (2015). https://doi.org/10.1007/s10618-014-0393-7