Abstract
K-anonymity (Samarati and Sweeney 1998; Samarati, IEEE Trans Knowl Data Eng, 13(6):1010–1027, 2001; Sweeney, Int J Uncertain, Fuzziness Knowl-Based Syst, 10(5):557–570, 2002) and its variants, among them l-diversity (Machanavajjhala et al., ACM TKDD, 2007) and t-closeness (Li et al. 2007), are anonymization techniques for relational and transaction data that protect privacy against re-identification attacks. A relational dataset D is k-anonymous if every record in D has at least k-1 other records with identical quasi-identifier attribute values. Combining the released data with external data then never allows the recipient to associate a released record with fewer than k individuals (Samarati, IEEE Trans Knowl Data Eng, 13(6):1010–1027, 2001). However, the current concept of k-anonymity on transaction data treats all items as quasi-identifiers. The anonymized dataset therefore consists of groups of at least k identical transactions and suffers from lower data utility (He and Naughton 2009; He et al. 2011; Liu and Wang 2010; Terrovitis et al., VLDB J, 20(1):83–106, 2011; Terrovitis et al. 2008). To improve the utility of anonymized transaction data, this work proposes a novel anonymity concept for transaction data that contain both quasi-identifier items (QID) and sensitive items (SI). A transaction that contains sensitive items must have at least k-1 other identical transactions (Ghinita et al. IEEE TKDE, 33(2):161–174, 2011; Xu et al. 2008). A transaction that does not contain a sensitive item requires no anonymization. A transaction dataset that satisfies this property is said to be sensitive k-anonymous. Three algorithms are developed: Sensitive Transaction Neighbors (STN), Gray Sort Clustering (GSC), and Nearest Neighbors for K-anonymization (K-NN). These algorithms add or delete QID items, but only add SI, to achieve sensitive k-anonymity on transaction data.
Additionally, a simple “privacy value” is proposed to evaluate the degree of privacy for different types of k-anonymity on transaction data. Extensive numerical simulations were carried out to demonstrate the characteristics of the proposed algorithms and to compare them with other k-anonymity approaches. The results show that each technique has its own advantages under different criteria such as running time, operation, and information loss. The results can serve as a guideline for selecting an anonymization technique for different datasets and applications.
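The two anonymity conditions described in the abstract can be illustrated with a minimal Python sketch. It is not the STN, GSC, or K-NN algorithm from the paper; it only checks whether a given dataset already satisfies each property. The function names, the item names, and the reading of "identical transactions" as "equal item sets" are assumptions made for this illustration.

```python
from collections import Counter
from typing import Iterable, Set


def is_k_anonymous(transactions: Iterable[Set[str]], k: int) -> bool:
    """Conventional transaction k-anonymity: all items are treated as
    quasi-identifiers, so every distinct transaction must occur >= k times."""
    counts = Counter(frozenset(t) for t in transactions)
    return all(c >= k for c in counts.values())


def is_sensitive_k_anonymous(
    transactions: Iterable[Set[str]], sensitive_items: Set[str], k: int
) -> bool:
    """Sensitive k-anonymity (as read from the abstract): only transactions
    containing at least one sensitive item need k-1 identical companions."""
    sensitive = frozenset(sensitive_items)
    counts = Counter(frozenset(t) for t in transactions)
    # A transaction with no sensitive item is skipped: no anonymization needed.
    return all(c >= k for t, c in counts.items() if t & sensitive)


# Hypothetical toy data: "hiv_test" plays the role of a sensitive item (SI),
# the remaining items play the role of quasi-identifier items (QID).
transactions = [
    {"bread", "milk"},
    {"beer", "hiv_test"},
    {"beer", "hiv_test"},
    {"bread"},
]
sensitive_items = {"hiv_test"}

conventional_ok = is_k_anonymous(transactions, 2)
sensitive_ok = is_sensitive_k_anonymous(transactions, sensitive_items, 2)
```

On this toy dataset the conventional check fails for k = 2 because the two non-sensitive transactions are unique, while the sensitive check passes because the only transactions carrying a sensitive item appear twice. This is the utility gain the abstract refers to: non-sensitive transactions can be published unchanged.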
References
Aggarwal G, Feder T, Kenthapadi K, Khuller S, Panigrahy R, Thomas D, Zhu A (2006) Achieving anonymity via clustering. In: Proc. of the 25th ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 153–162
Barbaro M, Zeller T Jr (2006) A face is exposed for AOL searcher no. 4417749. New York Times
Fung BCM, Wang K, Chen R, Yu PS (2010) Privacy-preserving data publishing: a survey on recent developments. ACM Comput Surv 42(4)
Ghinita G, Tao Y, Kalnis P (2008) On the anonymization of sparse high-dimensional data. In: Proc. of ICDE, pp 715–724
Ghinita G, Kalnis P, Tao Y (2011) Anonymous publication of sensitive transactional data. IEEE TKDE 33(2):161–174
He Y, Naughton JF (2009) Anonymization of set-valued data via top-down, local generalization. In: Proc. of PVLDB, pp 934–945
He Y, Barman S, Naughton JF (2011) Preventing equivalence attacks in updated, anonymized data. In: Proc. of ICDE
Hong TP, Lin CW, Yang KT, Wang SL (2013) Using TF-IDF to hide sensitive itemsets. Appl Intell, pp 502–510
IBM Quest Market-Basket Synthetic Data Generator, http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData
Islam MZ, Brankovic L (2011) Privacy preserving data mining: a noise addition framework using a novel clustering technique. Knowl-Based Syst, pp 1214–1223
LeFevre K, DeWitt D, Ramakrishnan R (2006) Mondrian multidimensional k-anonymity. In: Proc. of SIGMOD, p 25
Li N, Li T, Venkatasubramanian S (2007) t-closeness: privacy beyond k-anonymity and l-diversity. In: Proc. of ICDE, pp 106–115
Liu JQ, Wang K (2010) Anonymizing transaction data by integrating suppression and generalization. In: Proc. of PAKDD, pp 171–180
Liu L, Zhu H, Huang Z (2011) Analysis of the minimal privacy disclosure for web services collaborations with role mechanisms. Expert Syst Appl 38(4):4540–4549
Loukides G, Shao J (2011) Preventing range disclosure in k-anonymised data. Expert Syst Appl 38(4):4559–4574
Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) l-diversity: privacy beyond k-anonymity. ACM TKDD, article 3
Meyerson A, Williams R (2004) On the complexity of optimal k-anonymity. In: Proc. of PODS, pp 223–228
Mortazavi R, Jalili S, Gohargazi H (2013) Multivariate microaggregation by iterative optimization. Appl Intell, pp 529–544
Motwani R, Nabar SU (2008) Anonymizing unstructured data. arXiv:0810.5582v2 [cs.DB]
Ni W, Chong Z (2012) Clustering-oriented privacy-preserving data publishing. Knowl-Based Syst, pp 264–270
Park H, Shim K (2007) Approximate algorithms for k-anonymity. In: Proc. of ACM SIGMOD, pp 67–78
Samarati P, Sweeney L (1998) Generalizing data to provide anonymity when disclosing information. In: Proc. of ACM symposium on principles of database systems, p 188
Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6):1010–1027
Sweeney L (2002) K-anonymity: a model for protecting privacy. Int J Uncertain, Fuzziness Knowl-Based Syst 10(5):557–570
Sweeney L (2002) Achieving k-anonymity privacy protection using generalization and suppression. Int J Uncertain, Fuzziness Knowl-Based Syst 10(5):571–588
Terrovitis M, Mamoulis N, Kalnis P (2011) Local and global recoding methods for anonymizing set-valued data. VLDB J 20(1):83–106
Terrovitis M, Mamoulis N, Kalnis P (2008) Privacy-preserving anonymization of set-valued data. In: Proc. of PVLDB, pp 115–125
Wang SL, Tsai YC, Kao HY, Hong TP (2010) Anonymizing set-valued social data. In: Proc. of the IEEE International Symposium on Social Computing and Networking (SocialNet)
Wang SL, Tsai YC, Kao HY, Hong TP (2011) Extending suppression for anonymization on set-valued data. Int J Innov Comput, Inf Control 7(12):6849–6863
Wang SL, Tsai YC, Kao HY, Hong TP (2011) K-anonymity on sensitive transaction items. In: Proc. of the IEEE International Conference on GrC, pp 723–727
Xu T, Wang K, Fu AWC, Yu PS (2008) Anonymizing transaction databases for publication. In: Proc. of SIGKDD, pp 767–775
Xu Y, Fung BCM, Wang K, Fu AWC, Pei J (2008) Publishing sensitive transactions for itemset utility. In: Proc. of ICDM, pp 1109–1114
Xue M, Karras P, Raissi C, Vaidya J, Tan K (2012) Anonymizing set-valued data by nonreciprocal recoding. In: Proc. of SIGKDD, pp 1050–1058
Yang W, Qiao S (2010) A novel anonymization algorithm: privacy protection and knowledge preservation. Expert Syst Appl 37(1):756–766
Acknowledgments
This work was supported in part by the National Science Council, Taiwan, under grants NSC-100-2221-E-390-030 and NSC-101-2221-E-390-028-MY3.
Cite this article
Wang, SL., Tsai, YC., Kao, HY. et al. On anonymizing transactions with sensitive items. Appl Intell 41, 1043–1058 (2014). https://doi.org/10.1007/s10489-014-0554-9
DOI: https://doi.org/10.1007/s10489-014-0554-9