Abstract
Establishing strategic partnerships often requires organizations to publish and share meaningful data to support collaborative business activities. An equally important concern is protecting sensitive patterns, such as unique emerging sales opportunities, embedded in their data. In this paper, we contribute to the area of data sanitization by introducing an optimization-based local recoding methodology that hides emerging patterns (EPs) in a dataset while preserving the underlying frequent itemsets as far as possible. We propose a novel heuristic solution that exploits the unique properties of hiding EPs to carry out iterative local recoding generalization. We also propose a metric that measures (i) frequent-itemset distortion, which quantifies the quality of the published data, and (ii) the degree of reduction in emerging patterns, to guide a bottom-up recoding process. We have implemented the proposed solution and experimentally verified its effectiveness on a benchmark dataset.
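As a rough illustration of the idea, and not the authors' algorithm, the sketch below uses the standard definition of an emerging pattern: an itemset whose support grows from one dataset to another by at least a threshold ratio. It shows how generalizing two attribute values to a common parent can push a pattern's growth rate below that threshold. The toy data, the threshold `RHO`, the taxonomy `TAXONOMY`, and all function names are assumptions made for this example. For simplicity the generalization is applied to every record, whereas local recoding would apply it only to selected records.

```python
from fractions import Fraction

def support(dataset, itemset):
    """Fraction of transactions in `dataset` containing every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in dataset if itemset <= t)
    return Fraction(hits, len(dataset))

def growth_rate(d1, d2, itemset):
    """Support ratio supp_d2 / supp_d1 (infinite if the itemset is absent from d1 only)."""
    s1, s2 = support(d1, itemset), support(d2, itemset)
    if s1 == 0:
        return float("inf") if s2 > 0 else 0.0
    return float(s2 / s1)

def recode(dataset, mapping):
    """Replace each item by its parent in `mapping`; unmapped items are kept."""
    return [frozenset(mapping.get(item, item) for item in t) for t in dataset]

# Hypothetical toy data: two snapshots of the same kind of records.
D1 = [frozenset(t) for t in (
    {"Bachelors", "Tech"}, {"Masters", "Sales"},
    {"Masters", "Sales"}, {"Masters", "Tech"},
)]
D2 = [frozenset(t) for t in (
    {"Bachelors", "Tech"}, {"Bachelors", "Tech"},
    {"Bachelors", "Tech"}, {"Masters", "Sales"},
)]

RHO = 2.0  # assumed growth-rate threshold for calling a pattern "emerging"
TAXONOMY = {"Bachelors": "Degree", "Masters": "Degree"}  # assumed hierarchy

before = growth_rate(D1, D2, {"Bachelors", "Tech"})
after = growth_rate(recode(D1, TAXONOMY), recode(D2, TAXONOMY), {"Degree", "Tech"})

print(f"before recoding: {before}, emerging: {before >= RHO}")
print(f"after recoding:  {after}, emerging: {after >= RHO}")
```

In this toy case the pattern's growth rate drops from 3.0 to 1.5, so it no longer meets the assumed threshold. A full method would also have to measure how much the recoding distorts the frequent itemsets, which this sketch does not attempt.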
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cheng, M.W.K., Choi, B.K.K., Cheung, W.K.W. (2010). Hiding Emerging Patterns with Local Recoding Generalization. In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2010. Lecture Notes in Computer Science, vol 6118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13657-3_19
DOI: https://doi.org/10.1007/978-3-642-13657-3_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13656-6
Online ISBN: 978-3-642-13657-3
eBook Packages: Computer Science, Computer Science (R0)