Abstract
Privacy-preserving association rule mining has recently become an active research area. Two approaches to this problem have been proposed: perturbation-based and secure multiparty computation-based. A drawback of the perturbation-based approach is that it cannot always fully preserve individual privacy while still achieving precise mining results. The secure multiparty computation-based approach works only in distributed environments and requires sophisticated protocols, which limits its practical use. In this paper, we propose a new approach to preserving privacy in association rule mining. The main idea is to use keyed Bloom filters to represent transactions as well as data items. The proposed approach can fully preserve privacy while maintaining the precision of mining results. We investigate the tradeoff between mining precision and storage requirements. We also propose a δ-folding technique that further reduces the storage requirement without sacrificing mining precision or running time.
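The abstract does not spell out the construction, so the following is only a minimal sketch of the general idea, not the authors' scheme. It assumes HMAC-SHA256 as the keyed hash, and the names `keyed_bloom`, `may_contain`, and `estimated_support`, as well as the parameters `m` (filter length in bits) and `k` (number of hash functions), are hypothetical choices made for illustration. δ-folding is not shown.

```python
import hashlib
import hmac


def keyed_bloom(items, key, m=64, k=3):
    """Build an m-bit Bloom filter (stored as an int) for `items`.

    Assumption: each of the k hash functions is HMAC-SHA256 under the
    secret `key`, with the hash index prepended to the item name.
    """
    bits = 0
    for item in items:
        for i in range(k):
            digest = hmac.new(key, f"{i}:{item}".encode(), hashlib.sha256).digest()
            bits |= 1 << (int.from_bytes(digest, "big") % m)
    return bits


def may_contain(transaction_filter, itemset_filter):
    """True if every bit set for the itemset is also set for the transaction.

    False positives are possible; false negatives are not.
    """
    return (transaction_filter & itemset_filter) == itemset_filter


def estimated_support(transaction_filters, itemset_filter):
    """Fraction of transaction filters that may contain the itemset."""
    hits = sum(may_contain(t, itemset_filter) for t in transaction_filters)
    return hits / len(transaction_filters)


# Illustrative data only: three toy transactions and a toy key.
key = b"illustrative-secret-key"
transactions = [["bread", "milk"], ["bread", "butter"], ["milk"]]
filters = [keyed_bloom(t, key) for t in transactions]
bread = keyed_bloom(["bread"], key)
support = estimated_support(filters, bread)
```

Because a Bloom filter never yields false negatives, `support` here is at least the true support of 2/3; it can be higher when bits collide, which is where the tradeoff between precision and storage (the choice of `m`) comes from. A party holding only the filters, without the key, cannot recompute the item hashes directly.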
Cite this article
Qiu, L., Li, Y. & Wu, X. Preserving privacy in association rule mining with bloom filters. J Intell Inf Syst 29, 253–278 (2007). https://doi.org/10.1007/s10844-006-0018-8
DOI: https://doi.org/10.1007/s10844-006-0018-8