Preserving privacy in association rule mining with bloom filters

Journal of Intelligent Information Systems

Abstract

Privacy-preserving association rule mining has recently become an active research area. Two approaches to the problem have been proposed: perturbation-based and secure multiparty computation-based. One drawback of the perturbation-based approach is that it cannot always fully preserve an individual's privacy while achieving precise mining results. The secure multiparty computation-based approach works only in distributed environments and requires sophisticated protocols, which constrains its practical use. In this paper, we propose a new approach for preserving privacy in association rule mining. The main idea is to use keyed Bloom filters to represent transactions as well as data items. The proposed approach can fully preserve privacy while maintaining the precision of mining results. The tradeoff between mining precision and storage requirement is investigated. We also propose a δ-folding technique to further reduce the storage requirement without sacrificing mining precision or running time.
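
To make the idea of representing items and transactions with keyed Bloom filters more concrete, here is a minimal Python sketch. It is not the construction from the paper, whose details are not given in the abstract. It assumes HMAC-SHA256 as the keyed hash, and the function names (`item_positions`, `bloom_filter`, `may_contain`, `estimated_support`) and the parameters `m` (filter length in bits) and `k` (hash count) are hypothetical choices made for illustration.

```python
import hashlib
import hmac


def item_positions(key, item, m, k):
    """Map an item to at most k bit positions in an m-bit filter.

    Assumption: HMAC-SHA256 stands in for the keyed hash; without the
    secret key, positions cannot be recomputed from the item name.
    """
    positions = set()
    for i in range(k):
        message = f"{i}:{item}".encode("utf-8")
        digest = hmac.new(key, message, hashlib.sha256).digest()
        positions.add(int.from_bytes(digest[:8], "big") % m)
    return positions


def bloom_filter(key, items, m, k):
    """Encode a set of items (a transaction or an itemset) as an integer bitmask."""
    bits = 0
    for item in items:
        for pos in item_positions(key, item, m, k):
            bits |= 1 << pos
    return bits


def may_contain(transaction_bits, itemset_bits):
    """True if every bit of the itemset filter is set in the transaction filter.

    This never misses a real match, but it can report false positives.
    """
    return (transaction_bits & itemset_bits) == itemset_bits


def estimated_support(transaction_filters, itemset_bits):
    """Fraction of transaction filters that appear to contain the itemset."""
    hits = sum(1 for t in transaction_filters if may_contain(t, itemset_bits))
    return hits / len(transaction_filters)


if __name__ == "__main__":
    key = b"hypothetical-secret-key"
    m, k = 256, 3
    transactions = [["bread", "milk"], ["milk"], ["bread", "eggs"]]
    filters = [bloom_filter(key, t, m, k) for t in transactions]
    query = bloom_filter(key, ["bread"], m, k)
    print(estimated_support(filters, query))
```

Because a Bloom filter can return false positives but no false negatives, the estimated support in this sketch is never lower than the true support. A larger `m` lowers the false positive rate at the cost of more storage, which is the kind of precision versus storage tradeoff the abstract refers to.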

Author information

Corresponding author

Correspondence to Ling Qiu.

About this article

Cite this article

Qiu, L., Li, Y. & Wu, X. Preserving privacy in association rule mining with bloom filters. J Intell Inf Syst 29, 253–278 (2007). https://doi.org/10.1007/s10844-006-0018-8

  • DOI: https://doi.org/10.1007/s10844-006-0018-8
