Privacy leakage in multi-relational databases: a semi-supervised learning perspective

  • Special Issue Paper
The VLDB Journal

Abstract

In multi-relational databases, a view, which is a context- and content-dependent subset of one or more tables (or other views), is often used to preserve privacy by hiding sensitive information. However, recent developments in data mining pose a new challenge for database security, even when traditional security techniques such as database access control are employed. This paper presents a data mining framework based on semi-supervised learning that demonstrates the potential for privacy leakage in multi-relational databases. Many semi-supervised learning techniques, such as the K-nearest neighbor (KNN) method, can be used to demonstrate this leakage. We also introduce a new approach, hyperclique pattern-based semi-supervised learning (HPSL), which differs from traditional semi-supervised learning approaches in that it considers the similarity among groups of objects rather than only between pairs of objects. Our experimental results show that both KNN and HPSL can compromise database security, although HPSL achieves higher prediction accuracy than KNN and is therefore the more effective attack. Finally, we provide a principle for avoiding privacy leakage via semi-supervised learning in multi-relational databases and illustrate it with a simple preventive technique whose effectiveness is demonstrated experimentally.
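
To make the KNN-based leakage scenario concrete, the following is a minimal sketch and not the paper's experimental setup or the HPSL method. It assumes an attacker can see non-sensitive attributes through a view and knows the sensitive value for a handful of rows; the function name `knn_predict`, the toy feature vectors, and the labels "A" and "B" are all hypothetical.

```python
from collections import Counter


def knn_predict(labeled, query, k=3):
    """Guess a hidden label for `query` by majority vote of its k nearest labeled rows."""
    def sq_dist(a, b):
        # Squared Euclidean distance over the visible attributes.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(labeled, key=lambda row: sq_dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (visible attributes, sensitive value known to the attacker).
known = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.9), "A"),
    ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"),
    ((5.2, 4.8), "B"),
    ((4.9, 5.1), "B"),
]

# Rows whose sensitive value is hidden by the view but whose other attributes are visible.
hidden_rows = [(1.1, 1.0), (5.1, 5.0)]
inferred = [knn_predict(known, row) for row in hidden_rows]
print(inferred)
```

In this toy setting the hidden values are recovered purely from attributes the view exposes, which is the kind of leakage the abstract describes. Real data would need attribute scaling and a larger labeled sample.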

Author information

Correspondence to Hui Xiong.

Additional information

A preliminary version of this work was published as a two-page short paper in the Proceedings of the ACM Conference on Information and Knowledge Management (CIKM) 2005.

About this article

Cite this article

Xiong, H., Steinbach, M. & Kumar, V. Privacy leakage in multi-relational databases: a semi-supervised learning perspective. The VLDB Journal 15, 388–402 (2006). https://doi.org/10.1007/s00778-006-0011-4

  • DOI: https://doi.org/10.1007/s00778-006-0011-4
