Privacy leakage in multi-relational databases: a semi-supervised learning perspective

Xiong, Hui; Steinbach, Michael; Kumar, Vipin

doi:10.1007/s00778-006-0011-4

Special Issue Paper
Published: 01 August 2006

The VLDB Journal, Volume 15, pages 388–402 (2006)

Hui Xiong¹,
Michael Steinbach² &
Vipin Kumar²

Abstract

In multi-relational databases, a view, which is a context- and content-dependent subset of one or more tables (or other views), is often used to preserve privacy by hiding sensitive information. However, recent developments in data mining present a new challenge for database security even when traditional database security techniques, such as database access control, are employed. This paper presents a data mining framework using semi-supervised learning that demonstrates the potential for privacy leakage in multi-relational databases. Many different types of semi-supervised learning techniques, such as the K-nearest neighbor (KNN) method, can be used to demonstrate privacy leakage. However, we also introduce a new approach to semi-supervised learning, hyperclique pattern-based semi-supervised learning (HPSL), which differs from traditional semi-supervised learning approaches in that it considers the similarity among groups of objects instead of only pairs of objects. Our experimental results show that both the KNN and HPSL methods have the ability to compromise database security, although the HPSL is better at this privacy violation (has higher prediction accuracy) than the KNN method. Finally, we provide a principle for avoiding privacy leakage in multi-relational databases via semi-supervised learning and illustrate this principle with a simple preventive technique whose effectiveness is demonstrated by experiments.
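The kind of leakage the abstract describes can be illustrated with a minimal sketch. It is not the paper's framework or its HPSL method: it is a plain K-nearest-neighbor vote, and the toy rows, attribute names, the Jaccard similarity measure, and the function names `jaccard` and `knn_predict` are all assumptions made for illustration. The idea is that an attacker who knows the sensitive value for a few rows may guess the value hidden by a view for another row, using only the attributes the view still exposes.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two sets of visible attribute values."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_predict(target, labeled, k=3):
    """Guess the hidden value for `target` by majority vote among the
    k labeled rows whose visible attributes are most similar to it."""
    ranked = sorted(labeled, key=lambda row: jaccard(target, row[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical rows for which the attacker already knows the sensitive value.
known = [
    ({"insulin", "metformin", "glucose_test"}, "diabetes"),
    ({"insulin", "glucose_test", "diet_plan"}, "diabetes"),
    ({"inhaler", "steroid", "spirometry"}, "asthma"),
    ({"inhaler", "spirometry", "allergy_test"}, "asthma"),
]

# A row whose sensitive value is hidden by the view; only these attributes are visible.
hidden_row = {"insulin", "glucose_test", "eye_exam"}

guess = knn_predict(hidden_row, known, k=3)
print(guess)
```

In this toy setting the hidden row shares two visible attributes with each of the first two known rows and none with the others, so the vote favors their label. A pairwise measure such as this one is what the abstract contrasts with HPSL, which considers similarity among groups of objects.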

Author information

Authors and Affiliations

MSIS Department, Rutgers University, 180 University Avenue, Newark, NJ, 07102, USA
Hui Xiong
Department of Computer Science and Engineering, University of Minnesota, 4-192 EE/CS Building, 200 Union Street SE, Minneapolis, MN, 55455, USA
Michael Steinbach & Vipin Kumar

Corresponding author

Correspondence to Hui Xiong.

Additional information

A preliminary version of this work was published as a two-page short paper in the Proceedings of the ACM Conference on Information and Knowledge Management (CIKM), 2005.

About this article

Cite this article

Xiong, H., Steinbach, M. & Kumar, V. Privacy leakage in multi-relational databases: a semi-supervised learning perspective. The VLDB Journal 15, 388–402 (2006). https://doi.org/10.1007/s00778-006-0011-4

Received: 30 September 2005
Accepted: 25 May 2006
Published: 01 August 2006
Issue Date: November 2006
DOI: https://doi.org/10.1007/s00778-006-0011-4
