Elsevier

Data & Knowledge Engineering

Volume 111, September 2017, Pages 1-21

Secure logical schema and decomposition algorithm for proactive context dependent attribute based inference control

https://doi.org/10.1016/j.datak.2017.02.002

Abstract

The inference problem has long been an important and challenging topic in database privacy. In relational databases, the traditional solution has been to define views on relational schemas that restrict the attributes and operations available to users, thereby preventing unwanted inferences. This method is a form of decomposition strategy that concentrates mainly on the granularity of the fields accessible to users in order to prevent inference of sensitive information. Nowadays, because data is increasingly shared among parties, the possibility of constructing complex indirect methods to obtain sensitive data has also increased. Therefore, when creating security views, we need to consider not only security threats due to direct access to sensitive data but also indirect inference channels that use functional and probabilistic dependencies (e.g., deducing the gender of an individual from his/her name). In this paper, we propose a proactive, decomposition-based inference control strategy for relational databases that prevents direct and indirect inference of private data. We introduce a new kind of context dependent attribute policy rule, named the security dependent set, which is a set of attributes whose association should not be inferable. We then define a logical schema decomposition algorithm that prevents inference among the attributes of each security dependent set. The decomposition algorithm takes both functional and probabilistic dependencies into consideration in order to prevent all known kinds of inference of relationships among the attributes of security dependent sets. We prove that the proposed decomposition algorithm generates a secure logical schema that complies with the given security dependent set constraints. Since the proposed technique is purely proactive, it requires no prior knowledge of executed queries and does not need to modify submitted queries.
It can also be embedded into any relational database management system without changing the underlying system. We empirically compare the proposed method with state-of-the-art reactive methods. Our extensive experimental analysis, conducted on the TPC-H¹ benchmark schema, demonstrates the effectiveness of the proposed approach.

Introduction

The evolution of technology in business applications has increased the importance of protecting sensitive data that must be accessed by many different users with different access privileges. Fine-grained inference control methods have become crucial to prevent unwanted sensitive information from being inferred from data disclosed only to legitimate users. In relational database management systems, views have usually been used for this purpose. In view-based solutions, essentially only direct inferences of sensitive data are prevented in order to satisfy the given privacy policies, without considering the functional and probabilistic relationships among the attributes in the database schema. These approaches have a very simple purpose: to grant or deny access based on predefined constraints related to the role of the user. In attribute based access control models in particular, the attributes that a query relates to each other are used to decide whether to grant or deny the query. However, using functional and/or probabilistic dependencies, it may be possible to obtain sensitive relationships among attributes indirectly. The aim of this paper is to propose a formal model that addresses this kind of sensitivity problem and to describe a decomposition-based solution to it.

To illustrate the problem of indirectly inferring relationships among security dependent attributes² through functional and probabilistic dependencies, consider a simple database application with an EMPLOYEE relation. Assume that the company would like to adjust the salaries of its employees to market standards based on their departments, positions and years of experience. The EMPLOYEE relation has the following schema:

EMPLOYEE=(id, phoneNumber, name, gender, salary, department, position, yearsOfExperience).

To reduce potential bias in determining salary increases, we may want to use views to limit disclosure of the relationship between gender and salary. Also assume that id and phoneNumber are the two candidate keys of this relation, and that id is selected as the primary key. Therefore, possible join queries must be checked to guarantee that the salary and gender fields cannot be related to each other with the help of these keys. In addition, we assume that there is no functional or probabilistic dependency other than those that make id and phoneNumber candidate keys of the EMPLOYEE relation.

A natural solution to this problem is decomposing the EMPLOYEE relation into two views as follows:

EMPLOYEE11=(id, name, gender, department, position, phoneNumber, yearsOfExperience)

EMPLOYEE12=(id, name, salary, department, position, phoneNumber, yearsOfExperience).

However, this decomposition is faulty since salary and gender relationship can be generated using the following query:

SELECT e1.gender, e2.salary FROM EMPLOYEE11 e1, EMPLOYEE12 e2 WHERE e1.id=e2.id.
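The key-based join above can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the two sample rows and the employee names are hypothetical, and the views keep only the columns needed to show that joining on id re-links gender and salary (department, position and yearsOfExperience are omitted).

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (
    id INTEGER PRIMARY KEY,
    phoneNumber TEXT UNIQUE,
    name TEXT,
    gender TEXT,
    salary INTEGER
);
INSERT INTO EMPLOYEE VALUES
    (1, '555-0101', 'Alice', 'F', 5200),
    (2, '555-0102', 'Bob',   'M', 4800);

-- The two views of the first (faulty) decomposition, reduced to the relevant columns.
CREATE VIEW EMPLOYEE11 AS SELECT id, name, gender, phoneNumber FROM EMPLOYEE;
CREATE VIEW EMPLOYEE12 AS SELECT id, name, salary, phoneNumber FROM EMPLOYEE;
""")

# Joining the two views on the shared key id re-links gender and salary.
leaked = conn.execute(
    "SELECT e1.gender, e2.salary FROM EMPLOYEE11 e1, EMPLOYEE12 e2 "
    "WHERE e1.id = e2.id ORDER BY e1.id"
).fetchall()
print(leaked)  # [('F', 5200), ('M', 4800)]
```

Each view on its own hides one of the two sensitive attributes, but because both expose the same key the join restores the full gender and salary association.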

To prevent this kind of join query on keys, the relation can be decomposed further:

EMPLOYEE21=(id, name, department, position, phoneNumber, yearsOfExperience)

EMPLOYEE22=(name, salary, department, position, yearsOfExperience)

EMPLOYEE23=(name, gender, department, position, yearsOfExperience).

This decomposition may initially seem correct, since EMPLOYEE22 and EMPLOYEE23 cannot be joined on the given keys to relate the salary and gender fields. However, the name and salary fields appear together in EMPLOYEE22, and the gender of an employee can be inferred from his/her name with high probability. Thus, a correct solution must also prevent salary from being related to name, as in the following decomposition:

EMPLOYEE31=(id, name, department, position, phoneNumber, yearsOfExperience)

EMPLOYEE32=(salary, department, position, yearsOfExperience)

EMPLOYEE33=(name, gender, department, position, yearsOfExperience).
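The name-based inference against EMPLOYEE22 in the second decomposition can be sketched as follows. The reference list of (name, gender) pairs and the EMPLOYEE22 rows are hypothetical, and estimating the most frequent gender per name is only one simple way to model a probabilistic dependency; it is not taken from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical public reference data (for example, a name/gender list): (name, gender) pairs.
reference = [("Alice", "F"), ("Alice", "F"), ("Bob", "M"), ("Bob", "M"), ("Bob", "F")]

# Hypothetical rows a user could read from EMPLOYEE22, projected onto name and salary.
employee22 = [("Alice", 5200), ("Bob", 4800)]


def most_likely_gender(pairs):
    """Map each name to its most frequent gender and that gender's relative frequency."""
    counts = defaultdict(Counter)
    for name, gender in pairs:
        counts[name][gender] += 1
    guesses = {}
    for name, counter in counts.items():
        gender, n = counter.most_common(1)[0]
        guesses[name] = (gender, round(n / sum(counter.values()), 2))
    return guesses


guesses = most_likely_gender(reference)
# Re-link salary with an estimated gender through the name attribute.
inferred = [
    (guesses[name][0], salary, guesses[name][1])
    for name, salary in employee22
    if name in guesses
]
print(inferred)  # [('F', 5200, 1.0), ('M', 4800, 0.67)]
```

No key is involved: the association between gender and salary leaks purely because name and salary are co-located, which is why EMPLOYEE32 drops name.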

The most important point of this example is breaking the dependencies of the gender and salary attributes on the keys. Other ways of distributing the keys among the decomposed relations may be possible; this issue is discussed later in the paper.

With the schema consisting of EMPLOYEE31, EMPLOYEE32 and EMPLOYEE33, it is not possible to define any query that relates the salary and gender attributes, either through equijoins on keys or through the given probabilistic dependencies. Therefore, this decomposition satisfies the given security policy. Note also that the decomposition may produce keyless relations, as is usual with views; this issue is also discussed in Section 4.
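The security argument above can be checked mechanically for all three decompositions in the example. The sketch below is a simplified model, not the paper's formal definition of a secure decomposition: it assumes that views are meaningfully joinable only when both contain a full candidate key, treats the probabilistic dependency name → gender as an inference rule, and ignores joins on non-key attributes such as department. The names KEYS, INFERENCES, closure and is_secure are hypothetical.

```python
from itertools import combinations

KEYS = [{"id"}, {"phoneNumber"}]          # candidate keys of EMPLOYEE
INFERENCES = [({"name"}, {"gender"})]     # assumed probabilistic dependency name -> gender
SECURITY_SET = {"gender", "salary"}       # security dependent set from the example
COMMON = {"department", "position", "yearsOfExperience"}


def closure(attrs, inferences):
    """Attributes derivable from attrs through the given (lhs, rhs) inference rules."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in inferences:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_secure(views, keys=KEYS, inferences=INFERENCES, secret=SECURITY_SET):
    """Merge views joinable on a full candidate key, then test each merged closure."""
    groups = [closure(v, inferences) for v in views]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(groups)), 2):
            if any(k <= groups[i] and k <= groups[j] for k in keys):
                groups[i] = closure(groups[i] | groups[j], inferences)
                del groups[j]  # j > i, so index i stays valid
                merged = True
                break
    return not any(secret <= g for g in groups)


decomposition1 = [{"id", "name", "gender", "phoneNumber"} | COMMON,
                  {"id", "name", "salary", "phoneNumber"} | COMMON]
decomposition2 = [{"id", "name", "phoneNumber"} | COMMON,
                  {"name", "salary"} | COMMON,
                  {"name", "gender"} | COMMON]
decomposition3 = [{"id", "name", "phoneNumber"} | COMMON,
                  {"salary"} | COMMON,
                  {"name", "gender"} | COMMON]
print(is_secure(decomposition1), is_secure(decomposition2), is_secure(decomposition3))
# False False True
```

The first decomposition fails through the join on id, the second through name → gender inside EMPLOYEE22, and only the third keeps gender and salary apart under both channels.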

As this example shows, a view-based solution can be generated to satisfy a given privacy policy. There can be several policy rules, and views must be constructed to satisfy all of them. The trend toward web-based data sharing has increased the need to define different external layers for different inference control policies [1]. Therefore, a formal approach is needed to build a secure external layer by decomposing relations into sub-relations according to the privacy policy rules and generating the corresponding secure logical schema. These sub-relations can then be used to generate views accessible to users. Notice that if any attribute other than the candidate keys, such as position in the example, can act as a key because of the data distribution, then this attribute should either be declared a candidate key or be perturbed. Such a pseudo-key may exist from the beginning or may emerge after data is inserted into the schema. Since the external layer consists only of views, this issue can easily be resolved by adding the pseudo-key to the system as a key and applying the decomposition again.
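A pseudo-key of this kind can be detected by checking whether a non-key attribute happens to be unique in the current data. The sketch below uses hypothetical rows and a hypothetical function name, pseudo_keys, and only checks single attributes; composite pseudo-keys would require checking attribute combinations as well.

```python
def pseudo_keys(rows, attributes, declared_keys):
    """Return single attributes that are not declared keys but are unique in the given rows.

    rows: list of dicts mapping attribute name -> value.
    declared_keys: list of sets, one per declared candidate key.
    """
    if len(rows) < 2:
        return []  # with fewer than two rows every attribute is trivially unique
    found = []
    for attr in attributes:
        if {attr} in declared_keys:
            continue  # already a declared single-attribute key
        values = [row[attr] for row in rows]
        if len(set(values)) == len(values):
            found.append(attr)
    return found


rows = [
    {"id": 1, "phoneNumber": "555-0101", "name": "Alice", "position": "CTO", "department": "R&D"},
    {"id": 2, "phoneNumber": "555-0102", "name": "Bob", "position": "Analyst", "department": "R&D"},
    {"id": 3, "phoneNumber": "555-0103", "name": "Alice", "position": "Engineer", "department": "Sales"},
]
attrs = ["id", "phoneNumber", "name", "position", "department"]
print(pseudo_keys(rows, attrs, [{"id"}, {"phoneNumber"}]))  # ['position']
```

If an attribute such as position is reported, it would be declared as an additional candidate key (or perturbed) and the decomposition re-run, as described above.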

Most research on the inference control problem focuses on dynamic mechanisms that inspect or modify queries and also track the query history [2], [3], [4], [5], [6], [7]. In contrast, our strategy in this paper is to decompose the relation into views in advance, avoiding the time spent on costly query modification or history tracking operations [8]. To the best of our knowledge, this is the first attempt in the literature to provide privacy for context dependent attribute based inference control using a proactive approach. The approach is labeled “proactive” because it does not need to perform the costly query history investigation, query modification, or row/attribute-based joins and calculations found in reactive strategies. Another problem with reactive strategies is consistency. In a reactive strategy, the set of queries a user can issue changes dynamically based on his/her usage history. Therefore, two users with the same rights may not be able to issue the same queries if their query histories differ, which can appear inconsistent from a regular user's point of view. In addition, our method can easily be adapted to validate an existing external schema against given attribute based policy rules. The proactive decomposition method described in this paper can also easily be combined with other constraints, such as attribute based policy rules, during implementation.
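The consistency problem of history-based (reactive) control can be illustrated with a toy monitor. The class below, ReactiveMonitor, is hypothetical and deliberately simplistic: it denies a request whenever the attributes a user has already received, together with the requested ones, would cover a security dependent set. It does not reproduce any of the cited reactive methods.

```python
class ReactiveMonitor:
    """Toy history-based controller keyed by user (hypothetical, for illustration)."""

    def __init__(self, security_sets):
        self.security_sets = [set(s) for s in security_sets]
        self.history = {}  # user -> set of attributes already disclosed to that user

    def request(self, user, attributes):
        """Grant (True) or deny (False) a query that projects the given attributes."""
        seen = self.history.setdefault(user, set())
        if any(s <= seen | set(attributes) for s in self.security_sets):
            return False
        seen.update(attributes)  # record the newly disclosed attributes
        return True


monitor = ReactiveMonitor([{"gender", "salary"}])
print(monitor.request("u1", ["name", "gender"]))        # True
print(monitor.request("u1", ["salary", "department"]))  # False: u1 has already seen gender
print(monitor.request("u2", ["salary", "department"]))  # True: same query, different history
```

Users u1 and u2 have identical rights, yet the same query receives different answers. A proactive decomposition fixes the accessible views in advance, so the outcome of a query does not depend on the user's history.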

In the rest of this paper, we first present a formal method for defining a secure logical schema that preserves context dependent attribute based privacy policies; we then define a decomposition algorithm that is guaranteed to produce a secure logical schema. We also perform experiments for a comparative timing analysis against reactive strategies. The paper is organized as follows: Section 2 describes related work in the field of security and privacy in databases. Section 3 gives the preliminaries and definitions used throughout the rest of the paper. Section 4 presents the decomposition algorithm that satisfies the access policies, together with its proof. Section 5 contains experiments that make a concrete comparison between the proactive and reactive strategies, and Section 6 briefly discusses future work. Finally, Section 7 concludes the paper.


Related work

Database security is an active research field, and several works in it have influenced the ideas proposed in this paper. Although the inference problem in privacy has been discussed in many studies [9], [10], [11], [12], [13], [14], the proposed solutions are mainly based on fragmentation and query updating techniques to prevent association attacks [15]. The works most similar to our method in terms of decomposing the relational schema according to dependency sets have been

Preliminaries and problem definition

In this section, we give the basic terms and concepts used in the paper. The paper has two main objectives: formally defining a secure logical schema that complies with the given security constraints (security dependent sets), and developing a decomposition algorithm that divides relations into sub-relations so that the security constraints are satisfied. The main reason for decomposition is to prevent obtaining securely dependent attributes together directly in a

Decomposition algorithm

The main purpose of the decomposition algorithm is to achieve a secure decomposition (Definition 9). To achieve this goal, the elements of each security dependent set clearly must not appear in the same sub-relation after the original relations are decomposed. Furthermore, it should not be possible to meaningfully join separate sub-relations containing securely dependent attributes. Below we define an algorithm which exhaustively generates all the subsets of the
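The snippet above is truncated, so the paper's algorithm is not reproduced here. As a loosely related sketch only, the code below exhaustively enumerates attribute subsets of EMPLOYEE and keeps the maximal ones whose inference closure contains no security dependent set. It reuses the assumed name → gender dependency from the introduction, runs in exponential time, and deliberately omits the key-redistribution step: both resulting subsets still contain id and phoneNumber and could therefore be joined, which is why keys must also be separated as in EMPLOYEE31, EMPLOYEE32 and EMPLOYEE33. The function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, inferences):
    """Attributes derivable from attrs through the given (lhs, rhs) inference rules."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in inferences:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def maximal_safe_subsets(attributes, security_sets, inferences):
    """Enumerate subsets from largest to smallest; keep maximal ones that are safe."""
    safe = []
    for size in range(len(attributes), 0, -1):
        for combo in combinations(attributes, size):
            candidate = set(combo)
            if any(s <= closure(candidate, inferences) for s in security_sets):
                continue  # some security dependent set would be inferable
            if any(candidate < kept for kept in safe):
                continue  # not maximal: a larger safe subset was already kept
            safe.append(candidate)
    return safe


EMPLOYEE = ["id", "phoneNumber", "name", "gender", "salary",
            "department", "position", "yearsOfExperience"]
result = maximal_safe_subsets(EMPLOYEE, [{"gender", "salary"}], [({"name"}, {"gender"})])
for subset in result:
    print(sorted(subset))
```

For EMPLOYEE this yields two maximal subsets: every attribute except salary, and every attribute except name and gender.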

Experimental work

In this section, proactive and reactive strategies for context dependent attribute based inference control are compared experimentally. The experiments use the TPC-H schema [31] and related, synthetically generated data sets. TPC-H is a well-known benchmark used for query evaluation; its schema is given in the next subsection. In our experiments we have mainly concentrated on the overhead duration of the reactive algorithm since

Future work

Being the first attempt in the literature to formalize proactive context dependent attribute based inference control, this paper touches on many application-specific and theoretical research issues. First, ways of integrating this work with the existing inference control mechanisms of database management systems should be investigated. The proposed solution could work as part of a trusted access control system of a database which has proactive and

Conclusions

The aim of this paper is to construct a proactive context dependent attribute based security mechanism for relational database systems using given security dependent sets. The formal model of the system and the decomposition algorithm are given, together with proofs that the algorithm produces a schema that complies with the given security dependencies.

The main objective in this work is to prevent inference of the association among the attributes in each security dependent set, and this is


References (34)

  • J. Biskup et al., Reducing inference control to access control for normalized database schemas, Inf. Process. Lett. (2008).
  • M. Thuraisingham, Security checking in relational database management systems augmented with inference engines, Comput. Secur. (1987).
  • E. Ferrari, B. Thuraisingham, Security and privacy for web databases and services, in: Advances in Database Technology...
  • L. Sweeney, K-anonymity: a model for protecting privacy, Int. J. Uncertain. Fuzziness Knowl.-Based Syst. (2002).
  • M. Stonebraker, E. Wong, Access control in a relational data base management system by query modification, in:...
  • A. Motro, An access authorization model for relational databases based on algebraic manipulation of view definitions,...
  • R. Agrawal, J. Kiernan, R. Srikant, Y. Xu, Hippocratic databases, in: Proceedings of the 28th International Conference...
  • O. Cooperation, Oracle database: Security guide, b14266.pdf (Jul. 2012). URL...
  • J. Shi, H. Zhu, G. Fu, T. Jiang, On the soundness property for SQL queries of fine-grained access control in DBMSs, in:...
  • L.J. Buczkowski, E. Perry, Database inference controller, in: DBSec, 1989, pp....
  • J.A. Goguen, J. Meseguer, Unwinding and inference control, in: 2012 IEEE Symposium on Security and Privacy, IEEE...
  • T.H. Hinke, Inference aggregation detection in database management systems, in: Proceedings of the 1988 IEEE Conference...
  • S. Dawson, S. De Capitani Di Vimercati, P. Samarati, Specification and enforcement of classification and inference...
  • D.G. Marks, Inference in MLS database systems, IEEE Trans. Knowl. Data Eng. (1996).
  • S. Dawson, S. De Capitani di Vimercati, P. Lincoln, P. Samarati, Minimal data upgrading to prevent inference and...
  • S. de Capitani di Vimercati et al., Fragments and loose associations: respecting privacy in data publishing, Proc. VLDB Endow. (2010).
  • S.D.C. di Vimercati et al., Fragmentation in presence of data dependencies, IEEE Trans. Dependable Secur. Comput. (2014).

    Uğur Turan graduated from Computer Engineering Department, Middle East Technical University with a high honor degree in 2006. He received his MS degree from the same department in 2009. He has worked on hybrid access control mechanisms during his MS study. Currently, he is a PhD candidate in the same department, and his research focuses on the database security applications. Since he has been working for software research companies for more than 12 years, his research interests have always been motivated from real-life problems. He is currently “Software Research Director” of a private company.

    Dr. Ismail H. Toroslu has been a Professor in the Department of Computer Engineering, Middle East Technical University (METU), since 1993. He received his B.S. and M.S. degrees in computer engineering from METU, Ankara, in 1987 and Bilkent University, Ankara, in 1989, respectively. Dr. Toroslu received his PhD from the Department of Electrical Engineering and Computer Science at Northwestern University, IL, USA, in 1993. He was a visiting professor in the Department of Computer Science at the University of Central Florida between 2000 and 2002. His current research interests include data mining, information retrieval and intelligent data analysis. Dr. Toroslu has published more than 80 technical papers in a variety of areas of computer science. He also received an IBM Faculty Award in 2010.

    Dr. Murat Kantarcioglu is a Professor of Computer Science and Director of the UTD Data Security and Privacy Lab at The University of Texas at Dallas. He holds a BS in Computer Engineering from Middle East Technical University, and MS and PhD degrees in Computer Science from Purdue University. He is recipient of an NSF CAREER award and a Purdue CERIAS Diamond Award for academic excellence. Currently, he is also a visiting scholar at Harvard's Data Privacy Lab.

    Dr. Kantarcioglu's research focuses on creating technologies that can efficiently extract useful information from any data without sacrificing privacy or security. His research has been supported by awards from NSF, AFOSR, ONR, NSA, and NIH. He has published over 165 peer-reviewed papers. His work has been covered by media outlets such as the Boston Globe and ABC News, among others, and has received three best paper awards. He is an IEEE senior member and an ACM Distinguished Scientist.

    1 TPC-H is an ad-hoc decision support benchmark of the Transaction Processing Performance Council.
