Privacy preservation for data cubes

Knowledge and Information Systems

Abstract

A range query computes an aggregate over all selected cells of an online analytical processing (OLAP) data cube, where the selection is specified by a range of contiguous values in each dimension. An important practical issue is how to preserve the confidential information in individual data cells while still providing accurate estimates of the original aggregated values for range queries. In this paper, we propose an effective solution to this problem, called the zero-sum method. We derive theoretical formulas to analyse the performance of our method. Empirical experiments are also carried out using the Analytical Processing Benchmark (APB) dataset from the OLAP Council. Various parameters, such as the privacy factor and the accuracy factor, are considered and tested in the experiments. Our experimental results show that there is a trade-off between privacy preservation and range query accuracy, and that the zero-sum method fulfils three design goals: security, accuracy, and accessibility.
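To make the notion of a range query concrete, the following minimal Python sketch sums the cells of a small two-dimensional cube inside contiguous ranges, and then repeats the query on a perturbed copy. The perturbation is only an illustrative assumption suggested by the name "zero-sum": noise whose entries cancel along every row and column. It is not the authors' algorithm, which the abstract does not specify, and the names `range_sum`, `zero_sum_noise`, and the sample values are all hypothetical.

```python
import random


def range_sum(cube, row_range, col_range):
    """Sum the cells of a 2-D cube inside inclusive, contiguous ranges."""
    (r_lo, r_hi), (c_lo, c_hi) = row_range, col_range
    return sum(cube[r][c]
               for r in range(r_lo, r_hi + 1)
               for c in range(c_lo, c_hi + 1))


def zero_sum_noise(n_rows, n_cols, scale, rng):
    """Hypothetical noise matrix whose entries sum to zero in every row and column.

    Built by double-centering uniform random values; an assumption made for
    illustration, not the construction used in the paper.
    """
    raw = [[rng.uniform(-scale, scale) for _ in range(n_cols)]
           for _ in range(n_rows)]
    row_mean = [sum(row) / n_cols for row in raw]
    col_mean = [sum(raw[r][c] for r in range(n_rows)) / n_rows
                for c in range(n_cols)]
    total_mean = sum(row_mean) / n_rows
    return [[raw[r][c] - row_mean[r] - col_mean[c] + total_mean
             for c in range(n_cols)]
            for r in range(n_rows)]


# Made-up cell values for a 3 x 4 cube.
cube = [[12, 7, 3, 9],
        [5, 8, 6, 2],
        [4, 1, 10, 11]]

rng = random.Random(0)
noise = zero_sum_noise(3, 4, 2.0, rng)

# Each published cell hides its true value behind a noise term.
perturbed = [[cube[r][c] + noise[r][c] for c in range(4)] for r in range(3)]

# A query over the whole cube: the noise cancels (up to rounding).
exact_full = range_sum(cube, (0, 2), (0, 3))
approx_full = range_sum(perturbed, (0, 2), (0, 3))

# A query over a partial range: some noise generally remains in the answer.
exact_part = range_sum(cube, (0, 1), (1, 2))
approx_part = range_sum(perturbed, (0, 1), (1, 2))
```

In this sketch, a larger `scale` hides individual cells more strongly but tends to increase the error of partial-range answers, which mirrors the trade-off between privacy and accuracy described in the abstract.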

Author information

Corresponding author

Correspondence to Sam Y. Sung.

Additional information

Sam Y. Sung is an Associate Professor in the Department of Computer Science, School of Computing, National University of Singapore. He received a B.Sc. from the National Taiwan University in 1973, and the M.Sc. and Ph.D. degrees in computer science from the University of Minnesota in 1977 and 1983, respectively. He was with the University of Oklahoma and the University of Memphis in the United States before joining the National University of Singapore. His research interests include information retrieval, data mining, pictorial databases and mobile computing. He has published more than 80 papers in various conferences and journals, including IEEE Transactions on Software Engineering and IEEE Transactions on Knowledge and Data Engineering.

Yao Liu received the B.E. degree in computer science and technology from Peking University in 1996 and the M.S. degree from the Software Institute of the Chinese Science Academy in 1999. Currently, she is a Ph.D. candidate in the Department of Computer Science at the National University of Singapore. Her research interests include data warehousing, database security, data mining and high-speed networking.

Hui Xiong received the B.E. degree in Automation from the University of Science and Technology of China, Hefei, China, in 1995, the M.S. degree in Computer Science from the National University of Singapore, Singapore, in 2000, and the Ph.D. degree in Computer Science from the University of Minnesota, Minneapolis, MN, USA, in 2005. He is currently an Assistant Professor of Computer Information Systems in the Management Science & Information Systems Department at Rutgers University, NJ, USA. His research interests include data mining, databases, and statistical computing with applications in bioinformatics, database security, and self-managing systems. He is a member of the IEEE Computer Society and the ACM.

Peter A. Ng is currently the Chairperson and Professor of Computer Science at the University of Texas–Pan American. He received his Ph.D. from the University of Texas–Austin in 1974. Previously, he served as Vice President of the Fudan International Institute for Information Science and Technology, Shanghai, China, from 1999 to 2002, and as Executive Director of the Global e-Learning Project at the University of Nebraska at Omaha from 2000 to 2003. He was appointed an Advisory Professor of Computer Science at Fudan University, Shanghai, China, in 1999. His recent research focuses on document and information-based processing, retrieval and management, and he has published many journal and conference articles in this area. He served as Editor-in-Chief of the Journal on Systems Integration (1991–2001) and has been an Advisory Editor of the Data and Knowledge Engineering Journal since 1989.

About this article

Cite this article

Sung, S.Y., Liu, Y., Xiong, H. et al. Privacy preservation for data cubes. Knowl Inf Syst 9, 38–61 (2006). https://doi.org/10.1007/s10115-004-0193-2

  • DOI: https://doi.org/10.1007/s10115-004-0193-2
