Suppressing microdata to prevent classification based inference

Azgin Hintoglu, Ayça; Saygın, Yücel

doi:10.1007/s00778-009-0170-1

Regular Paper
Published: 19 November 2009

Volume 19, pages 385–410, (2010)

The VLDB Journal

Ayça Azgin Hintoglu¹ &
Yücel Saygın¹

Abstract

The Internet revolution, together with progress in computer technology, makes it easy for institutions to collect an unprecedented amount of personal data. This pervasive data collection, coupled with the increasing need to disseminate and share non-aggregated data, i.e., microdata, has raised many concerns about privacy. One method to ensure privacy is to selectively hide the confidential, i.e., sensitive, information before disclosure. However, with data mining techniques, an adversary can now predict the hidden confidential information from the disclosed data sets. In this paper, we concentrate on one such data mining technique, classification. We extend our previous work on microdata suppression to prevent both probabilistic and decision tree classification based inference. We also provide experimental results on real-life data sets showing the effectiveness of not only the proposed methods but also hybrid methods, i.e., methods that suppress microdata against both classification models.
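
The threat described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a toy table, a hand-written naive Bayes classifier with add-one smoothing as the probabilistic adversary, and a hypothetical "?" marker for suppressed cells. The column names and values are invented for illustration. The sketch shows an adversary inferring a hidden confidential value from the disclosed rows, and how suppressing one more cell in the same row lowers the adversary's confidence.

```python
from collections import Counter

UNKNOWN = "?"  # hypothetical marker for a suppressed cell

def nb_posterior(train, target, query):
    """Naive Bayes posterior over the values of column `target` for `query`.

    Suppressed cells are skipped in both the training rows and the query.
    Add-one (Laplace) smoothing is applied to the conditional estimates.
    """
    rows = [r for r in train if r[target] != UNKNOWN]
    classes = Counter(r[target] for r in rows)
    cols = [c for c in query if c != target and query[c] != UNKNOWN]
    scores = {}
    for cls, n_cls in classes.items():
        score = n_cls / len(rows)  # class prior
        for c in cols:
            domain = {r[c] for r in rows if r[c] != UNKNOWN}
            known = [r for r in rows if r[target] == cls and r[c] != UNKNOWN]
            match = sum(1 for r in known if r[c] == query[c])
            score *= (match + 1) / (len(known) + len(domain))
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

# Disclosed rows of other individuals (toy data, invented for this sketch).
disclosed = [
    {"age": "young", "job": "clerk", "disease": "flu"},
    {"age": "young", "job": "clerk", "disease": "flu"},
    {"age": "young", "job": "nurse", "disease": "flu"},
    {"age": "old", "job": "nurse", "disease": "cancer"},
    {"age": "old", "job": "nurse", "disease": "cancer"},
    {"age": "old", "job": "clerk", "disease": "cancer"},
]

# The protected row: only the confidential value is hidden.
row = {"age": "old", "job": "nurse", "disease": UNKNOWN}
full = nb_posterior(disclosed, "disease", row)

# The same row after additionally suppressing the "age" cell.
row_suppressed = {"age": UNKNOWN, "job": "nurse", "disease": UNKNOWN}
reduced = nb_posterior(disclosed, "disease", row_suppressed)

print("adversary confidence in 'cancer':", round(full["cancer"], 3))
print("after suppressing 'age':", round(reduced["cancer"], 3))
```

On this toy table the adversary's confidence in the true value falls from about 0.86 to 0.60 once the extra cell is suppressed. Which cells to suppress, and how many, is the question the paper addresses; the sketch only makes the inference risk concrete.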

Author information

Authors and Affiliations

Sabancı University, Istanbul, Turkey
Ayça Azgin Hintoglu & Yücel Saygın

Corresponding author

Correspondence to Yücel Saygın.

Additional information

This work was partially funded by the Information Society Technologies programme of the European Commission, Future and Emerging Technologies under the IST-6FP-014915 GeoPKDD project.

About this article

Cite this article

Azgin Hintoglu, A., Saygın, Y. Suppressing microdata to prevent classification based inference. The VLDB Journal 19, 385–410 (2010). https://doi.org/10.1007/s00778-009-0170-1

Received: 24 April 2008
Revised: 11 August 2009
Accepted: 03 October 2009
Published: 19 November 2009
Issue Date: June 2010
DOI: https://doi.org/10.1007/s00778-009-0170-1
