Optimization algorithm for k-anonymization of datasets with low information loss

Murakami, Keisuke; Uno, Takeaki

doi:10.1007/s10207-017-0392-y

Regular Contribution
Published: 23 October 2017

Volume 17, pages 631–644, (2018)
International Journal of Information Security

Keisuke Murakami¹ &
Takeaki Uno²


Abstract

Anonymization is the modification of data to mask the correspondence between a person and sensitive information in the data. Several anonymization models, such as k-anonymity, have been intensively studied. Recently, a new model with less information loss than existing models was proposed; this is a type of non-homogeneous generalization. In this paper, we present an alternative anonymization algorithm that further reduces the information loss using optimization techniques. We also prove that whether a modified dataset satisfies k-anonymity can be checked by a polynomial-time algorithm. Computational experiments demonstrate the efficiency of our algorithm even on large datasets.
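As a point of reference for the models named in the abstract, the sketch below checks the classical (homogeneous) form of k-anonymity: every combination of quasi-identifier values must appear in at least k records. This is not the paper's algorithm and not its check for non-homogeneous generalization, which is not described in this excerpt. The function name `is_k_anonymous`, the column names, and the sample table are all assumptions made for illustration.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Classical k-anonymity check (illustrative, not the paper's method).

    Returns True if every combination of quasi-identifier values
    occurs in at least k records of the table.
    """
    if not records:
        return True  # an empty table is trivially k-anonymous
    # Count how many records share each quasi-identifier combination.
    counts = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(counts.values()) >= k


# Hypothetical generalized table; values are invented for illustration.
table = [
    {"age": "20-29", "zip": "564**", "disease": "flu"},
    {"age": "20-29", "zip": "564**", "disease": "cold"},
    {"age": "30-39", "zip": "101**", "disease": "flu"},
    {"age": "30-39", "zip": "101**", "disease": "asthma"},
]

if __name__ == "__main__":
    print(is_k_anonymous(table, ["age", "zip"], 2))
    print(is_k_anonymous(table, ["age", "zip"], 3))
```

Grouping by quasi-identifier tuple takes a single pass over the table, so this classical check is linear in the number of records; the polynomial-time result mentioned in the abstract concerns the more general non-homogeneous setting.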

Acknowledgements

Part of this research is supported by the Funding Program for World-Leading Innovative R&D on Science and Technology, Japan. We thank Professor Wong for providing us with the programs used in our experiments.

Author information

Authors and Affiliations

Kansai University, 3-3-35 Yamate, Suita, Osaka, 564-8680, Japan
Keisuke Murakami
National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, 101-8430, Japan
Takeaki Uno

Corresponding author

Correspondence to Keisuke Murakami.

Appendix

The gaps between $\hbox {GCP}_\mathrm{t}$ and $\hbox {GCP}_\mathrm{unsort}$ are computed by

$$\mathrm{Gap\_sort} = \frac{\hbox {GCP}_\mathrm{unsort} - \hbox {GCP}_\mathrm{t}}{\hbox {GCP}_\mathrm{t}}, \tag{12}$$

where $\hbox {GCP}_\mathrm{unsort}$ represents the information loss of the anonymized dataset obtained using our algorithm without preliminary sorting. Note that we do not compute Gap_sort for the instances with $|\mathcal{T}|=100\hbox {k}$ because these datasets are not partitioned; thus, $\hbox {GCP}_\mathrm{unsort}$ equals $\hbox {GCP}_\mathrm{t}$ (Tables 19, 20).
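The relative gap in Eq. (12) can be computed as in the following minimal sketch. The function name `gap_sort` and the numeric inputs are assumptions for illustration only; they are not values from Tables 19 and 20, and $\hbox {GCP}_\mathrm{t}$ is taken as defined in the main text of the paper, which is not reproduced here.

```python
def gap_sort(gcp_unsort: float, gcp_t: float) -> float:
    """Relative gap of Eq. (12): (GCP_unsort - GCP_t) / GCP_t."""
    if gcp_t == 0:
        # The ratio is undefined when the reference loss is zero.
        raise ValueError("gcp_t must be non-zero")
    return (gcp_unsort - gcp_t) / gcp_t


if __name__ == "__main__":
    # Hypothetical information-loss values, not taken from the paper.
    print(gap_sort(0.12, 0.10))
    # When the dataset is not partitioned, the two losses coincide
    # and the gap is zero, as noted for the 100k instances.
    print(gap_sort(0.10, 0.10))
```

A positive value means the run without preliminary sorting lost more information than the reference run, expressed as a fraction of the reference loss.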

About this article

Cite this article

Murakami, K., Uno, T. Optimization algorithm for k-anonymization of datasets with low information loss. Int. J. Inf. Secur. 17, 631–644 (2018). https://doi.org/10.1007/s10207-017-0392-y

Published: 23 October 2017
Issue Date: November 2018
DOI: https://doi.org/10.1007/s10207-017-0392-y
