Multivariate microaggregation by iterative optimization

Mortazavi, Reza; Jalili, Saeed; Gohargazi, Hojjat

doi:10.1007/s10489-013-0431-y


Published: 12 April 2013

Applied Intelligence, Volume 39, pages 529–544 (2013)


Abstract

Microaggregation is a well-known perturbative approach to publishing personal or financial records while preserving the privacy of data subjects. It is also a mechanism for realizing the k-anonymity model in Privacy Preserving Data Publishing (PPDP). Microaggregation consists of two successive phases: partitioning the records into small clusters of at least k records each, and replacing each record by an aggregate statistic of its cluster (typically the centroid). Optimal multivariate microaggregation has been shown to be NP-hard, and several heuristic approaches have been proposed in the literature. This paper presents IMHM, an iterative optimization method based on the optimal solution of the microaggregation problem. The method builds the groups using constrained clustering and linear programming relaxation, and fine-tunes the results within an integrated iterative approach. Experimental results on both synthetic and real-world data sets show that IMHM introduces less information loss for a given privacy parameter and can be adopted in different real-world applications.
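The abstract states that IMHM builds on the optimal solution of the microaggregation problem. For a fixed ordering of the records, an optimal partition into consecutive groups of k to 2k-1 records can be found with the shortest-path construction of Hansen and Mukherjee (cited in the References). The Python sketch below illustrates that idea as a dynamic program over records that are assumed to be already ordered, with SSE as the information-loss measure. It is not the paper's IMHM or ImprovedMHM: the ordering step is omitted and the function names are hypothetical.

```python
from math import inf


def group_sse(block):
    """SSE of one group: squared Euclidean distances of its records to their centroid."""
    n, d = len(block), len(block[0])
    centroid = [sum(x[l] for x in block) / n for l in range(d)]
    return sum((x[l] - centroid[l]) ** 2 for x in block for l in range(d))


def optimal_ordered_partition(records, k):
    """Split an *ordered* record list into consecutive groups of k..2k-1 records
    with minimum total SSE (Hansen-Mukherjee-style shortest path, written as a DP).

    Returns (total_sse, groups), where groups are half-open (start, end) index ranges.
    """
    n = len(records)
    if n < k:
        raise ValueError("need at least k records")
    best = [inf] * (n + 1)  # best[i]: minimum SSE for the first i records
    prev = [-1] * (n + 1)   # prev[i]: start index of the last group in that optimum
    best[0] = 0.0
    for i in range(k, n + 1):
        # The last group is records[j:i], with k <= i - j <= 2k - 1.
        for j in range(max(0, i - 2 * k + 1), i - k + 1):
            if best[j] == inf:
                continue
            cost = best[j] + group_sse(records[j:i])
            if cost < best[i]:
                best[i], prev[i] = cost, j
    groups, i = [], n
    while i > 0:  # walk the predecessor links back from the last record
        groups.append((prev[i], i))
        i = prev[i]
    return best[n], groups[::-1]
```

For the one-dimensional records 1, 2, 3, 10, 11, 12 with k = 2, `optimal_ordered_partition` returns a total SSE of 4.0 with groups [(0, 3), (3, 6)], i.e. {1, 2, 3} and {10, 11, 12}. Recomputing `group_sse` for every candidate group keeps the sketch short; incremental centroid and SSE updates such as those proved in the Appendix avoid that recomputation.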


Notes

The pseudo-code of CBFS is very similar to that of MDAV in Fig. 1, except that line 6 is removed and the main loop iterates until fewer than k unassigned records remain (2k changes to k in line 7 of Fig. 1).
The terms “merge” and “noMerge” are used in the DBA paper.
Referred to as “Lloyd quantization design algorithm” in the original paper.
Referred to as “cell” in the original paper.
Number of Pre-partitioned Blocks (NPB).
We have also used a semi-random initialization, which is discussed in Sect. 5.4.
Here, the index of each centroid is its implicit number.
A centroid is called unassigned if none of its members has been assigned yet. Similarly, a remaining record is a record not in the Path.
We assume NPB = 1, i.e., |B| = |T| = n.
Note that a separate procedure reorders the records in a path for the ImprovedMHM algorithm (line 19 in Fig. 2), so the n log n term does not apply here.
Please see the Appendix.
For simplicity, we omit the dependence of the time complexity on the dimension d.
In the referred paper, the number of clusters (c) and the number of records in each cluster (n_i) were chosen randomly.
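The first note describes CBFS relative to the MDAV pseudo-code in Fig. 1, which is not reproduced in this section. For orientation, the Python sketch below implements fixed-size MDAV as it is commonly described in the literature (e.g. Domingo-Ferrer and Torra 2005 in the References), using Euclidean distance. It does not follow the line numbering of Fig. 1, the helper names are hypothetical, and choosing s only after r's group has been removed is an implementation choice of this sketch.

```python
from math import dist


def mdav(records, k):
    """Fixed-size MDAV: returns groups (lists of record indices) of size k,
    except the last group, which holds between k and 2k-1 records."""
    if len(records) < k:
        raise ValueError("need at least k records")
    remaining = set(range(len(records)))
    groups = []

    def centroid(idx):
        d = len(records[0])
        return [sum(records[i][l] for i in idx) / len(idx) for l in range(d)]

    def farthest_from(point, idx):
        # Ties are broken toward the smaller index so the result is deterministic.
        return max(idx, key=lambda i: (dist(records[i], point), -i))

    def take_group(seed):
        # The seed and its k-1 nearest remaining records form one group.
        others = sorted(remaining - {seed},
                        key=lambda i: (dist(records[i], records[seed]), i))
        group = [seed] + others[:k - 1]
        remaining.difference_update(group)
        groups.append(sorted(group))

    while len(remaining) >= 3 * k:
        r = farthest_from(centroid(remaining), remaining)
        take_group(r)
        # s is chosen among the records left after r's group is removed.
        s = farthest_from(records[r], remaining)
        take_group(s)
    if len(remaining) >= 2 * k:
        take_group(farthest_from(centroid(remaining), remaining))
    groups.append(sorted(remaining))  # the last k..2k-1 records form one group
    return groups
```

On the nine one-dimensional records 0, 1, 2, 10, 11, 12, 20, 21, 22 with k = 3, the sketch returns the three natural groups as index lists [0, 1, 2], [6, 7, 8] and [3, 4, 5].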

References

Sweeney L (2002) K-anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl-Based Syst 10(5):557–570
Machanavajjhala A et al (2007) L-diversity: privacy beyond k-anonymity. ACM Trans Knowl Discov Data 1(1):3
Li N, Li T, Venkatasubramanian S (2007) t-Closeness: privacy beyond k-anonymity and l-diversity. In: ICDE
Hong TP et al (2012) Using TF-IDF to hide sensitive itemsets. Appl Intell 1–9
Yin Y et al (2011) Privacy-preserving data mining. In: Data mining. Springer, Berlin
Domingo-Ferrer J, Solanas A, Martinez-Balleste A (2006) Privacy in statistical databases: k-anonymity through microaggregation. In: Proceedings of IEEE granular computing, pp 774–777
Oganian A, Karr AF (2011) Masking methods that preserve positivity constraints in microdata. J Stat Plan Inference 141(1):31–41
Domingo-Ferrer J, Torra V (2001) Disclosure control methods and information loss for microdata. In: Doyle P, Lane JI, Theeuwes JJM, Zayatz L (eds) Confidentiality, disclosure, and data access: theory and practical applications for statistical agencies, pp 93–112
Xu C et al (2012) Efficient fuzzy ranking queries in uncertain databases. Appl Artif Intell 37(1):47–59
Wong RC-W et al (2006) (α,k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, Philadelphia, pp 754–759
Sun X, Li M, Wang H (2011) A family of enhanced-diversity models for privacy preserving data publishing. Future Gener Comput Syst 27(3):348–356
Sun X, Sun L, Wang H (2011) Extended k-anonymity models against sensitive attribute disclosure. Comput Commun 34(4):526–535
Defays D, Nanopoulos P (1993) Panels of enterprises and confidentiality: the small aggregates method
Domingo-Ferrer J, Mateo-Sanz JM (2002) Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans Knowl Data Eng 14(1):189–201
Domingo-Ferrer J, Sramka M Disclosure control by computer scientists: an overview and an application of microaggregation to mobility data anonymization
Rebollo-Monedero D, Forné J, Soriano M (2011) An algorithm for k-anonymous microaggregation and clustering inspired by the design of distortion-optimized quantizers. Data Knowl Eng
Domingo-Ferrer J, Torra V (2005) Ordinal continuous and heterogeneous k-anonymity through microaggregation. Data Min Knowl Discov 11(2):195–212
Torra V (2004) In: Domingo-Ferrer J, Torra V (eds) Microaggregation for categorical variables: a median based approach, privacy in statistical databases. Springer, Berlin, pp 162–174
Hansen SL, Mukherjee S (2003) A polynomial algorithm for optimal univariate microaggregation. IEEE Trans Knowl Data Eng 15(4):1043–1044
Solanas A, Sebé F, Domingo-Ferrer J (2008) Micro-aggregation-based heuristics for p-sensitive k-anonymity: one step beyond. In: Proceedings of the 2008 international workshop on privacy and anonymity in information society. ACM, New York
Domingo-Ferrer J, Sebé F, Solanas A (2008) An anonymity model achievable via microaggregation. In: Secure data management, pp 209–218
Jian-min H, Ting-ting C, Hui-qun Y (2008) An improved V-MDAV algorithm for l-diversity. In: 2008 international symposiums on information processing (ISIP). IEEE, New York
Skinner CJ et al (1994) Disclosure control for census microdata. J Off Stat 10(1):31–51
Nin J, Herranz J, Torra V (2008) On the disclosure risk of multivariate microaggregation. Data Knowl Eng 67(3):399–412
Yancey W, Winkler W, Creecy R (2002) Disclosure risk assessment in perturbative microdata protection. In: Inference control in statistical databases, pp 49–60
Chang C-C, Li Y-C, Huang W-H (2007) TFRP: an efficient microaggregation algorithm for statistical disclosure control. J Syst Softw 80(11):1866–1878
Domingo-Ferrer J, González-Nicolás Ú (2010) Hybrid microdata using microaggregation. Inf Sci 180(15):2834–2844
Sun X et al (2012) An approximate microaggregation approach for microdata protection. Expert Syst Appl 39(2):2211–2219
Crises G (2004) Microaggregation for privacy protection in statistical databases. Rovira i Virgili, Univ. Tarragona, Spain, Tech. Rep. CRIREP-04-005
Hundepool A et al (2006) CENEX SDC handbook on statistical disclosure control, version 1.01
Oganian A, Domingo-Ferrer J (2001) On the complexity of optimal microaggregation for statistical disclosure control. Stat J United Nations Econ Commission Eur 18(4):345–354
Defays D, Nanopoulos P (1993) Panels of enterprises and confidentiality: the small aggregates method. In: Proceedings of the 1992 symposium on design and analysis of longitudinal surveys, Ottawa, Canada
Mateo Sanz JM, Domingo i Ferrer J (1998) A comparative study of microaggregation methods. Questiió 22(3):511–526
Domingo-Ferrer J et al (2006) Efficient multivariate data-oriented microaggregation. VLDB J 15(4):355–369
Heaton B (2012) New record ordering heuristics for multivariate microaggregation. Nova Southeastern University
Laszlo M, Mukherjee S (2005) Minimum spanning tree partitioning algorithm for microaggregation. IEEE Trans Knowl Data Eng 17(7):902–911
Solanas A, Martinez-Balleste A, Domingo-Ferrer J (2006) V-MDAV: a multivariate microaggregation with variable group size. In: COMPSTAT’2006
Chang CC, Li YC, Huang WH (2007) TFRP: an efficient microaggregation algorithm for statistical disclosure control. J Syst Softw 80(11):1866–1878
Solanas A et al (2006) Multivariate microaggregation based genetic algorithms. In: 2006 3rd international IEEE conference on intelligent systems
Lin JL et al (2010) Density-based microaggregation for statistical disclosure control. Expert Syst Appl 37(4):3256–3263
Rebollo-Monedero D, Forné J, Soriano M (2011) An algorithm for k-anonymous microaggregation and clustering inspired by the design of distortion-optimized quantizers. Data Knowl Eng 70(10):892–921
Panagiotakis C, Tziritas G (2011) Successive group selection for microaggregation. IEEE Trans Knowl Data Eng. http://www.computer.org/csdl/trans/tk/preprint/ttk2011990169-abs.html
Li Y et al (2002) A privacy-enhanced microaggregation method. In: Foundations of information and knowledge systems, pp 237–250
Solanas A (2008) In: Yang A, Shan Y, Bui L (eds) Success in evolutionary computation. Springer, Berlin, pp 215–237
Fayyoumi E, Oommen BJ (2010) A survey on statistical disclosure control and micro-aggregation techniques for secure statistical databases. Softw Pract Exp 40(12):1161–1188
MacQueen JB (1967) Some methods for classification and analysis of multivariate observations, California, USA
Cormen TH et al (2009) Introduction to algorithms, 3rd edn. MIT Press, Cambridge, p 850
Ghouila-Houri A (1962) Caractérisation des matrices totalement unimodulaires. C R Math Acad Sci 254:1192–1194
Korte BH, Vygen J (2006) Combinatorial optimization: theory and algorithms. Algorithms and combinatorics, vol 21. Springer, Berlin
Wong RCW et al (2007) Minimality attack in privacy preserving data publishing. In: Proceedings of the 33rd international conference on very large data bases. VLDB Endowment
MOSEK ApS (2010) The MOSEK optimization software. Online at http://www.mosek.com
Brand R, Domingo-Ferrer J, Mateo-Sanz J (2002) Reference data sets to test and compare SDC methods for protection of numerical microdata. European Project IST-2000-25069 CASC. http://neon.vb.cbs.nl/casc
UCI KDD archive. Available from: http://kdd.ics.uci.edu/
Hettich S, Bay SD (1999) The UCI KDD archive
Cormode G et al (2010) Minimizing minimality and maximizing utility: analyzing method-based attacks on anonymized data. Proc VLDB Endow 3(1–2):1045–1056


Acknowledgement

This research is partially supported by ITRC (Iran Telecommunication Research Center) under contract No. 12200/500. We would like to thank the editor and anonymous reviewers for their detailed comments that helped improve the paper.

Author information

Authors and Affiliations

Electrical and Computer Engineering Faculty, Tarbiat Modares University, Tehran, Iran
Reza Mortazavi, Saeed Jalili & Hojjat Gohargazi


Corresponding author

Correspondence to Saeed Jalili.

Appendix: Proofs related to the ImprovedMHM function (Fig. 4)

Assume a group of records G with n members $\{X_1, X_2, \ldots, X_n\}$ in a d-dimensional space. Let $C_{1}^{n}[l]= \sum_{j\in \{1,2,\ldots ,n\}}X_{j}[l]/n$ represent the lth dimension ($1\le l\le d$) of the centroid of G, where $X_j[l]$ denotes the lth dimension of $X_j$; more generally, $C_a^b$ denotes the centroid of the records $X_a, \ldots, X_b$. Based on Eq. (1), it is easy to verify that the SSE of the group can be computed as $\mathit{SSE}(G)=\mathit{SSE}(G[1])+\mathit{SSE}(G[2])+\cdots+\mathit{SSE}(G[d])$, where $G[l]=\{X_1[l],X_2[l],\ldots,X_n[l]\}$.
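The per-dimension decomposition of the SSE can be checked numerically. The Python sketch below assumes that Eq. (1), which is not reproduced in this section, defines the SSE of a group as the sum of squared Euclidean distances from its members to its centroid. The decomposition then holds because each squared distance is a sum over coordinates and the lth centroid coordinate is the mean of G[l]. The function names are illustrative.

```python
def sse(group):
    """SSE of a group (assumed Eq. (1)): squared Euclidean distances to the centroid."""
    n, d = len(group), len(group[0])
    c = [sum(x[l] for x in group) / n for l in range(d)]
    return sum((x[l] - c[l]) ** 2 for x in group for l in range(d))


def sse_1d(values):
    """SSE of a single dimension G[l] around its own mean C_1^n[l]."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def sse_by_dimension(group):
    """The same quantity computed as SSE(G[1]) + ... + SSE(G[d])."""
    return sum(sse_1d([x[l] for x in group]) for l in range(len(group[0])))


G = [(1, 4), (3, 8), (5, 6)]
print(sse(G), sse_by_dimension(G))  # 16.0 16.0
```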

Lemma 1

If a new record $X_{n+1}$ is added to a group $G_1$, the SSE of the new group $G_2=G_1\cup\{X_{n+1}\}$, denoted by $\mathit{SSE}_2$, can be calculated from the SSE before the addition ($\mathit{SSE}_1$) in O(1).

Proof

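Let $\Delta\mathit{SSE}=\mathit{SSE}_2-\mathit{SSE}_1=\sum_{1\le l\le d}\bigl(\mathit{SSE}_2[l]-\mathit{SSE}_1[l]\bigr)$. Based on Eq. (1), each term $\Delta\mathit{SSE}[l]=\mathit{SSE}_2[l]-\mathit{SSE}_1[l]$ can be expanded; the terms over $X_1,\ldots,X_n$ collapse because $\sum_{j=1}^{n}X_j[l]=n\,C_1^n[l]$:

$$ \Delta\mathit{SSE}[l]=\sum_{j=1}^{n+1}\bigl(X_{j}[l]-C_{1}^{n+1}[l]\bigr)^{2}-\sum_{j=1}^{n}\bigl(X_{j}[l]-C_{1}^{n}[l]\bigr)^{2}=\bigl(X_{n+1}[l]-C_{1}^{n+1}[l]\bigr)^{2}+n\bigl(C_{1}^{n}[l]-C_{1}^{n+1}[l]\bigr)^{2}. $$

(3)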

The new centroid can be computed directly from the old one:

$$ C_{1}^{n+1}[l]=\bigl(n\,C_{1}^{n}[l]+X_{n+1}[l]\bigr)/(n+1). $$

(4)

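Substituting Eq. (4) into Eq. (3), the update reduces to:

$$ \mathit{SSE}_{2}[l]=\mathit{SSE}_{1}[l]+\frac{n}{n+1}\bigl(X_{n+1}[l]-C_{1}^{n}[l]\bigr)^{2}. $$

(5)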

Both updates require O(1) operations per dimension. □

Equations (4) and (5) are used in the ImprovedMHM function (lines 25–26 of Fig. 4) to update the SSE and centroid of a new group.
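The Lemma 1 update can be sketched in Python as follows. The sketch assumes the closed form obtained by combining the expansion of $\Delta\mathit{SSE}[l]$ with Eq. (4), namely $\mathit{SSE}_2[l]=\mathit{SSE}_1[l]+\frac{n}{n+1}(X_{n+1}[l]-C_1^n[l])^2$; the function name `add_record` is hypothetical and not taken from Fig. 4.

```python
def add_record(centroid, sse, n, x):
    """Return the centroid and SSE of an n-member group after appending record x.

    Per dimension l (assumed closed form of the Lemma 1 update):
        SSE_new[l] = SSE[l] + n/(n+1) * (x[l] - C[l])**2
        C_new[l]   = (n*C[l] + x[l]) / (n+1)            # Eq. (4)
    Each dimension costs O(1), independent of n.
    """
    new_sse = sse + n / (n + 1) * sum((xl - cl) ** 2 for xl, cl in zip(x, centroid))
    new_centroid = [(n * cl + xl) / (n + 1) for xl, cl in zip(x, centroid)]
    return new_centroid, new_sse


# Build the group {(1, 4), (3, 8), (5, 6)} one record at a time.
c, s = [0.0, 0.0], 0.0  # with n = 0 the initial centroid has no effect
for n, x in enumerate([(1, 4), (3, 8), (5, 6)]):
    c, s = add_record(c, s, n, x)
print(c, s)
```

Building the group one record at a time this way yields the centroid (3, 6) and an SSE of 16, which agrees, up to floating-point rounding, with computing the SSE of the full group directly.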

Lemma 2

If a record $X_{n+1}$ replaces a member of the group G, say $X_1$, the SSE of the new group can be calculated in O(1).

Proof

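We have $\mathit{SSE}_{2}[l]=\sum_{2\leq j\leq n+1}(X_{j}[l]-C_{2}^{n+1}[l])^{2}$, and let $\mathit{SSE}_1[l]$ denote the SSE of G before the replacement. Let $\Delta[l]=X_{n+1}[l]-X_1[l]$, so $C_{2}^{n+1}[l]=C_{1}^{n}[l]+\Delta [l]/n$ and, since the group size stays n, the SSE can be updated incrementally:

$$ \mathit{SSE}_{2}[l]=\mathit{SSE}_{1}[l]+\Delta[l]\Bigl(X_{n+1}[l]+X_{1}[l]-2C_{1}^{n}[l]-\frac{\Delta[l]}{n}\Bigr). $$

(6)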

Equation (6) is used in line 19 of Fig. 4, and requires O(1) computations. □
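A Python sketch of the Lemma 2 replacement update is given below. It assumes the per-dimension form $\mathit{SSE}_2[l]=\mathit{SSE}_1[l]+\Delta[l]\bigl(X_{n+1}[l]+X_1[l]-2C_1^n[l]-\Delta[l]/n\bigr)$, which follows from writing the SSE of a dimension as $\sum_j X_j[l]^2-n\,(C[l])^2$ and using $C_2^{n+1}[l]=C_1^n[l]+\Delta[l]/n$. The function name `replace_record` is hypothetical.

```python
def replace_record(centroid, sse, n, old, new):
    """Return the centroid and SSE of an n-member group after member `old`
    is replaced by record `new` (the group size stays n).

    Per dimension, with delta = new[l] - old[l]:
        C_new[l]   = C[l] + delta / n
        SSE_new[l] = SSE[l] + delta * (new[l] + old[l] - 2*C[l] - delta / n)
    """
    new_centroid, new_sse = [], sse
    for c, a, b in zip(centroid, old, new):
        delta = b - a
        new_sse += delta * (b + a - 2 * c - delta / n)
        new_centroid.append(c + delta / n)
    return new_centroid, new_sse


# G = {(1, 4), (3, 8), (5, 6)} has centroid (3, 6) and SSE 16; swap (1, 4) for (7, 2).
c, s = replace_record([3.0, 6.0], 16.0, 3, (1, 4), (7, 2))
print(c, s)
```

The resulting group {(3, 8), (5, 6), (7, 2)} has centroid (5, 16/3) and SSE 80/3, which the incremental update reproduces without revisiting the other members.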


About this article

Cite this article

Mortazavi, R., Jalili, S. & Gohargazi, H. Multivariate microaggregation by iterative optimization. Appl Intell 39, 529–544 (2013). https://doi.org/10.1007/s10489-013-0431-y

Issue Date: October 2013
DOI: https://doi.org/10.1007/s10489-013-0431-y
