TBM, a transformation based method for microaggregation of large volume mixed data

Salari, Mostafa; Jalili, Saeed; Mortazavi, Reza

doi:10.1007/s10618-016-0457-y


Published: 15 March 2016

Volume 31, pages 65–91 (2017)

Data Mining and Knowledge Discovery



Abstract

Due to recent advances in data collection and processing, some organizations have begun publishing data for scientific and commercial purposes. Published data should be anonymized so that they remain useful while the privacy of data respondents is preserved. Microaggregation is a popular mechanism for data anonymization, but it naturally operates on numerical datasets. However, real-world data are usually mixed, i.e., they contain both numeric and categorical attributes. In this paper, we propose a novel transformation based method for microaggregation of mixed data, called TBM. The method uses multidimensional scaling to generate a numeric equivalent of the mixed dataset. The partitioning step of microaggregation is performed on this numeric equivalent, whereas the aggregation step is performed on the original data. TBM can microaggregate large mixed datasets in a short time with low information loss. Experimental results show that the proposed method attains a better trade-off between data utility and privacy, in less time, than the traditional methods.
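The abstract describes three stages: transform the mixed records into a numeric equivalent with multidimensional scaling, partition that equivalent, then aggregate the original records group by group. The sketch below illustrates that flow in Python with NumPy. It rests on several assumptions the abstract does not state: a Gower-style dissimilarity (range-normalized absolute difference for numeric attributes, simple matching for categorical ones), classical (Torgerson) MDS, an MDAV-style fixed-size partition, and mean/mode aggregation. All function names (`mixed_dissimilarity`, `classical_mds`, `mdav_partition`, `aggregate`, `tbm_sketch`) are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def mixed_dissimilarity(num, cat):
    """Assumed Gower-style dissimilarity (not necessarily TBM's measure):
    range-normalized absolute difference on numeric columns plus simple
    matching (0 if equal, 1 otherwise) on categorical columns, averaged
    over all attributes."""
    rng = num.max(axis=0) - num.min(axis=0)
    rng = np.where(rng == 0, 1.0, rng)  # constant columns contribute 0
    d_num = np.abs(num[:, None, :] - num[None, :, :]) / rng
    d_cat = (cat[:, None, :] != cat[None, :, :]).astype(float)
    return (d_num.sum(axis=2) + d_cat.sum(axis=2)) / (num.shape[1] + cat.shape[1])


def classical_mds(D, dims=2):
    """Classical (Torgerson) MDS: double-center the squared dissimilarities
    and keep the leading eigenvectors scaled by sqrt(eigenvalue)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dims]      # indices of the largest ones
    vals = np.clip(vals[idx], 0.0, None)     # negative => non-Euclidean part, dropped
    return vecs[:, idx] * np.sqrt(vals)


def mdav_partition(X, k):
    """MDAV-style fixed-size partition of numeric points (assumed here as the
    partitioning heuristic). Every group gets between k and 2k-1 records,
    provided len(X) >= k."""
    remaining = list(range(len(X)))
    groups = []

    def farthest_from(point):
        d = np.linalg.norm(X[remaining] - point, axis=1)
        return remaining[int(np.argmax(d))]

    def take_group(seed):
        # the k remaining records closest to X[seed] form one group
        d = np.linalg.norm(X[remaining] - X[seed], axis=1)
        chosen = [remaining[i] for i in np.argsort(d)[:k]]
        for i in chosen:
            remaining.remove(i)
        groups.append(chosen)

    while len(remaining) >= 3 * k:
        r = farthest_from(X[remaining].mean(axis=0))
        s = farthest_from(X[r])
        take_group(r)
        take_group(s)
    if len(remaining) >= 2 * k:
        take_group(farthest_from(X[remaining].mean(axis=0)))
    if remaining:
        groups.append(list(remaining))  # last k..2k-1 records
    return groups


def aggregate(num, cat, groups):
    """Aggregation on the ORIGINAL mixed records: group mean for numeric
    attributes, group mode for categorical ones (an assumed choice)."""
    num_out = num.astype(float).copy()
    cat_out = cat.copy()
    for g in groups:
        num_out[g] = num[g].mean(axis=0)
        for j in range(cat.shape[1]):
            vals, counts = np.unique(cat[g, j], return_counts=True)
            cat_out[g, j] = vals[np.argmax(counts)]
    return num_out, cat_out


def tbm_sketch(num, cat, k, dims=2):
    X = classical_mds(mixed_dissimilarity(num, cat), dims)  # numeric equivalent
    groups = mdav_partition(X, k)                           # partition on X
    return groups, aggregate(num, cat, groups)              # aggregate originals


if __name__ == "__main__":
    num = np.array([[25, 50.0], [27, 52.0], [45, 90.0], [47, 88.0],
                    [60, 30.0], [62, 31.0], [33, 70.0]])
    cat = np.array([["M", "A"], ["M", "A"], ["F", "B"], ["F", "B"],
                    ["M", "C"], ["F", "C"], ["M", "B"]])
    groups, (num_out, cat_out) = tbm_sketch(num, cat, k=2)
    print(groups)
```

In this sketch every record in a group is replaced by the same numeric mean and categorical mode, so each released combination is shared by at least k records. Partitioning on the MDS coordinates is what lets a purely numeric heuristic handle categorical attributes. Note that the sketch builds a full n×n dissimilarity matrix and is therefore only suitable for small inputs; the abstract does not say how TBM avoids that cost on large data.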


Notes

Microaggregation with minimum information loss.
The definition of the least common subsumer (LCS) in an ontology is similar to that of the CCG in a value generalization hierarchy (VGH).
http://archive.ics.uci.edu/ml/datasets/Adult.
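The second note relates the least common subsumer (LCS) of two concepts in an ontology to the CCG of two values in a value generalization hierarchy (VGH). Assuming both denote the most specific node that generalizes two given values, a minimal Python sketch over a hierarchy stored as a child-to-parent map could look like this. The education hierarchy and the function names are hypothetical illustrations, not taken from the paper.

```python
def ancestors(value, parent):
    """Path from value up to the root; parent maps child -> parent."""
    path = [value]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def lowest_common_generalization(a, b, parent):
    """Most specific node that generalizes both a and b (LCS-like), or None
    if the two values share no ancestor."""
    anc_b = set(ancestors(b, parent))
    for node in ancestors(a, parent):  # walk upward from a, most specific first
        if node in anc_b:
            return node
    return None


# Hypothetical VGH for an education attribute.
parent = {
    "Bachelors": "Higher-ed", "Masters": "Higher-ed",
    "HS-grad": "Secondary",
    "Higher-ed": "Any", "Secondary": "Any",
}
print(lowest_common_generalization("Bachelors", "Masters", parent))  # Higher-ed
print(lowest_common_generalization("Bachelors", "HS-grad", parent))  # Any
```

Walking upward from one value and stopping at the first node that is also an ancestor of the other value returns the deepest shared generalization, which is the node a generalization-based method would publish in place of both values.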


Author information

Authors and Affiliations

Computer Engineering Department, Tarbiat Modares University, Tehran, Iran
Mostafa Salari & Saeed Jalili
School of Engineering, Damghan University, Damghan, Iran
Reza Mortazavi


Corresponding author

Correspondence to Saeed Jalili.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Responsible editor: Charu Aggarwal.


About this article

Cite this article

Salari, M., Jalili, S. & Mortazavi, R. TBM, a transformation based method for microaggregation of large volume mixed data. Data Min Knowl Disc 31, 65–91 (2017). https://doi.org/10.1007/s10618-016-0457-y


Received: 08 March 2015
Accepted: 27 February 2016
Issue Date: January 2017
DOI: https://doi.org/10.1007/s10618-016-0457-y

