
TBM, a transformation based method for microaggregation of large volume mixed data

Data Mining and Knowledge Discovery

Abstract

Owing to recent advances in data collection and processing, some organizations now publish data for scientific and commercial purposes. Published data should be anonymized so that it remains useful while the privacy of data respondents is preserved. Microaggregation is a popular mechanism for data anonymization, but it naturally operates on numerical datasets. Real-world data, however, is usually mixed, i.e., it contains both numeric and categorical attributes. In this paper, we propose a novel transformation-based method for microaggregation of mixed data, called TBM. The method uses multidimensional scaling to generate a numeric equivalent of the mixed dataset. The partitioning step of microaggregation is performed on the equivalent dataset, whereas the aggregation step is performed on the original data. TBM can microaggregate large mixed datasets in a short time with low information loss. Experimental results show that the proposed method attains a better trade-off between data utility and privacy, in a shorter time, than traditional methods.
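The pipeline outlined in the abstract (embed the mixed records numerically, partition on the embedding, aggregate on the original values) can be illustrated with a small sketch. This is not the authors' implementation: the Gower-style dissimilarity, the one-dimensional classical MDS computed by power iteration, the sort-and-cut partitioning, and every function name below are assumptions chosen only to keep the example short and dependency-free.

```python
from collections import Counter
from statistics import mean


def mixed_distance(a, b, numeric_idx, ranges):
    """Gower-style dissimilarity (an assumption, not the paper's measure):
    range-normalised difference for numeric attributes, 0/1 mismatch otherwise."""
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if j in numeric_idx:
            total += abs(x - y) / ranges[j] if ranges[j] else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)


def classical_mds_1d(dist, iters=200):
    """First classical-MDS coordinate: double-centre the squared distances,
    then find the dominant eigenvector by power iteration."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return [0.0] * n
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue belonging to v.
    lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    if lam <= 0.0:  # no usable positive dimension
        return [0.0] * n
    return [x * lam ** 0.5 for x in v]


def microaggregate(records, numeric_idx, k):
    """Partition on the numeric equivalent, aggregate on the original data."""
    n = len(records)
    ranges = {j: max(r[j] for r in records) - min(r[j] for r in records)
              for j in numeric_idx}
    dist = [[mixed_distance(a, b, numeric_idx, ranges) for b in records]
            for a in records]
    coords = classical_mds_1d(dist)

    # Partitioning step (simplified): sort by coordinate, cut into groups of k.
    order = sorted(range(n), key=lambda i: coords[i])
    groups = [order[i:i + k] for i in range(0, n, k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())  # keep every group at size >= k

    # Aggregation step: mean for numeric, mode for categorical attributes.
    out = [None] * n
    for g in groups:
        centroid = tuple(
            mean(records[i][j] for i in g) if j in numeric_idx
            else Counter(records[i][j] for i in g).most_common(1)[0][0]
            for j in range(len(records[0])))
        for i in g:
            out[i] = centroid
    return out


if __name__ == "__main__":
    data = [(25, "Private"), (60, "Gov"), (27, "Private"),
            (58, "Gov"), (23, "Private"), (62, "Gov")]
    for original, masked in zip(data, microaggregate(data, {0}, 3)):
        print(original, "->", masked)
```

Each released record is replaced by the centroid of a group of at least k records, so it is indistinguishable from at least k - 1 others. A dissimilarity matrix over all pairs is quadratic in the number of records, so this sketch says nothing about how the method scales to large datasets.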




Notes

  1. Microaggregation with minimum information loss.

  2. The definition of LCS in ontology is similar to CCG in VGH.

  3. http://archive.ics.uci.edu/ml/datasets/Adult.


Author information

Corresponding author

Correspondence to Saeed Jalili.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Responsible editor: Charu Aggarwal.

About this article

Cite this article

Salari, M., Jalili, S. & Mortazavi, R. TBM, a transformation based method for microaggregation of large volume mixed data. Data Min Knowl Disc 31, 65–91 (2017). https://doi.org/10.1007/s10618-016-0457-y

  • DOI: https://doi.org/10.1007/s10618-016-0457-y
