Data anonymization: a novel optimal k-anonymity algorithm for identical generalization hierarchy data in IoT

  • Special Issue Paper
Service Oriented Computing and Applications

Abstract

Advances in Internet of Things (IoT) technologies make life more convenient for people. Data sensed by the devices can be analyzed to respond to people’s needs seamlessly. A consequence of this convenience is that privacy protection becomes a very important issue that must be addressed effectively. Various data anonymization models have been proposed for this issue; one of the most widely applied is k-anonymity. k-anonymity prevents re-identification by replacing input values with more general forms so that every tuple in the transformed data is indistinguishable from at least k − 1 others. In this paper, we focus on a special case of input datasets in which all the quasi-identifiers, i.e., the linkable attributes in the dataset, have identical data types and therefore an identical generalization hierarchy (IGH). Solutions for this case are effectively applicable to IoT data privacy protection in general, owing to the nature of IoT data. We propose a novel method that provides a globally optimal k-anonymity solution for IGH datasets. The proposed algorithm determines an optimal solution based on the characteristics of IGH data by visiting and evaluating only the essential nodes of the generalization lattice that satisfy k-anonymity. Although the k-anonymization problem is NP-hard, we show that our algorithm can efficiently find an optimal k-anonymity solution by exploiting a special characteristic of IGH data, namely the optimality relationship between nodes at different levels of the generalization lattice. The experimental results show that our algorithm is much more efficient than the comparative algorithms because it searches fewer nodes of the given lattice.
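To make the terms in the abstract concrete, the following is a minimal Python sketch of a k-anonymity check and an exhaustive search over a generalization lattice in the IGH setting. It is not the algorithm proposed in the paper: it is a brute-force baseline that visits lattice nodes in order of height. The bucketing hierarchy, the names `generalize`, `apply_node`, `is_k_anonymous`, and `lowest_k_anonymous_nodes`, the sample rows, and the use of lattice height (the sum of generalization levels) as the quality measure are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical hierarchy shared by every quasi-identifier (the IGH setting):
# level 0 keeps the value, level n (0 < n < MAX_LEVEL) buckets integers into
# ranges of width 10**n, and MAX_LEVEL suppresses the value entirely.
MAX_LEVEL = 3


def generalize(value, level):
    """Return the generalized form of an integer value at the given level."""
    if level == 0:
        return str(value)
    if level >= MAX_LEVEL:
        return "*"
    width = 10 ** level
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


def apply_node(rows, node):
    """Generalize every row; node[i] is the level applied to attribute i."""
    return [tuple(generalize(v, lvl) for v, lvl in zip(row, node)) for row in rows]


def is_k_anonymous(rows, k):
    """True when every distinct tuple occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())


def lowest_k_anonymous_nodes(rows, k):
    """Brute-force baseline: visit lattice nodes by increasing height and
    return all k-anonymous nodes at the lowest height where any exists."""
    n_attrs = len(rows[0])
    nodes = sorted(
        product(range(MAX_LEVEL + 1), repeat=n_attrs),
        key=lambda n: (sum(n), n),
    )
    best_height, found = None, []
    for node in nodes:
        height = sum(node)
        if best_height is not None and height > best_height:
            break  # every remaining node is higher in the lattice
        if is_k_anonymous(apply_node(rows, node), k):
            best_height = height
            found.append(node)
    return found


# Made-up sample: two integer quasi-identifiers with the same data type.
rows = [(21, 34), (23, 38), (27, 31), (25, 36)]
result = lowest_k_anonymous_nodes(rows, k=2)
print(result)
```

For the sample rows and k = 2, the search returns the single node `(1, 1)`, which generalizes both attributes by one level. The exhaustive scan evaluates every node up to that height, which is the kind of cost the paper aims to reduce by evaluating only essential nodes.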

Acknowledgements

This research work was partially supported by Chiang Mai University.

Author information

Corresponding author

Correspondence to Juggapong Natwichai.

About this article

Cite this article

Mahanan, W., Chaovalitwongse, W.A. & Natwichai, J. Data anonymization: a novel optimal k-anonymity algorithm for identical generalization hierarchy data in IoT. SOCA 14, 89–100 (2020). https://doi.org/10.1007/s11761-020-00287-w

  • DOI: https://doi.org/10.1007/s11761-020-00287-w
