Distributed L-diversity using spark-based algorithm for large resource description frameworks data

Jeon, MinHyuk; Temuujin, Odsuren; Ahn, Jinhyun; Im, Dong-Hyuk

doi:10.1007/s11227-020-03583-6

Published: 04 January 2021

Volume 77, pages 7270–7286 (2021)

The Journal of Supercomputing

MinHyuk Jeon¹, Odsuren Temuujin¹, Jinhyun Ahn² & Dong-Hyuk Im³ (ORCID: orcid.org/0000-0002-0290-755X)

Abstract

Privacy protection has become an issue for Resource Description Framework (RDF) data with the growing use of public government open data and individuals' healthcare data. Because such data may include personal information, it must undergo a de-identification process that deletes or replaces parts of the original data. To provide this protection, a method has been developed that applies k-anonymization to RDF data. However, sensitive RDF information anonymized using k-anonymization is not completely secure and remains vulnerable to attacks such as homogeneity and background-knowledge attacks. In this paper, we propose an l-diversity de-identification method based on anatomy that can overcome the limitations of k-anonymity and guarantee stronger privacy protection than k-anonymization. Further, because this anonymization process is computationally time-intensive, we use Apache Spark distributed computing to accelerate de-identification and enhance its practical utility. We also propose a method for preserving l-diversity in dynamically evolving RDF datasets. Experimental results show that our proposed distributed l-diversity algorithm processes the data more efficiently than conventional approaches.
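The anatomy-based l-diversity idea summarized in the abstract can be illustrated with a small, single-process sketch. It follows the Anatomy bucketization of Xiao and Tao (2006). First, triples are grouped into one record per subject. Next, records are grouped so that each group holds at least l distinct sensitive values (distinct l-diversity). Finally, the output is split into a quasi-identifier table tagged with group ids and a sensitive table of per-group value counts. This is not the authors' Spark algorithm. The predicate names (`ex:age`, `ex:disease`), the function names `triples_to_records`, `anatomize`, and `is_l_diverse`, and the choice to drop subject IRIs are hypothetical assumptions made only for illustration.

```python
from collections import Counter, defaultdict


def triples_to_records(triples, sensitive_pred):
    """Group (subject, predicate, object) triples into one record per subject."""
    records = defaultdict(dict)
    for s, p, o in triples:
        records[s][p] = o  # assumes each predicate is single-valued per subject
    # Only subjects that carry the sensitive predicate are anonymized here.
    return {s: r for s, r in records.items() if sensitive_pred in r}


def anatomize(records, sensitive_pred, ell):
    """Anatomy-style grouping (after Xiao & Tao, 2006); a sketch, not the paper's code.

    Returns (qit, st): qit is a list of quasi-identifier rows tagged with a group id,
    st maps each group id to a Counter of its sensitive values.
    """
    if not records:
        return [], {}
    buckets = defaultdict(list)  # sensitive value -> subjects holding it
    for s, r in records.items():
        buckets[r[sensitive_pred]].append(s)
    # l-eligibility: no sensitive value may occur in more than n/ell records.
    if max(len(b) for b in buckets.values()) * ell > len(records):
        raise ValueError("records are not l-eligible")

    groups = []
    while sum(1 for b in buckets.values() if b) >= ell:
        # Draw one subject from each of the ell currently largest buckets,
        # so every group gets ell distinct sensitive values.
        largest = sorted((v for v in buckets if buckets[v]),
                         key=lambda v: len(buckets[v]), reverse=True)[:ell]
        groups.append([buckets[v].pop() for v in largest])

    # Residue: place each leftover subject in a group lacking its value.
    for v, rest in buckets.items():
        for s in rest:
            target = next(g for g in groups
                          if all(records[m][sensitive_pred] != v for m in g))
            target.append(s)

    qit, st = [], {}
    for gid, members in enumerate(groups):
        st[gid] = Counter(records[m][sensitive_pred] for m in members)
        for m in members:
            # Hypothetical choice: drop the subject IRI and the sensitive value,
            # keeping only the remaining predicates plus the group id.
            row = {p: o for p, o in records[m].items() if p != sensitive_pred}
            row["group"] = gid
            qit.append(row)
    return qit, st


def is_l_diverse(st, ell):
    """Distinct l-diversity: every group holds at least ell distinct sensitive values."""
    return all(len(counts) >= ell for counts in st.values())


if __name__ == "__main__":
    # Hypothetical toy data: ex:age as quasi-identifier, ex:disease as sensitive.
    diseases = ["flu", "flu", "cold", "cancer", "hiv", "cold"]
    triples = []
    for i, d in enumerate(diseases):
        triples.append((f"ex:person{i}", "ex:age", str(30 + i)))
        triples.append((f"ex:person{i}", "ex:disease", d))
    records = triples_to_records(triples, "ex:disease")
    qit, st = anatomize(records, "ex:disease", ell=3)
    print(is_l_diverse(st, 3))  # True
```

Because each sensitive value is published only as a per-group count, an attacker who links a subject to its group still faces at least l candidate values. The sketch deliberately omits Spark partitioning and incremental updates for evolving datasets, both of which the paper addresses.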

Acknowledgments

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) supported program (IITP-2020-2018-0-01417) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation); and by the Basic Science Research Program through NRF, funded by the Ministry of Education (No. NRF-2018R1D1A1B07048380).

Author information

Authors and Affiliations

Department of Computer Engineering, Hoseo University, Asan, 31499, Korea
MinHyuk Jeon & Odsuren Temuujin
Department of Management Information Systems, Jeju National University, Jeju, 63243, Korea
Jinhyun Ahn
School of Information Convergence, Kwangwoon University, Seoul, 01890, Korea
Dong-Hyuk Im

Contributions

MinHyuk Jeon and Dong-Hyuk Im conceived the problem and supervised the overall research; Jinhyun Ahn clarified some points that helped Dong-Hyuk Im to develop the algorithm; Odsuren Temuujin and MinHyuk Jeon implemented the algorithm and performed the experiments; MinHyuk Jeon, Odsuren Temuujin, and Dong-Hyuk Im wrote the paper.

Corresponding author

Correspondence to Dong-Hyuk Im.

Ethics declarations

Conflicts of interest

The authors declare no conflicts of interest.

About this article

Cite this article

Jeon, M., Temuujin, O., Ahn, J. et al. Distributed L-diversity using spark-based algorithm for large resource description frameworks data. J Supercomput 77, 7270–7286 (2021). https://doi.org/10.1007/s11227-020-03583-6

Accepted: 16 December 2020
Published: 04 January 2021
Issue Date: July 2021
DOI: https://doi.org/10.1007/s11227-020-03583-6