Distributed L-diversity using spark-based algorithm for large resource description frameworks data

The Journal of Supercomputing

Abstract

Privacy concerns about Resource Description Framework (RDF) data have emerged with the use of public government open data and individuals' healthcare data. Because these data may include personal information, they must undergo a de-identification process that deletes or replaces parts of the original data. To this end, a method has been developed that applies k-anonymization to RDF data. However, sensitive RDF information protected by k-anonymization is not completely secure and remains vulnerable to attacks. In this paper, we propose an l-diversity de-identification method based on anatomy that overcomes the limitations of k-anonymity and guarantees stronger privacy protection. Furthermore, because anonymization is computationally intensive, we use Spark distributed computing to speed up de-identification and make it more practical. We also propose a method for preserving l-diversity in dynamically evolving RDF datasets. Experimental results show that the proposed distributed l-diversity algorithm processes the data more efficiently than conventional approaches.
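
To make the idea of anatomy-style l-diversity concrete, the following is a minimal single-machine sketch in Python. It is not the algorithm proposed in the paper and it does not use Spark or RDF; it assumes flat records of the form (quasi-identifier, sensitive value), and the names `anatomize`, `is_l_diverse`, and the sample records are hypothetical. It greedily forms groups with l distinct sensitive values each, then publishes the quasi-identifiers and the sensitive values in two separate tables linked only by a group id.

```python
from collections import Counter, defaultdict


def anatomize(records, l):
    """Simplified anatomy-style grouping (illustrative assumption, not the paper's method).

    records: list of (quasi_identifier, sensitive_value) pairs.
    Returns (qit, st): a quasi-identifier table of (quasi_identifier, group_id)
    and a sensitive table of (group_id, sensitive_value, count).
    """
    # Bucket the quasi-identifiers by their sensitive value.
    buckets = defaultdict(list)
    for qi, s in records:
        buckets[s].append(qi)

    groups = []  # each group is a list of (quasi_identifier, sensitive_value)
    # While at least l buckets are non-empty, take one record from each of
    # the l largest buckets, so the new group has l distinct sensitive values.
    while sum(1 for b in buckets.values() if b) >= l:
        largest = sorted(
            (s for s in buckets if buckets[s]),
            key=lambda s: (-len(buckets[s]), s),
        )[:l]
        groups.append([(buckets[s].pop(), s) for s in largest])

    # Assign leftover records to a group that lacks their sensitive value,
    # which can only increase the number of distinct values in that group.
    for s, bucket in buckets.items():
        for qi in bucket:
            target = next(
                (g for g in groups if all(gs != s for _, gs in g)), None
            )
            if target is None:
                raise ValueError("records cannot be made l-diverse for this l")
            target.append((qi, s))

    qit = [(qi, gid) for gid, g in enumerate(groups) for qi, _ in g]
    st = [
        (gid, s, c)
        for gid, g in enumerate(groups)
        for s, c in sorted(Counter(s for _, s in g).items())
    ]
    return qit, st


def is_l_diverse(st, l):
    """Check distinct l-diversity: every group has at least l sensitive values."""
    distinct = defaultdict(set)
    for gid, s, _ in st:
        distinct[gid].add(s)
    return all(len(values) >= l for values in distinct.values())


# Hypothetical sample data: (person, diagnosis).
records = [
    ("alice", "flu"),
    ("bob", "flu"),
    ("carol", "cancer"),
    ("dave", "asthma"),
]
qit, st = anatomize(records, l=2)
print(qit)
print(st)
print(is_l_diverse(st, 2))
```

Because the two tables share only the group id, an observer who knows which group a person belongs to still faces at least l candidate sensitive values. In a distributed setting the bucketing and grouping steps would have to be partitioned across workers, which this sketch does not attempt.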

Acknowledgments

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2020-2018-0-01417) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation), and by the Basic Science Research Program through the NRF, funded by the Ministry of Education (No. NRF-2018R1D1A1B07048380).

Author information

Contributions

MinHyuk Jeon and Dong-Hyuk Im conceived the problem and supervised the overall research; Jinhyun Ahn clarified some points that helped Dong-Hyuk Im to develop the algorithm; Odsuren Temuujin and MinHyuk Jeon implemented the algorithm and performed the experiments; MinHyuk Jeon, Odsuren Temuujin, and Dong-Hyuk Im wrote the paper.

Corresponding author

Correspondence to Dong-Hyuk Im.

Ethics declarations

Conflicts of interest

The authors declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Jeon, M., Temuujin, O., Ahn, J. et al. Distributed L-diversity using spark-based algorithm for large resource description frameworks data. J Supercomput 77, 7270–7286 (2021). https://doi.org/10.1007/s11227-020-03583-6
