Abstract
In this paper, we present a framework for privacy-preserving collaborative data analysis among multiple data providers acting as the edge of a cloud environment. The proposed framework computes the best trade-off between privacy and result accuracy, based on the privacy requirements of the data providers and the specific analysis algorithm requested. Although the presented model is general and can be applied to different environments, this work is motivated by the need to share Cyber Threat Information (CTI). The framework is independent of the number of data providers, the data format used, the privacy requirements and the analysis operations. The model is based on the concept of a trade-off score between accuracy and privacy, which also takes into account privacy measures such as differential privacy, l-diversity and k-anonymity. Together with the model, the paper discusses the framework implementation and presents results that show the effectiveness and viability of the proposed approach.
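To make the privacy measures named above concrete, the following is a minimal Python sketch, not the framework's implementation. It computes the k-anonymity and (distinct) l-diversity levels of a small table and then picks one of several candidate configurations using a simple weighted sum. The function names, the column names, the `weight` parameter, the weighted-sum score and all numeric values are illustrative assumptions; the abstract does not give the actual trade-off score.

```python
from collections import defaultdict


def k_anonymity(rows, quasi_ids):
    """Return k: the size of the smallest group of rows that share
    the same values on the quasi-identifier columns."""
    sizes = defaultdict(int)
    for row in rows:
        sizes[tuple(row[q] for q in quasi_ids)] += 1
    return min(sizes.values())


def l_diversity(rows, quasi_ids, sensitive):
    """Return l: the smallest number of distinct sensitive values
    found in any quasi-identifier group (distinct l-diversity)."""
    values = defaultdict(set)
    for row in rows:
        values[tuple(row[q] for q in quasi_ids)].add(row[sensitive])
    return min(len(v) for v in values.values())


def best_trade_off(candidates, weight=0.5):
    """Hypothetical score: weight * privacy + (1 - weight) * accuracy,
    with both terms assumed to be normalised to [0, 1].
    This is an assumed stand-in, not the score defined in the paper."""
    return max(
        candidates,
        key=lambda c: weight * c["privacy"] + (1 - weight) * c["accuracy"],
    )


# Toy generalised table (made-up values, for illustration only).
rows = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cold"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
]
k = k_anonymity(rows, ["zip", "age"])
l = l_diversity(rows, ["zip", "age"], "disease")

# Made-up privacy/accuracy pairs for three anonymisation settings.
candidates = [
    {"name": "raw", "privacy": 0.0, "accuracy": 0.95},
    {"name": "k=2", "privacy": 0.5, "accuracy": 0.90},
    {"name": "k=5", "privacy": 0.9, "accuracy": 0.60},
]
balanced = best_trade_off(candidates, weight=0.5)
accuracy_first = best_trade_off(candidates, weight=0.2)
```

With these toy inputs the table is 2-anonymous and 2-diverse. Under the assumed score, an equal weighting selects the `k=5` setting, while a weighting that favours accuracy selects `k=2`, which illustrates why the chosen configuration depends on the providers' stated privacy requirements.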
Notes
The other privacy-preserving techniques in this category, such as data swapping, are not common approaches. Therefore, we do not consider them in this study.
The process is repeated four times to meet the requirement of five privacy levels.
Acknowledgements
This work has been partially supported by the H2020 EU-funded projects C3ISP (GA #700294), SIFIS-Home (GA #952652) and the ECSEL project SECREDAS (#783119).
Funding
The authors have received funding only from public international research agencies.
Ethics declarations
Conflict of interest
All authors declare that they do not have any conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Sheikhalishahi, M., Saracino, A., Martinelli, F. et al. Privacy preserving data sharing and analysis for edge-based architectures. Int. J. Inf. Secur. 21, 79–101 (2022). https://doi.org/10.1007/s10207-021-00542-x
DOI: https://doi.org/10.1007/s10207-021-00542-x