CoK: A Survey of Privacy Challenges in Relation to Data Meshes

Podlesny, Nikolai J.; Kayem, Anne V. D. M.; Meinel, Christoph

doi:10.1007/978-3-031-12423-5_7

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13426)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

The growing volume of data spread across multiple distributed platforms raises the question of how to compose data meshes that can be published or shared safely among multiple cooperating parties. Data meshes are composed of subsets (or whole sets) of data repositories that are owned by autonomous parties, which raises new challenges in guaranteeing privacy across various data mesh compositions. In this paper, we present a survey of the issues that emerge in guaranteeing the privacy of distributed mesh data. We discuss the limitations of existing solutions in handling personal data privacy with respect to meshed data. Finally, we postulate that identifying personal data in such datasets must be handled by a performance-efficient algorithm that can determine, on the fly, potential linkages across data repositories that could be exploited to subvert privacy.

Notes

1. https://github.com/jaSunny/synthetic_genome_data

Author information

Authors and Affiliations

Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
Nikolai J. Podlesny, Anne V. D. M. Kayem & Christoph Meinel

Corresponding author

Correspondence to Nikolai J. Podlesny.

Editor information

Editors and Affiliations

University of Vienna, Vienna, Austria
Christine Strauss
University of Calabria, Rende, Italy
Alfredo Cuzzocrea
Johannes Kepler University of Linz, Linz, Austria
Gabriele Kotsis
Vienna University of Technology, Vienna, Austria
A Min Tjoa
Johannes Kepler University of Linz, Linz, Austria
Ismail Khalil

About this paper

Cite this paper

Podlesny, N.J., Kayem, A.V.D.M., Meinel, C. (2022). CoK: A Survey of Privacy Challenges in Relation to Data Meshes. In: Strauss, C., Cuzzocrea, A., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2022. Lecture Notes in Computer Science, vol 13426. Springer, Cham. https://doi.org/10.1007/978-3-031-12423-5_7

DOI: https://doi.org/10.1007/978-3-031-12423-5_7
Published: 29 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-12422-8
Online ISBN: 978-3-031-12423-5
eBook Packages: Computer Science (R0)
