Privacy-Preserving Statistical Data Analysis on Federated Databases

  • Conference paper
Privacy Technologies and Policy (APF 2014)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 8450)

Abstract

The quality of empirical statistical studies is tightly related to the quality and amount of source data available. However, it is often hard to collect data from several sources due to privacy requirements or a lack of trust. In this paper, we propose a novel way to combine secure multi-party computation technology with federated database systems to preserve privacy in statistical studies that combine and analyse data from multiple databases. We describe an implementation on two real-world platforms: the Sharemind secure multi-party computation platform and the X-Road database federation platform. Our solution enables the privacy-preserving linking and analysis of databases belonging to different institutions. Indeed, a preliminary analysis from the Estonian Data Protection Inspectorate suggests that the correct implementation of our solution ensures that no personally identifiable information is processed in such studies. Therefore, our proposed solution can potentially reduce the costs of conducting statistical studies on shared data.
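
The core idea of computing a statistic over several databases without any single party seeing the raw records can be illustrated with additive secret sharing. The sketch below is only an illustration under stated assumptions: the abstract does not specify the protocol, so the three computing parties, the ring size `MODULUS`, and the function names `share`, `reconstruct`, and `secure_sum` are hypothetical choices made for this example and are not taken from Sharemind or X-Road.

```python
import secrets

# Assumed ring size for this illustration; not stated in the abstract.
MODULUS = 2**32


def share(value, parties=3):
    """Split a value into additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    # The last share is chosen so that all shares add up to the value.
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % MODULUS


def secure_sum(values, parties=3):
    """Sum the inputs while each computing party only handles its own shares."""
    per_party = [0] * parties
    for value in values:
        # A data owner splits each record; party i receives only share i.
        for i, s in enumerate(share(value, parties)):
            per_party[i] = (per_party[i] + s) % MODULUS
    # Only the aggregated per-party totals are combined and published.
    return reconstruct(per_party)


# Hypothetical records held by two different institutions.
database_a = [1200, 950]
database_b = [2100]
total = secure_sum(database_a + database_b)
```

Each individual share is a uniformly random number, so a single computing party learns nothing about a record from the share it holds; only the final aggregate is revealed. Linking records across databases and richer statistics need considerably more machinery than this sum.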

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Bogdanov, D., Kamm, L., Laur, S., Pruulmann-Vengerfeldt, P., Talviste, R., Willemson, J. (2014). Privacy-Preserving Statistical Data Analysis on Federated Databases. In: Preneel, B., Ikonomou, D. (eds) Privacy Technologies and Policy. APF 2014. Lecture Notes in Computer Science, vol 8450. Springer, Cham. https://doi.org/10.1007/978-3-319-06749-0_3

  • DOI: https://doi.org/10.1007/978-3-319-06749-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06748-3

  • Online ISBN: 978-3-319-06749-0

  • eBook Packages: Computer Science, Computer Science (R0)
