Privacy-Preserving Schema Reuse

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8422)

Abstract

As schema repositories multiply and several web-based platforms now support publishing schemas, schema reuse is becoming a new trend. Schema reuse is a methodology that allows users to create new schemas by copying and adapting existing ones. This methodology reduces not only the effort of designing new schemas but also the heterogeneity between them. One of the biggest barriers to schema reuse is privacy concerns that discourage schema owners from contributing their schemas. To address this problem, we develop a framework that enables privacy-preserving schema reuse. Our framework lets contributors define their own protection policies in the form of privacy constraints. Instead of showing the original schemas, the framework returns an anonymized schema with maximal utility that satisfies these privacy constraints. To validate our approach, we empirically evaluate the efficiency of different heuristics, the correctness of the proposed utility function, the computation time, and the trade-off between utility and privacy.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Hung, N.Q.V., Son Thanh, D., Tam, N.T., Aberer, K. (2014). Privacy-Preserving Schema Reuse. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8422. Springer, Cham. https://doi.org/10.1007/978-3-319-05813-9_16

  • DOI: https://doi.org/10.1007/978-3-319-05813-9_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05812-2

  • Online ISBN: 978-3-319-05813-9

  • eBook Packages: Computer Science, Computer Science (R0)
