Abstract
As the number of schema repositories grows rapidly and several web-based platforms exist to support publishing schemas, schema reuse is becoming a new trend. Schema reuse is a methodology that allows users to create new schemas by copying and adapting existing ones. This methodology helps reduce not only the effort of designing new schemas but also the heterogeneity between them. One of the biggest barriers to schema reuse is privacy concerns, which discourage schema owners from contributing their schemas. To address this problem, we develop a framework that enables privacy-preserving schema reuse. Our framework allows contributors to define their own protection policies in the form of privacy constraints. Instead of showing the original schemas, the framework returns an anonymized schema with maximal utility while satisfying these privacy constraints. To validate our approach, we empirically show the efficiency of different heuristics, the correctness of the proposed utility function, the computation time, and the trade-off between utility and privacy.
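The abstract describes the framework only at a high level. The sketch below illustrates the general idea of returning an anonymized schema that maximizes utility subject to privacy constraints, under a deliberately simplified model that is an assumption and not the paper's definition: an anonymized schema is a subset of attributes drawn from the contributed schemas; a privacy constraint is a set of attributes that must never all appear together; and utility is the sum of per-attribute weights, where a weight is the number of source schemas containing that attribute. The greedy selection is one simple heuristic and is not guaranteed to find the optimum. The names `attribute_weights`, `violates`, and `greedy_anonymize` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def attribute_weights(schemas: Iterable[set[str]]) -> Counter:
    """Hypothetical utility weight: number of source schemas containing each attribute."""
    counts: Counter = Counter()
    for schema in schemas:
        counts.update(schema)
    return counts


def violates(selected: set[str], constraints: list[frozenset[str]]) -> bool:
    """Assumed constraint model: violated when all attributes of a constraint appear together."""
    return any(constraint <= selected for constraint in constraints)


def greedy_anonymize(
    schemas: list[set[str]], constraints: list[frozenset[str]]
) -> tuple[set[str], int]:
    """Greedy heuristic: add attributes by descending weight, skipping any that break a constraint."""
    weights = attribute_weights(schemas)
    selected: set[str] = set()
    # Highest-weight attributes first; ties broken by name so the result is deterministic.
    for attr in sorted(weights, key=lambda a: (-weights[a], a)):
        candidate = selected | {attr}
        if not violates(candidate, constraints):
            selected = candidate
    utility = sum(weights[a] for a in selected)
    return selected, utility


if __name__ == "__main__":
    source_schemas = [
        {"name", "email", "phone"},
        {"name", "email", "ssn"},
        {"name", "address", "phone"},
    ]
    # Illustrative constraints: never reveal email and phone together; never reveal ssn.
    privacy_constraints = [frozenset({"email", "phone"}), frozenset({"ssn"})]
    schema, score = greedy_anonymize(source_schemas, privacy_constraints)
    print(sorted(schema), score)  # ['address', 'email', 'name'] 6
```

In this example `phone` is dropped because `email` (same weight, earlier by name) is already selected, and `ssn` is dropped by a single-attribute constraint. Because the greedy choice is made once and never revisited, a different tie-break or weighting can yield a different, possibly better, anonymized schema.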
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hung, N.Q.V., Son Thanh, D., Tam, N.T., Aberer, K. (2014). Privacy-Preserving Schema Reuse. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8422. Springer, Cham. https://doi.org/10.1007/978-3-319-05813-9_16
DOI: https://doi.org/10.1007/978-3-319-05813-9_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05812-2
Online ISBN: 978-3-319-05813-9
eBook Packages: Computer Science, Computer Science (R0)