Privacy-Preserving Schema Reuse

Hung, Nguyen Quoc Viet; Son Thanh, Do; Tam, Nguyen Thanh; Aberer, Karl

doi:10.1007/978-3-319-05813-9_16

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8422)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

As the number of schema repositories grows rapidly and several web-based platforms now support publishing schemas, schema reuse is becoming a new trend. Schema reuse is a methodology that allows users to create new schemas by copying and adapting existing ones. This methodology helps reduce not only the effort of designing new schemas but also the heterogeneity between them. One of the biggest barriers to schema reuse is privacy concerns, which discourage schema owners from contributing their schemas. To address this problem, we develop a framework that enables privacy-preserving schema reuse. Our framework lets contributors define their own protection policies in the form of privacy constraints. Instead of showing the original schemas, the framework returns an anonymized schema with maximal utility that satisfies these privacy constraints. To validate our approach, we empirically show the efficiency of different heuristics, the correctness of the proposed utility function, the computation time, and the trade-off between utility and privacy.
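
The optimization described in the abstract (maximal utility subject to privacy constraints) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: a schema is modelled as a flat list of attribute names, utility is an additive per-attribute score, and a privacy constraint is a set of attributes that must never all appear together in the output. The function name `best_anonymized_schema` is hypothetical, and the exhaustive search stands in for the heuristics the abstract mentions, whose details are not given here.

```python
from itertools import combinations


def best_anonymized_schema(attributes, utility, constraints):
    """Return the highest-utility attribute subset that violates no constraint.

    attributes  : list of attribute names (hypothetical toy schema)
    utility     : dict attribute -> non-negative score (assumed additive)
    constraints : list of sets; no set may appear in full in the output
    """
    best, best_score = frozenset(), 0.0
    # Exhaustive search: fine for a toy example, exponential in general.
    for size in range(len(attributes), 0, -1):
        for subset in combinations(attributes, size):
            chosen = frozenset(subset)
            if any(c <= chosen for c in constraints):
                continue  # a sensitive combination would be fully exposed
            score = sum(utility[a] for a in chosen)
            if score > best_score:
                best, best_score = chosen, score
    return best, best_score


# Illustrative data only; names and scores are made up.
attributes = ["name", "birthdate", "zipcode", "diagnosis"]
utility = {"name": 3.0, "birthdate": 2.0, "zipcode": 1.0, "diagnosis": 5.0}
constraints = [{"name", "diagnosis"}, {"birthdate", "zipcode", "diagnosis"}]

schema, score = best_anonymized_schema(attributes, utility, constraints)
print(sorted(schema), score)  # ['birthdate', 'diagnosis'] 7.0
```

In this toy instance the full schema would score 10.0 but violates both constraints, so the search settles on a smaller subset. It shows the trade-off between utility and privacy in its simplest form: each added constraint can only shrink the set of admissible outputs, so the best achievable utility never increases.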

Author information

Authors and Affiliations

École Polytechnique Fédérale de Lausanne, Switzerland
Nguyen Quoc Viet Hung, Do Son Thanh, Nguyen Thanh Tam & Karl Aberer

Editor information

Editors and Affiliations

School of Computer Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore, Singapore
Sourav S. Bhowmick
Department of Computer Science, Utah State University, Old Main Hill, 4205, 84322-4205, Logan, UT, USA
Curtis E. Dyreson
Department of Computer Science, Aalborg University, Selma Lagerløfs Vej, 300, 9220, Aalborg Øst, Denmark
Christian S. Jensen
Department of Computer Science, National University of Singapore, 13 Computing Drive, 117417, Singapore, Singapore
Mong Li Lee
Department of Computer Science, Udayana University, Jl. Kampus Unud Jimbaran Bali, 80364, Badung, Bali, Indonesia
Agus Muliantara
Information Systems Engineering, Christian-Albrechts-Universität zu Kiel, Olshausenstrasse 40, 24098, Kiel, Germany
Bernhard Thalheim

Cite this paper

Hung, N.Q.V., Son Thanh, D., Tam, N.T., Aberer, K. (2014). Privacy-Preserving Schema Reuse. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8422. Springer, Cham. https://doi.org/10.1007/978-3-319-05813-9_16

DOI: https://doi.org/10.1007/978-3-319-05813-9_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05812-2
Online ISBN: 978-3-319-05813-9
eBook Packages: Computer Science, Computer Science (R0)
