DOI: 10.1145/3269206.3269228
research-article

Single-Setup Privacy Enforcement for Heterogeneous Data Ecosystems

Published: 17 October 2018

ABSTRACT

Strong member privacy in technology enterprises involves, among other objectives, deleting or anonymizing the various kinds of data that a company controls. These requirements are harder to meet in a heterogeneous data ecosystem, where data is spread across multiple stores with different semantics: differing indexing or update capabilities call for processes specific to a store or even a schema. In this demo we showcase a method to enforce record controls on arbitrary data stores via a three-step process: generate an offline snapshot, run a policy mechanism to select the rows to delete or update, and apply the changes to the original store. The first and third steps work on any store by leveraging Apache Gobblin, an open source data integration framework. The policy computation step runs as a batch Gobblin job in which each table can be customized via a dataset metadata tracking system and SQL expressions that provide table-specific business logic. This setup allows enforcement of highly customizable privacy requirements in a variety of systems, from hosted databases to third-party data storage systems.
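The three-step process described in the abstract can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation and not Gobblin's API: `sqlite3` stands in for both the live store and the offline snapshot, and the `events` table, its columns, the `KEY` constant, the action names, and the policy dictionary format are all hypothetical names chosen for the example.

```python
import sqlite3

KEY = "id"  # hypothetical primary-key column shared by store and snapshot


def snapshot(store, table):
    """Step 1: copy the live table into an offline (in-memory) snapshot."""
    cols = [row[1] for row in store.execute(f"PRAGMA table_info({table})")]
    rows = store.execute(f"SELECT * FROM {table}").fetchall()
    snap = sqlite3.connect(":memory:")
    snap.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
    placeholders = ", ".join("?" for _ in cols)
    snap.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return snap


def select_changes(snap, table, policy):
    """Step 2: evaluate the table's SQL predicates against the snapshot only."""
    changes = []
    for action, predicate in policy.items():
        query = f"SELECT {KEY} FROM {table} WHERE {predicate} ORDER BY {KEY}"
        changes.extend((action, key) for (key,) in snap.execute(query))
    return changes


def apply_changes(store, table, changes):
    """Step 3: replay the selected deletes and updates on the original store."""
    for action, key in changes:
        if action == "delete":
            store.execute(f"DELETE FROM {table} WHERE {KEY} = ?", (key,))
        elif action == "anonymize":
            # Assumption: anonymizing means nulling a hypothetical `member` column.
            store.execute(f"UPDATE {table} SET member = NULL WHERE {KEY} = ?", (key,))
        else:
            raise ValueError(f"unknown action: {action}")
    store.commit()


def enforce(store, table, policy):
    """Run snapshot -> policy -> apply and return the list of applied changes."""
    snap = snapshot(store, table)
    changes = select_changes(snap, table, policy)
    snap.close()
    apply_changes(store, table, changes)
    return changes


def example_store():
    """Build a tiny hypothetical store with three rows."""
    store = sqlite3.connect(":memory:")
    store.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, member TEXT, event_day TEXT)"
    )
    store.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [(1, "alice", "2018-09-01"), (2, "bob", "2017-03-15"), (3, "carol", "2018-10-01")],
    )
    store.commit()
    return store


# Table-specific business logic expressed as SQL predicates (hypothetical format).
EXAMPLE_POLICY = {
    "delete": "member = 'alice'",
    "anonymize": "event_day < '2018-01-01'",
}
```

Calling `enforce(example_store(), "events", EXAMPLE_POLICY)` evaluates the predicates on the snapshot and touches the live store only in the final step. That mirrors the separation the abstract describes, in which only the first and third steps depend on the store's own read and write capabilities.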

References

  1. Issac Buenrostro and Anthony Hsu. 2018. Data Privacy at Scale. Dataworks Summit 2018.
  2. Lin Qiao et al. 2015. Gobblin: Unifying Data Ingestion for Hadoop. In Proceedings of the VLDB Endowment, Vol. 8. Kohala Coast, Hawaii, 1764-1769. https://doi.org/2150-8097/15/08
  3. Eric Sun. 2016. Open Sourcing WhereHows: A Data Discovery and Lineage Portal. https://engineering.linkedin.com/blog/2016/03/open-sourcing-wherehows--a-data-discovery-and-lineage-portal

        • Published in

          CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management
          October 2018
          2362 pages
          ISBN:9781450360142
          DOI:10.1145/3269206

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article

          Acceptance Rates

          CIKM '18 Paper Acceptance Rate: 147 of 826 submissions, 18%
          Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
