ABSTRACT
Strong member privacy in technology enterprises involves, among other objectives, deleting or anonymizing various kinds of data that a company controls. These requirements are complicated in a heterogeneous data ecosystem where data is kept in multiple stores with different semantics: differing indexing or update capabilities require processes specific to a store or even to a schema. In this demo we showcase a method to enforce record controls on arbitrary data stores via a three-step process: generate an offline snapshot, run a policy mechanism to select rows to delete or update, and apply the changes to the original store. The first and third steps work on any store by leveraging Apache Gobblin, an open-source data integration framework. The policy computation step runs as a batch Gobblin job in which each table can be customized via a dataset metadata tracking system and SQL expressions that provide table-specific business logic. This setup allows enforcement of highly customizable privacy requirements in a variety of systems, from hosted databases to third-party data storage systems.
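The three-step process described above can be illustrated with a minimal sketch. It does not use Apache Gobblin or any real metadata system: SQLite stands in for both the live store and the offline snapshot, and every name here (`POLICIES`, `profile_views`, `deleted_members`, `enforce`) is a hypothetical chosen for illustration, not part of the demo's implementation.

```python
import sqlite3

# Hypothetical per-table policy: a SQL predicate selecting rows to purge.
# It stands in for the table-specific SQL expressions the abstract mentions.
POLICIES = {
    "profile_views": "viewer_id IN (SELECT member_id FROM deleted_members)",
}

SCHEMA = "(id INTEGER PRIMARY KEY, viewer_id INTEGER, page TEXT)"


def make_store():
    """Stand-in for the live store (could be any database in practice)."""
    store = sqlite3.connect(":memory:")
    store.execute(f"CREATE TABLE profile_views {SCHEMA}")
    store.executemany(
        "INSERT INTO profile_views VALUES (?, ?, ?)",
        [(1, 10, "a"), (2, 11, "b"), (3, 10, "c")],
    )
    store.commit()
    return store


def snapshot(store, table):
    """Step 1: copy the table into a separate offline database."""
    offline = sqlite3.connect(":memory:")
    offline.execute(f"CREATE TABLE {table} {SCHEMA}")
    rows = store.execute(f"SELECT id, viewer_id, page FROM {table}").fetchall()
    offline.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    return offline


def select_rows_to_delete(offline, table, deleted_members):
    """Step 2: evaluate the table's policy against the snapshot only."""
    offline.execute("CREATE TABLE deleted_members (member_id INTEGER)")
    offline.executemany(
        "INSERT INTO deleted_members VALUES (?)",
        [(m,) for m in deleted_members],
    )
    query = f"SELECT id FROM {table} WHERE {POLICIES[table]} ORDER BY id"
    return [row[0] for row in offline.execute(query)]


def apply_deletes(store, table, ids):
    """Step 3: apply the selected deletions to the original store."""
    store.executemany(f"DELETE FROM {table} WHERE id = ?", [(i,) for i in ids])
    store.commit()


def enforce(store, table, deleted_members):
    """Run snapshot, policy computation, and apply in sequence."""
    offline = snapshot(store, table)
    ids = select_rows_to_delete(offline, table, deleted_members)
    apply_deletes(store, table, ids)
    return ids
```

In this sketch the policy query never touches the live store: it runs against the snapshot, and only the resulting row keys are sent back as deletions. That separation is what would let the first and third steps be swapped per store while the policy step stays the same.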