ABSTRACT
To meet the diverse data storage and analysis needs of the Internet of Things era, businesses are adopting the lakehouse approach, a hybrid deployment of data lakes and data warehouses on a single platform. Data consumers apply data mining techniques through open APIs to explore the data’s untapped potential. This openness, however, raises concerns about whether data is accessed and used compliantly. While privacy regulations such as the General Data Protection Regulation (GDPR) offer conceptual guidance, their technical implementation remains vague. This paper proposes a privacy regulation compliance framework specific to lakehouse data analysis. By introducing a compliance verification layer between the analysis and processing layers, the scheme enables regulatory adherence. Analysis requests are verified inside Trusted Execution Environments (TEEs), and the results are stored on a blockchain. To deter unauthorized data analysis, we introduce a reputation-based punishment mechanism. Experimental results demonstrate the scheme’s feasibility.
Index Terms
- A Data Analysis Privacy Regulation Compliance Scheme for Lakehouse