DOI: 10.1145/3264560.3266429
research-article

Scalable Privacy-Preserving Big Data Management and Analytics: Where We Are and Where We Are Going

Published: 03 August 2018

ABSTRACT

Although several research efforts have recently addressed privacy-preserving big data management and analytics, significant challenges arise when the resulting models, techniques and algorithms must be delivered on top of massive, distributed big data repositories. This problem opens the door to the design of innovative models, techniques and algorithms that, unlike current proposals, build scalability into the privacy-preserving big data management and analytics phase. On the basis of these considerations, this paper provides an overview of the current problems and limitations of state-of-the-art techniques, along with a proposal for an effective framework for supporting scalable privacy-preserving big data management and analytics.

      • Published in

        ICCBDC '18: Proceedings of the 2018 2nd International Conference on Cloud and Big Data Computing
        August 2018
        98 pages
        ISBN: 9781450364744
        DOI: 10.1145/3264560

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited