Privacy Preserving Data Quality Assessment for High-Fidelity Data Sharing

Research article, DOI: 10.1145/2663876.2663885

Published: 03 November 2014

Abstract

In a data-driven economy that struggles to cope with the volume and diversity of information, data quality assessment has become a necessary precursor to data analytics. Real-world data often contains inconsistencies, conflicts and errors. Such dirty data increases processing costs and degrades the results of analytics. Assessing the quality of a dataset is especially important when a party is considering acquiring data held by an untrusted entity. In this scenario, the privacy risks to the stakeholders must also be considered.
This paper examines challenges in privacy-preserving data quality assessment. A two-party scenario is considered, consisting of a client that wishes to test data quality and a server that holds the dataset. Privacy-preserving protocols are presented for testing important data quality metrics: completeness, consistency, uniqueness, timeliness and validity. For semi-honest parties, the protocols ensure that the client learns nothing about the data other than the value of the quality metric, while the server learns neither the parameters of the client's query, nor the specific attributes being tested, nor the computed value of the data quality metric. The proposed protocols combine additively homomorphic encryption with condensed data representations such as counting hash tables and histograms, which serve as efficient alternatives to solutions based on private set intersection.
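
To make the role of additively homomorphic encryption concrete, the following is a minimal sketch of how a completeness count might be computed under encryption. It is not the protocol from the paper, whose details are not given in the abstract. It assumes a toy Paillier cryptosystem with insecurely small primes, a hypothetical three-column table, a one-hot encrypted attribute selector, and a publicly known row count. All function names (`keygen`, `encrypt`, `decrypt`, `server_completeness`) are illustrative. It requires Python 3.8 or later for modular inverses via `pow`.

```python
import math
import secrets

# Toy Paillier cryptosystem (additively homomorphic). The primes are far too
# small for real use and are an assumption made purely for illustration.
P, Q = 104729, 1299709


def keygen(p=P, q=Q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    n2 = n * n
    while True:  # pick a random r that is invertible modulo n
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def server_completeness(n, enc_selector, table):
    """Server side: fold per-column non-null counts into one ciphertext."""
    n2 = n * n
    acc = encrypt(n, 0)  # fresh encryption of zero re-randomizes the result
    for j, c in enumerate(enc_selector):
        non_null = sum(1 for row in table if row[j] is not None)
        # E(s)^k = E(s * k); multiplying ciphertexts adds the plaintexts.
        acc = (acc * pow(c, non_null, n2)) % n2
    return acc


# Hypothetical server-side table; None marks a missing value.
table = [
    ["alice", "10.0.0.1", 80],
    ["bob", None, 443],
    [None, "10.0.0.3", 22],
    ["dave", "10.0.0.4", None],
]

n, priv = keygen()
selector = [0, 1, 0]  # client privately selects column 1
enc_selector = [encrypt(n, bit) for bit in selector]
enc_count = server_completeness(n, enc_selector, table)
count = decrypt(n, priv, enc_count)  # non-null entries in the chosen column
completeness = count / len(table)    # row count assumed to be public
```

In this sketch the server only ever handles ciphertexts, so it sees neither which column was selected nor the resulting count, while the client decrypts a single aggregate rather than any individual record. The condensed representations named in the abstract (counting hash tables, histograms) would take the place of the plain per-column counts used here.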

    Published In

    WISCS '14: Proceedings of the 2014 ACM Workshop on Information Sharing & Collaborative Security
    November 2014
    110 pages
    ISBN: 9781450331517
    DOI: 10.1145/2663876
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cryptographic protocols
    2. data quality assessment
    3. privacy and confidentiality

    Qualifiers

    • Research-article

    Conference

    CCS'14

    Acceptance Rates

    WISCS '14 Paper Acceptance Rate 9 of 18 submissions, 50%;
    Overall Acceptance Rate 23 of 58 submissions, 40%

    Article Metrics

    • Downloads (Last 12 months): 30
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 17 Feb 2025

    Cited By

    • (2024) Privacy-Preserving Data Quality Assessment for Time-Series IoT Sensors. 2024 IEEE International Conference on Internet of Things and Intelligence Systems (IoTaIS), pp. 51-57. DOI: 10.1109/IoTaIS64014.2024.10799255. Online publication date: 28-Nov-2024
    • (2024) Safety and Reliability of Artificial Intelligence Systems. Artificial Intelligence for Safety and Reliability Engineering, pp. 185-199. DOI: 10.1007/978-3-031-71495-5_9. Online publication date: 29-Sep-2024
    • (2021) Protecting the Moving User’s Locations by Combining Differential Privacy and k-Anonymity under Temporal Correlations in Wireless Networks. Wireless Communications and Mobile Computing, 2021, pp. 1-12. DOI: 10.1155/2021/6691975. Online publication date: 2-Feb-2021
    • (2021) Enabling Secure Trustworthiness Assessment and Privacy Protection in Integrating Data for Trading Person-Specific Information. IEEE Transactions on Engineering Management, 68:1, pp. 149-169. DOI: 10.1109/TEM.2020.2974210. Online publication date: Feb-2021
    • (2021) Addressing the privacy paradox on the organizational level: review and future directions. Management Review Quarterly, 73:1, pp. 263-296. DOI: 10.1007/s11301-021-00239-4. Online publication date: 13-Sep-2021
    • (2016) Privacy Risk in Cybersecurity Data Sharing. Proceedings of the 2016 ACM on Workshop on Information Sharing and Collaborative Security, pp. 57-64. DOI: 10.1145/2994539.2994541. Online publication date: 24-Oct-2016
    • (2015) Controlled Data Sharing for Collaborative Predictive Blacklisting. Proceedings of the 12th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment - Volume 9148, pp. 327-349. DOI: 10.1007/978-3-319-20550-2_17. Online publication date: 9-Jul-2015
