DOI: 10.1145/1247480.1247491

Auditing disclosure by relevance ranking

Published: 11 June 2007

Abstract

Numerous widely publicized cases of theft and misuse of private information underscore the need for audit technology to identify the sources of unauthorized disclosure. We present an auditing methodology that ranks potential disclosure sources according to their proximity to the leaked records. Given a sensitive table that contains the disclosed data, our methodology prioritizes by relevance the past queries to the database that could have potentially been used to produce the sensitive table. We provide three conceptually different measures of proximity between the sensitive table and a query result. One measure is inspired by information retrieval in text processing, another is based on statistical record linkage, and the third computes the derivation probability of the sensitive table in a tree-based generative model. We also analyze the characteristics of the three measures and the corresponding ranking algorithms.
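
The abstract gives no formulas for the three proximity measures, so the following is only a minimal sketch of the general idea behind the information-retrieval-inspired one: treat the sensitive table and each logged query result as a bag of terms and rank the queries by TF-IDF cosine similarity. The function names `table_terms` and `rank_queries`, the `column=value` tokenization, and the smoothed IDF weighting are all assumptions made for illustration, not the authors' definitions.

```python
import math
from collections import Counter


def table_terms(table):
    """Flatten a table (list of dict rows) into 'column=value' term counts.

    The tokenization is an assumption for this sketch, not the paper's.
    """
    return Counter(f"{col}={val}" for row in table for col, val in row.items())


def rank_queries(sensitive_table, query_results):
    """Rank logged query results by TF-IDF cosine similarity to the sensitive table.

    query_results maps a (hypothetical) query id to that query's result table.
    Returns a list of (query_id, score) pairs, highest score first.
    """
    docs = {qid: table_terms(t) for qid, t in query_results.items()}
    n = len(docs)
    # Document frequency: number of query results in which each term occurs.
    df = Counter(term for terms in docs.values() for term in terms)

    def idf(term):
        # Smoothed IDF so that terms present in every result keep a nonzero weight.
        return math.log((1 + n) / (1 + df[term])) + 1.0

    def weigh(terms):
        return {t: tf * idf(t) for t, tf in terms.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    target = weigh(table_terms(sensitive_table))
    scores = [(qid, cosine(target, weigh(terms))) for qid, terms in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions, a query whose result reproduces the leaked rows exactly scores 1, a query that shares only some rows scores lower, and a query with no overlapping values scores 0. The record-linkage and derivation-probability measures mentioned in the abstract would need different machinery and are not sketched here.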

    Published In

    SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
    June 2007
    1210 pages
    ISBN: 9781595936868
    DOI: 10.1145/1247480
    General Chairs: Lizhu Zhou, Tok Wang Ling
    Program Chair: Beng Chin Ooi
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. derivation probability
    2. hippocratic database
    3. information retrieval
    4. privacy
    5. record linkage

    Qualifiers

    • Article

    Conference

    SIGMOD/PODS07
    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Cited By

    • (2025) Hippocratic Database. Encyclopedia of Cryptography, Security and Privacy, pp. 1122-1125. DOI: 10.1007/978-3-030-71522-9_679. Online publication date: 8-Jan-2025
    • (2017) Blockchain Enabled Privacy Audit Logs. The Semantic Web – ISWC 2017, pp. 645-660. DOI: 10.1007/978-3-319-68288-4_38. Online publication date: 4-Oct-2017
    • (2012) Pay-as-You-Go ranking of schema mappings using query logs. Proceedings of the 8th international conference on Data Integration in the Life Sciences, pp. 37-52. DOI: 10.1007/978-3-642-31040-9_4. Online publication date: 28-Jun-2012
    • (2010) PolicyReplay. Proceedings of the VLDB Endowment 3:1-2, pp. 36-47. DOI: 10.14778/1920841.1920851. Online publication date: 1-Sep-2010
    • (2009) A method of deciding the security in publishing views. Proceedings of the 5th International Conference on Wireless communications, networking and mobile computing, pp. 5438-5441. DOI: 10.5555/1738467.1738789. Online publication date: 24-Sep-2009
    • (2008) Security deciding in publishing views based on entropy. Proceedings of the 5th International ICST Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness, pp. 1-7. DOI: 10.5555/1535571.1535649. Online publication date: 28-Jul-2008
    • (2008) Hippocratic Databases: Current Capabilities and Future Trends. Handbook of Database Security, pp. 409-429. DOI: 10.1007/978-0-387-48533-1_17. Online publication date: 2008
