research-article

Detecting data misuse by applying context-based data linkage

Authors:

Yuval Elovici

Insider Threats '10: Proceedings of the 2010 ACM workshop on Insider threats

Pages 3 - 12

https://doi.org/10.1145/1866886.1866890

Published: 08 October 2010

Abstract

Detecting data leakage/misuse poses a great challenge for organizations. Whether caused by malicious intent or an inadvertent mistake, data leakage/misuse can diminish a company's brand, reduce shareholder value, and damage the company's goodwill and reputation. This challenge is intensified when trying to detect and/or prevent data leakage/misuse performed by an insider with legitimate permissions to access the organization's systems and its critical data. In this paper we propose a new approach for identifying suspicious insiders who can access data stored in a database via an application. In the proposed method suspicious access to sensitive data is detected by analyzing the result-sets sent to the user following a request that the user submitted. Result-sets are analyzed within the instantaneous context in which the request was submitted. From the analysis of the result-set and the context we derive a "level of anomality". If the derived level is above a predefined threshold, an alert can be sent to the security officer. The proposed method applies data-linkage techniques in order to link the contextual features and the result-sets. Machine learning algorithms are then employed for generating a behavioral model during a learning phase. The behavioral model encapsulates knowledge on the behavior of a user; i.e., the characteristics of the result-sets of legitimate or malicious requests. This behavioral model is used for identifying malicious requests based on their abnormality. An evaluation with sanitized data shows the usefulness of the proposed method in detecting data misuse.
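The flow described in the abstract (learn a behavioral model from result-sets linked to their request context, derive a level of anomaly for each new request, and alert above a threshold) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the linkage technique or the learning algorithm, so a simple lookup of previously seen records per context stands in for both. The names `BehavioralModel`, `anomaly_level`, `is_suspicious`, the tuple-based record format, and the 0.5 threshold are all assumptions made for illustration.

```python
from collections import defaultdict


class BehavioralModel:
    """Toy stand-in for the learned behavioral model (hypothetical, not the paper's)."""

    def __init__(self):
        # Maps a context tuple to the set of records returned by
        # legitimate requests that were submitted in that context.
        self._seen = defaultdict(set)

    def learn(self, context, result_set):
        """Learning phase: link each legitimate record to its request context."""
        self._seen[context].update(result_set)

    def anomaly_level(self, context, result_set):
        """Return the fraction of returned records never seen under this context."""
        if not result_set:
            return 0.0
        known = self._seen.get(context, set())
        unseen = sum(1 for record in result_set if record not in known)
        return unseen / len(result_set)


def is_suspicious(model, context, result_set, threshold=0.5):
    """Flag the request when its anomaly level exceeds the (assumed) threshold."""
    return model.anomaly_level(context, result_set) > threshold


# Usage: records and contexts are plain tuples so they can be stored in sets.
model = BehavioralModel()
ctx = ("branch_a", "business_hours")
model.learn(ctx, [("customer", "branch_a"), ("customer", "branch_a_vip")])

normal_request = [("customer", "branch_a")]
odd_request = [
    ("customer", "branch_b"),
    ("customer", "branch_c"),
    ("customer", "branch_a"),
]

print(is_suspicious(model, ctx, normal_request))
print(is_suspicious(model, ctx, odd_request))
```

In this sketch a request made in a context the model has never seen scores 1.0 for any non-empty result-set, which is one possible policy; a real deployment would need to decide how unseen contexts are treated.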

Index Terms

Detecting data misuse by applying context-based data linkage
1. Security and privacy
  1. Database and storage security
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Theory of database privacy and security

Published In

Insider Threats '10: Proceedings of the 2010 ACM workshop on Insider threats

October 2010

70 pages

ISBN: 9781450300926

DOI: 10.1145/1866886

General Chair:
Ehab Al-Shaer, University of North Carolina at Charlotte, USA

Program Chairs:
Brent Lagesse, Oak Ridge National Laboratory, USA
Craig Shue, Oak Ridge National Laboratory, USA

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CCS '10

Sponsor:

SIGSAC

CCS '10: 17th ACM Conference on Computer and Communications Security 2010

October 8, 2010

Chicago, Illinois, USA

Acceptance Rates

Insider Threats '10 Paper Acceptance Rate 7 of 14 submissions, 50%;

Overall Acceptance Rate 7 of 14 submissions, 50%
