DOI: 10.1145/1866886.1866890
research-article

Detecting data misuse by applying context-based data linkage

Published: 08 October 2010

Abstract

Detecting data leakage/misuse poses a great challenge for organizations. Whether caused by malicious intent or an inadvertent mistake, data leakage/misuse can diminish a company's brand, reduce shareholder value, and damage the company's goodwill and reputation. This challenge is intensified when trying to detect and/or prevent data leakage/misuse performed by an insider with legitimate permissions to access the organization's systems and its critical data. In this paper we propose a new approach for identifying suspicious insiders who can access data stored in a database via an application. In the proposed method suspicious access to sensitive data is detected by analyzing the result-sets sent to the user following a request that the user submitted. Result-sets are analyzed within the instantaneous context in which the request was submitted. From the analysis of the result-set and the context we derive a "level of anomality". If the derived level is above a predefined threshold, an alert can be sent to the security officer. The proposed method applies data-linkage techniques in order to link the contextual features and the result-sets. Machine learning algorithms are then employed for generating a behavioral model during a learning phase. The behavioral model encapsulates knowledge on the behavior of a user; i.e., the characteristics of the result-sets of legitimate or malicious requests. This behavioral model is used for identifying malicious requests based on their abnormality. An evaluation with sanitized data shows the usefulness of the proposed method in detecting data misuse.
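The detection loop described in the abstract (learn a behavioral model, score each result-set within its context, compare the score to a threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: a plain frequency lookup stands in for the machine-learning model, exact matching stands in for data linkage, and the names `learn_model`, `anomaly_level`, `should_alert`, the `(role, shift)` context tuple, the region-valued records, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def learn_model(training_log):
    """Learning phase: remember which record values appeared in each context.

    training_log is an iterable of (context, result_set) pairs taken from
    requests assumed to be legitimate. Contexts must be hashable.
    """
    model = defaultdict(set)
    for context, result_set in training_log:
        model[context].update(result_set)
    return dict(model)


def anomaly_level(model, context, result_set):
    """Fraction of returned records never seen in this context during learning."""
    if not result_set:
        return 0.0
    known = model.get(context, set())
    unseen = sum(1 for record in result_set if record not in known)
    return unseen / len(result_set)


def should_alert(level, threshold=0.5):
    """Alert only when the derived level is above the predefined threshold."""
    return level > threshold


# Hypothetical data: context is (role, shift); each record is a customer region.
training_log = [
    (("teller", "day"), ["north", "north", "east"]),
    (("teller", "night"), ["north"]),
]
model = learn_model(training_log)

# A daytime teller request that returns mostly unfamiliar regions.
level = anomaly_level(model, ("teller", "day"), ["north", "west", "west", "west"])
print(level, should_alert(level))
```

In this toy setup three of the four returned records fall outside what the context has produced before, so the level is 0.75 and an alert would be raised. A request made in a context that was never observed during learning scores 1.0 whenever it returns any records, which is a deliberately conservative assumption of the sketch.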

Published In

Insider Threats '10: Proceedings of the 2010 ACM workshop on Insider threats
October 2010
70 pages
ISBN: 9781450300926
DOI: 10.1145/1866886
  • General Chair: Ehab Al-Shaer
  • Program Chairs: Brent Lagesse, Craig Shue
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data leakage
  2. data misuse
  3. information leakage
  4. insider threat

Qualifiers

  • Research-article

Conference

CCS '10

Acceptance Rates

Insider Threats '10 Paper Acceptance Rate 7 of 14 submissions, 50%;
Overall Acceptance Rate 7 of 14 submissions, 50%

Article Metrics

  • Downloads (Last 12 months): 25
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 10 Feb 2025

Cited By

  • (2024) On Data Leakage Prevention Maturity: Adapting the C2M2 Framework. Journal of Cybersecurity and Privacy 4:2 (167-195). DOI: 10.3390/jcp4020009. Online publication date: 30-Mar-2024
  • (2021) High Secured Data Access and Leakage Detection Using Attribute-Based Encryption. Advances in Electronics, Communication and Computing (433-445). DOI: 10.1007/978-981-15-8752-8_44. Online publication date: 29-Jan-2021
  • (2020) Artificial Intelligence-based framework for analyzing healthcare staff security practice: A Simulated study based on a Review. (Preprint). JMIR Medical Informatics. DOI: 10.2196/19250. Online publication date: 9-Apr-2020
  • (2020) Data-Driven and Artificial Intelligence (AI) Approach for Modelling and Analyzing Healthcare Security Practice: A Systematic Review. Intelligent Systems and Applications (1-18). DOI: 10.1007/978-3-030-55180-3_1. Online publication date: 25-Aug-2020
  • (2019) A Probability based Model for Big Data Security in Smart City. 2019 4th MEC International Conference on Big Data and Smart City (ICBDSC) (1-6). DOI: 10.1109/ICBDSC.2019.8645607. Online publication date: Jan-2019
  • (2019) CoBAn. Information Sciences: an International Journal 262 (137-158). DOI: 10.1016/j.ins.2013.10.005. Online publication date: 6-Jan-2019
  • (2018) Detection of Unspecified Emergencies for Controlled Information Sharing. IEEE Transactions on Dependable and Secure Computing 13:6 (630-643). DOI: 10.1109/TDSC.2015.2427846. Online publication date: 21-Dec-2018
  • (2018) A Systematic Review of the Availability and Efficacy of Countermeasures to Internal Threats in Healthcare Critical Infrastructure. IEEE Access 6 (25167-25177). DOI: 10.1109/ACCESS.2018.2817560. Online publication date: 2018
  • (2018) Artificial Intelligence Agents as Mediators of Trustless Security Systems and Distributed Computing Applications. Guide to Vulnerability Analysis for Computer Networks and Systems (131-155). DOI: 10.1007/978-3-319-92624-7_6. Online publication date: 5-Sep-2018
  • (2017) A Probability based Model for Data Leakage Detection using Bigraph. Proceedings of the 2017 7th International Conference on Communication and Network Security (1-5). DOI: 10.1145/3163058.3163060. Online publication date: 24-Nov-2017