MORPHEUS: motif oriented representations to purge hostile events from unlabeled sequences

Authors:

Debasis Mitra

VizSEC/DMSEC '04: Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security

Pages 16 - 25

https://doi.org/10.1145/1029208.1029212

Published: 29 October 2004

Abstract

Most prevalent anomaly detection systems build models from training data and then use those models to flag deviations that may result from intrusions. The efficacy of such systems depends heavily on a training data set that is free of attacks, but "clean" or labeled training data is hard to obtain. This paper addresses the practical problem of refining unlabeled data into a clean data set that can then train an online anomaly detection system.

Our system, called MORPHEUS, represents a system call sequence using the spatial positions of motifs (subsequences) within the sequence. We also introduce a novel representation called sequence space to denote all sequences with respect to a reference sequence. Experiments on well-known data sets indicate that our sequence space can be used effectively to purge anomalies from unlabeled sequences. Although it is an unsupervised anomaly detection system in itself, our technique is used here for data purification. A "clean" training set obtained this way improves the performance of existing online host-based anomaly detection systems by increasing the number of attack detections.
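
The abstract does not spell out how motifs are extracted or how sequences are compared with the reference sequence, so the following is only an illustrative sketch of the general idea, not the MORPHEUS algorithm itself. It assumes that motifs are fixed-length windows of system calls, that a sequence is summarized by the mean normalized position of each motif, and that a sequence is purged when its distance to a reference sequence exceeds a threshold. The names `motif_profile`, `distance`, and `purge`, the window length `k`, the distance measure, and the threshold are all hypothetical choices made for this example.

```python
from collections import defaultdict


def motif_profile(seq, k=2):
    """Map each length-k motif to the mean of its normalized start positions.

    Assumption: a "motif" is simply a contiguous window of k system calls.
    Positions are scaled to [0, 1] so sequences of different length compare.
    """
    starts = defaultdict(list)
    n_windows = len(seq) - k + 1
    for i in range(n_windows):
        starts[tuple(seq[i:i + k])].append(i / max(n_windows - 1, 1))
    return {motif: sum(pos) / len(pos) for motif, pos in starts.items()}


def distance(seq, reference, k=2):
    """Hypothetical distance of a sequence from a reference sequence.

    Shared motifs contribute the gap between their mean positions;
    motifs seen in only one of the two sequences contribute 1.0.
    """
    a, b = motif_profile(seq, k), motif_profile(reference, k)
    motifs = set(a) | set(b)
    if not motifs:
        return 0.0
    total = 0.0
    for motif in motifs:
        if motif in a and motif in b:
            total += abs(a[motif] - b[motif])
        else:
            total += 1.0
    return total / len(motifs)


def purge(sequences, reference, k=2, threshold=0.5):
    """Keep only the sequences that lie within `threshold` of the reference."""
    return [s for s in sequences if distance(s, reference, k) <= threshold]


reference = ["open", "read", "write", "close"]
unlabeled = [
    ["open", "read", "write", "close"],
    ["open", "read", "read", "write", "close"],
    ["execve", "chmod", "setuid", "execve"],
]
cleaned = purge(unlabeled, reference)
```

In this toy run the third sequence shares no motif with the reference and is dropped, while the two sequences that resemble the reference are kept as candidate "clean" training data. A real system would need a principled choice of reference sequence and threshold, which this sketch does not attempt.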


Index Terms

MORPHEUS: motif oriented representations to purge hostile events from unlabeled sequences
1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
  2. Machine learning
    1. Learning paradigms
    2. Machine learning approaches
      1. Markov decision processes
      2. Rule learning
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Machine learning theory
      1. Markov decision processes

Published In

VizSEC/DMSEC '04: Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security

October 2004

156 pages

ISBN:1581139748

DOI:10.1145/1029208

General Chairs: Carla Brodley (Tufts University), Philip Chan (Florida Institute of Technology), Richard Lippman (MIT Lincoln Lab), Bill Yurcik (NCSA)

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

CCS04

Sponsor:

CCS04: 11th ACM Conference on Computer and Communications Security 2004

October 29, 2004

Washington DC, USA

Cited By

Haghighi M, Caicedo J, Cimini B, Carpenter A, Singh S (2022). High-dimensional gene expression and morphology profiles of cells across 28,000 genetic and chemical perturbations. Nature Methods 19:12 (1550-1557). Online publication date: 7-Nov-2022.
https://doi.org/10.1038/s41592-022-01667-0
Deshmeh G, Rahmati M (2008). Distributed anomaly detection, using cooperative learners and association rule analysis. Intelligent Data Analysis 12:4 (339-357). Online publication date: 1-Dec-2008.
https://dl.acm.org/doi/10.5555/1408960.1408963
Coull S, Szymanski B (2008). Sequence alignment for masquerade detection. Computational Statistics & Data Analysis 52:8 (4116-4131). Online publication date: 1-Apr-2008.
https://dl.acm.org/doi/10.1016/j.csda.2008.01.022
Wilson W, Feyereisl J, Aickelin U (2007). Detecting motifs in system call sequences. Proceedings of the 8th international conference on Information security applications (157-172). Online publication date: 27-Aug-2007.
https://dl.acm.org/doi/10.5555/1784964.1784980
Wilson W, Feyereisl J, Aickelin U (2007). Detecting Motifs in System Call Sequences. Information Security Applications (157-172). Online publication date: 2007.
https://doi.org/10.1007/978-3-540-77535-5_12
Wilson W, Feyereisl J, Aickelin U. Detecting Motifs in System Call Sequences. SSRN Electronic Journal.
https://doi.org/10.2139/ssrn.2831299
