DOI: 10.1145/1029208.1029212
Article

MORPHEUS: motif oriented representations to purge hostile events from unlabeled sequences

Published: 29 October 2004

Abstract

Most prevalent anomaly detection systems build their models from training data and then use those models to flag deviations that may result from intrusions. The efficacy of such systems depends heavily on a training data set that is free of attacks, but "clean" or labeled training data is hard to obtain. This paper addresses the practical problem of refining unlabeled data into a clean data set that can then be used to train an online anomaly detection system.
Our system, called MORPHEUS, represents a system call sequence by the spatial positions of motifs (subsequences) within the sequence. We also introduce a novel representation, called sequence space, that describes all sequences with respect to a reference sequence. Experiments on well-known data sets indicate that the sequence space can be used effectively to purge anomalies from unlabeled sequences. Although the technique is an unsupervised anomaly detection system in its own right, we use it for data purification. The "clean" training set thus obtained improves the performance of existing online host-based anomaly detection systems by increasing the number of attacks they detect.
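
The abstract does not spell out how motifs are found or how sequence space is defined, so the following is only a rough illustrative sketch, not the MORPHEUS algorithm. It assumes that a "motif" can be approximated by a fixed-length window of system calls, and it uses a hypothetical distance (`distance_to_reference`) that compares where shared motifs occur relative to a reference sequence. The function names, the window length `k`, and the threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def motif_positions(seq, k=3):
    """Map each length-k window (a stand-in for a motif) to its start indices."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[tuple(seq[i:i + k])].append(i)
    return dict(positions)


def distance_to_reference(seq, ref, k=3):
    """Hypothetical distance in [0, 1] between a sequence and a reference.

    Shared motifs contribute the gap between their mean relative positions;
    motifs present in only one of the two sequences contribute 1.
    """
    p_seq = motif_positions(seq, k)
    p_ref = motif_positions(ref, k)

    def mean_relative(pos, length):
        # Mean start index, scaled by the largest possible start index.
        return (sum(pos) / len(pos)) / max(length - k, 1)

    motifs = set(p_seq) | set(p_ref)
    if not motifs:
        return 0.0
    total = 0.0
    for m in motifs:
        if m in p_seq and m in p_ref:
            total += abs(mean_relative(p_seq[m], len(seq))
                         - mean_relative(p_ref[m], len(ref)))
        else:
            total += 1.0
    return total / len(motifs)


def purge(sequences, ref, threshold, k=3):
    """Keep only sequences whose distance to the reference is within threshold."""
    return [s for s in sequences if distance_to_reference(s, ref, k) <= threshold]


if __name__ == "__main__":
    reference = ["open", "read", "write", "close"]
    unlabeled = [
        ["open", "read", "write", "close"],   # same motifs, same positions
        ["open", "exec", "chmod", "close"],   # shares no motif with reference
    ]
    for s in unlabeled:
        print(s, distance_to_reference(s, reference))
    print(purge(unlabeled, reference, threshold=0.5))
```

In this toy setup, a sequence that shares no motifs with the reference sits at the maximum distance and is dropped, while a sequence with the same motifs in the same places is kept. A real purification step would need a principled choice of reference sequence and threshold, which the abstract leaves to the full paper.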


Published In

VizSEC/DMSEC '04: Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security
October 2004
156 pages
ISBN: 1581139748
DOI: 10.1145/1029208
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. anomaly detection
  2. data cleaning
  3. motifs

Conference

CCS04