Application-based anomaly intrusion detection with dynamic information flow analysis

doi:10.1016/j.cose.2008.06.002

Computers & Security

Volume 27, Issues 5–6, October 2008, Pages 176-187

Abstract

This paper presents a new approach to detecting software security failures, whose primary goal is facilitating identification and repair of security vulnerabilities rather than permitting online response to attacks. The approach is based on online capture of executions and offline execution replay, profiling, and analysis. It employs fine-grained dynamic information flow analysis in conjunction with anomaly detection. This approach, which we call information flow anomaly detection, is capable of detecting a variety of security failures, including those that involve violations of confidentiality or integrity requirements and those that do not. A prototype tool called DynFlow implementing the approach has been developed for use with Java byte code programs. To illustrate the potential of the approach, it is applied to detect security failures of four open source systems. Its effectiveness is also compared with that of an anomaly detection approach based on analyzing method call stacks.

Introduction

Information flow control, which deals with restricting the flow of information between objects manipulated by a program, is a classical subject of computer security research (Fenton, 1974; Denning and Denning, 1977). Information flow control is necessary because controlling access to individual objects is not sufficient to prevent indirect propagation of information resulting in either leakage of information from sensitive objects to untrusted recipients or tampering with sensitive objects by untrusted agents. For example, a program that legitimately accesses confidential information associated with one user may, under certain conditions, disclose it to another user who is not authorized to access it.
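The indirect propagation described above can be illustrated with a minimal sketch. It assumes a toy label-tracking wrapper; the names `Labeled`, `implicit_copy`, and `may_send` are hypothetical and do not come from any tool discussed in this paper. The value of a confidential object is copied through a branch condition rather than by direct assignment, so a check on who reads the original object would not notice the leak, whereas a label carried along the control dependence does.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Labeled:
    """A value together with the confidentiality labels attached to it (hypothetical)."""
    value: int
    labels: frozenset = field(default_factory=frozenset)


def implicit_copy(secret: Labeled) -> Labeled:
    """Reproduce secret.value without ever assigning it directly."""
    # The branch condition depends on `secret`, so anything assigned inside
    # either branch is control dependent on it and inherits its labels.
    pc_labels = secret.labels
    if secret.value == 1:
        leaked = Labeled(1, pc_labels)
    else:
        leaked = Labeled(0, pc_labels)
    return leaked


def may_send(data: Labeled, recipient_clearance: frozenset) -> bool:
    """Allow a send only if the recipient is cleared for every label on the data."""
    return data.labels <= recipient_clearance


secret = Labeled(1, frozenset({"alice-confidential"}))
copy = implicit_copy(secret)
print(copy.value, may_send(copy, frozenset()))
```

Without the `pc_labels` step, `copy` would carry the same value as `secret` but no label, which is exactly the kind of flow that per-object access control cannot see.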

Despite its long history, research on information flow control has until recently had relatively little impact on the field of intrusion detection (Denning, 1987), which aims at detecting and responding to intrusions/attacks against systems and applications. This is surprising because the purpose of certain sophisticated attacks against software is precisely to induce insecure information flows to or from sensitive objects, and such flows may be the only indication of such attacks.

Intrusion detection systems are of two principal types (Axelsson, March 2000): signature matching (or misuse detection) systems and anomaly detection systems. Signature matching systems look for intrusion signatures, which are characteristic indications of known attacks. Anomaly detection systems look for system/application behavior that is anomalous in the sense that it differs markedly from normal, safe behavior. Signature matching systems tend to issue fewer “false positive” alerts than anomaly detection systems, but the former cannot detect novel attacks, while the latter can. In practice, both kinds of systems suffer from problems of inaccuracy and imprecision.

Recently, Zimmermann et al. (October 2003, November 2003) proposed an intrusion detection model based on runtime enforcement of an information flow policy, which specifies the information flows that are permissible in a given system. They argued that their model detects confidentiality and integrity violations more reliably than either signature matching systems or anomaly detection systems do, because it focuses on policy violations rather than on ancillary events. Although enforcement of information flow policies is very desirable, it is not a complete solution to the problem of revealing security vulnerabilities in software. First, denial-of-service attacks need not cause confidentiality or integrity violations. Second, information flow policies may suffer from the same types of problems that affect other software specifications, such as incorrectness and incompleteness.

In this paper, we present evidence that dynamic information flow analysis (DIFA) has important applications to vulnerability detection besides the enforcement of information flow policies (which we addressed in Masri et al. (2004)). These applications are based on the fact that patterns of information flow that occur during execution of a program characterize its computation to a high degree. Information flows are fundamental aspects of program execution, because they indicate the direct and indirect interactions between program elements (e.g., variables, instructions, and procedures) and the associated dependences between those elements. Such interactions may span much program code and many program components. Information flows, which we assume are transitive in general,¹ also define the essential constraints on the order in which operations are executed, that is, those that determine a program's input/output behavior.²
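To make the notion of transitive information flow concrete, the following is a minimal sketch under stated assumptions; it is not the algorithm used by DynFlow. The hypothetical `FlowTracker` class records, for each executed assignment, which variables have flowed directly or indirectly into the target, treating both data operands and controlling predicates as sources.

```python
from collections import defaultdict


class FlowTracker:
    """Toy recorder of dynamic information flows between named variables."""

    def __init__(self):
        # variable -> set of variables whose values have flowed into it
        self.sources = defaultdict(set)
        # every (source, target) flow observed so far
        self.flows = set()

    def assign(self, target, *operands, control=()):
        """Record `target = f(operands)` executed under the given controlling variables."""
        incoming = set()
        for var in (*operands, *control):
            incoming.add(var)
            # Transitivity: whatever reached `var` now also reaches `target`.
            incoming |= self.sources[var]
        # The assignment overwrites `target`, so its previous sources are replaced.
        self.sources[target] = incoming
        for src in incoming:
            self.flows.add((src, target))


tracker = FlowTracker()
tracker.assign("b", "a")                  # b = a
tracker.assign("c", "b")                  # c = b + 1
tracker.assign("d", "x", control=("c",))  # if c: d = x
print(sorted(tracker.flows))
```

Although `a` never appears in the statement that defines `d`, the pair `("a", "d")` is recorded, because `a` flows to `b`, `b` to `c`, and `c` controls the assignment to `d`. The set `tracker.flows` is one simple form that an execution's information flow profile could take.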

We investigate the hypothesis that novel attacks often cause anomalous patterns of information flow, which sometimes are the only indication of a security failure. (Note that where an information flow policy exists, an anomalous information flow pattern associated with a security failure need not violate the policy, because the policy may be incomplete or the failure may not involve any illegal flows.) We show that these patterns can be detected by applying anomaly detection techniques to information flow profiles, that is, to execution profiles that characterize a program's dynamic information flows. This suggests that it is feasible to enhance anomaly intrusion detection systems with the capability to detect anomalous information flows associated with attacks. Note, however, that we do not claim that information flows alone are sufficient to characterize the full range of attacks that an IDS must confront.
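As an illustration of applying anomaly detection to information flow profiles, the sketch below ranks executions by how far their profiles lie from previously observed ones. It makes several assumptions that are not taken from this paper: a profile is represented as a set of (source, target) flow pairs, dissimilarity is measured with Jaccard distance, and the score is the distance to the nearest baseline profile. The function names and the sample data are hypothetical.

```python
def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """Return 1 - |a & b| / |a | b|, or 0.0 when both profiles are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def anomaly_score(profile: frozenset, baseline: list) -> float:
    """Distance from a profile to its nearest neighbour among baseline profiles."""
    return min(jaccard_distance(profile, known) for known in baseline)


def rank_for_audit(executions: dict, baseline: list) -> list:
    """Order execution names from most to least anomalous."""
    scored = [(anomaly_score(p, baseline), name) for name, p in executions.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical profiles: each is the set of flows observed in one execution.
baseline = [
    frozenset({("request", "query"), ("query", "response")}),
    frozenset({("request", "query"), ("query", "response"), ("config", "response")}),
]
executions = {
    "run-1": frozenset({("request", "query"), ("query", "response")}),
    "run-2": frozenset({("request", "query"), ("query", "response"),
                        ("password_file", "response")}),
}
print(rank_for_audit(executions, baseline))
```

Here `run-2` is ranked first because it contains a flow from `password_file` to `response` that no baseline execution exhibits. A ranking of this kind only selects executions for auditing; deciding whether a flagged execution is an actual security failure remains a manual step.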

The policy-based model of Zimmermann et al. (October 2003, November 2003) deals with information flows between entire objects. They assert that a finer-grained online analysis, which would involve analyzing (1) data flows involving object fields and local variables and (2) control dependences, is unrealistic on a large-scale OS running third-party software (Zimmermann et al., October 2003). Our own prior work suggests that the higher overhead of fine-grained DIFA precludes its online application with processing-intensive applications (Masri, 2004). For use in corrective software maintenance, however, fine-grained DIFA is preferable to coarse-grained DIFA, because the former is more precise and because the analysis can be done offline with the aid of execution capture/replay (Steven et al., 2000; Orso and Kennedy, 2005). In contrast to most work on intrusion detection, including that of Zimmermann et al., we emphasize offline identification and repair of security vulnerabilities in software rather than online detection and response to attacks. Our approach, which we call information flow anomaly detection (IFAD), calls for executions to be captured online and then replayed, profiled, analyzed, and audited (if suspicious) offline. This makes it feasible to employ fine-grained DIFA and to involve software developers in the analysis and decision-making processes.

In Section 2, we survey work related to dynamic information flow analysis. In Section 3, we discuss the assumptions behind our information flow anomaly detection approach. Section 4 describes our fine-grained DIFA tool DynFlow. Section 5 provides a detailed description of our approach. In Section 6, we illustrate the potential of our approach by applying it in several case studies involving security vulnerabilities in open source software; we also compare the effectiveness of our approach to that of another approach based on analyzing method call stacks. Section 7 presents our conclusions and future work.

The main contributions of this work are:

• A new semi-automated offline approach to detecting software security failures that combines fine-grained DIFA with anomaly detection techniques to facilitate the identification and repair of security vulnerabilities.
• A prototype implementation of the approach.
• Case studies in which the approach was employed to detect security failures of open source software and was compared to a form of anomaly detection based on analysis of method call stacks and to random sampling.

Section snippets

Related work

Dynamic information flow analysis was first proposed by Fenton (1974), who described an abstract machine called the Data Mark Machine to support the dynamic checking of information flows. DIFA has received substantially less attention in the research literature than static information flow analysis, which was first described by Denning and Denning (1977). Recently, however, several papers in addition to Zimmermann et al. (October 2003, November 2003) (see Section 1) have

Assumptions

To limit the scope of our investigation, we focus on attacks against applications that do not compromise the underlying computing platform or violate the semantics of the implementation language. This class of attacks is becoming increasingly relevant as more applications are implemented using languages such as Java and C# that are based on relatively secure virtual machines. Thus, we do not attempt to address buffer-overflow attacks or other attacks that exploit vulnerabilities in an operating

The DynFlow Tool

In Masri et al. (2004) we presented a new approach to DIFA that can be used to detect, prevent, or debug insecure flows in programs, and we described a prototype tool implementing the approach for Java byte code programs. We used an updated version of this tool, which is called DynFlow, in the case studies described in Section 6, to implement fine-grained DIFA in support of information flow anomaly detection. The basic functions of DynFlow are: detecting violations of information flow policies;

Information flow anomaly detection

Anomaly detection techniques (Liepins and Vaccaro, 1989), which are intended to enable the detection of novel attacks, are based on the assumption that attacks often induce unusual execution behavior that can be distinguished from normal behavior. Most research on host-based and application-based anomaly detection techniques involves the analysis of system call sequences. Forrest et al. (1996) and Hofmeyr et al. (1998) presented a form of anomaly detection that involves: (1) characterizing normal

Empirical results

To assess the relative effectiveness of information flow anomaly detection for revealing vulnerabilities, we compared IFP-OPC, IFP-FP, MCS-OPC, MCS-FP, and simple random sampling (SRS) with respect to how many distinct vulnerabilities each technique revealed in each of four subject programs, for given numbers of executions selected for auditing. Technique A is considered more effective than technique B if A generally reveals more vulnerabilities than B without requiring more executions to be

Conclusions and future work

A new offline approach to detecting software security failures was presented that employs fine-grained dynamic information flow analysis (DIFA) to reveal unusual patterns of information flow (information flow anomalies) that may be associated with them. The emphasis of the approach is identification of security vulnerabilities in software rather than online response to attacks. Executions are captured online and are replayed, profiled, and analyzed offline so that suspicious executions can be

References (38)

S. Axelsson. Intrusion detection systems: a survey and taxonomy. Technical report 99-15 (March 2000)

S. Axelsson. The base-rate fallacy and the difficulty of intrusion detection. ACM Transactions on Information Systems and Security (August 2000)

A. Chaturvedi et al. Improving attack detection in host-based IDS by learning properties of system call arguments. Technical report SECLAB-05-03 (July 2005)

D.E. Denning et al. Certification of programs for secure information flow. Communications of the ACM (1977)

D.E. Denning. An intrusion detection model. IEEE Transactions on Software Engineering (February 1987)

Dickinson W, Leon D, Podgurski A. Finding failures by cluster analysis of execution profiles. In: Twenty-third...

W. Dickinson et al. Pursuing failure: the distribution of program failures in a profile space

H. Feng et al. Anomaly detection using call stack information. IEEE Symposium on Security and Privacy (Oakland, CA) (May 2003)

J.S. Fenton. Memoryless subsystems. The Computer Journal (1974)

S. Forrest et al. A sense of self for UNIX processes. IEEE Symposium on Security and Privacy (Los Alamitos, CA) (1996)

V. Haldar et al. Practical, dynamic information-flow for virtual machines. Technical report no. 05-02 (February 2005)

S. Hofmeyr et al. Intrusion detection using sequences of system calls. Journal of Computer Security (1998)

A.K. Jain et al. Algorithms for clustering data (1988)

Leon D, Masri W, Podgurski A. An empirical evaluation of test case filtering techniques based on exercising complex...

Liepins G, Vaccaro HS. Anomaly detection: purpose and framework. In: Twelfth national computer security conference...

Masri W. Dynamic information flow analysis. Slicing and profiling. Ph.D. dissertation; 2004....

Masri W, Nahas N, Podgurski A. Memorized forward computation of program slices. In: Seventeenth IEEE international...

Masri W, Podgurski A, Leon D. Detecting and debugging insecure information flows. In: Fifteenth IEEE international...

Masri W, Podgurski A. Using dynamic information flow analysis to detect attacks against applications. In: 2005 Workshop...

Andy Podgurski received his MS and PhD degrees in Computer Science from the University of Massachusetts at Amherst in 1989. He is currently an Associate Professor in the Electrical Engineering & Computer Science Department at Case Western Reserve University, where he has been a faculty member since 1989. His research interest is software engineering methodology, especially the application of static and dynamic program analysis in combination with data mining, statistical, and machine learning techniques to enhance software reliability and security and to facilitate software maintenance. He is currently Associate Editor of the journal Software Testing, Verification, and Reliability, and he is a member of the IEEE Computer Society and the ACM.

Wes Masri received the BS degree in electrical engineering from Case Western Reserve University, the MS degree in electrical and computer engineering from the Pennsylvania State University, and the PhD degree in computer engineering from Case Western Reserve University. He is currently an assistant professor in the Department of Computer Science at the American University of Beirut; prior to that, he spent fifteen years in the software industry, primarily as a developer. His research interests include software engineering and dynamic program analysis with an emphasis on software testing and security. He is a member of the IEEE Computer Society.

^☆: In this research, Dr. Masri was supported in part by LNCSR grant 022136.