Elsevier

Knowledge-Based Systems

Volume 20, Issue 7, October 2007, Pages 671-682

Forensic analysis of logs: Modeling and verification

https://doi.org/10.1016/j.knosys.2007.05.002

Abstract

Information stored in the logs of a computer system is of crucial importance for gathering forensic evidence of investigated actions or attacks against the system. Analysis of this information should be rigorous and credible, hence it lends itself to formal methods. We propose a model checking approach to the formalization of the forensic analysis of logs. The set of logs of a certain system is modeled as a tree whose labels are events extracted from the logs. In order to provide a structure to these events, we express each event as a term of a term algebra. The signature of the algebra is carefully chosen to include all relevant information necessary to conduct the analysis. Properties of the model are expressed as formulas of a logic having dynamic, linear, temporal, and modal characteristics. Moreover, we provide a tableau-based proof system for this logic upon which a model checking algorithm can be developed. In order to illustrate the proposed approach, the Windows auditing system is studied. The properties that we capture in our logic include invariant properties of a system, forensic hypotheses, and generic or specific attack signatures. Moreover, we discuss the admissibility of forensic hypotheses and the underlying verification issues.

Introduction

Attacks on IT systems are increasing in number and sophistication at an alarming rate. These systems now range from servers to mobile devices, and the damage from such attacks is estimated in billions of dollars. However, due to the borderless nature of cyber attacks, many offenders have been able to evade responsibility because of the lack of supporting evidence to convict them. In this context, cyber forensics plays a major role by providing scientifically proven methods to gather, process, interpret, and use digital evidence to bring a conclusive description of cyber crime activities. The development of forensic IT solutions for law enforcement has been limited. Although outstanding results have been achieved for forensically sound evidence gathering, little has been done on the automatic analysis of the acquired evidence. Furthermore, limited effort has been made to formalize digital forensic science. In many cases, the forensic procedures employed are constructed in an ad hoc manner that impedes the effectiveness or the integrity of the investigation. In this paper, we contribute an automatic and formal approach to the log analysis problem.

One of the most common sources of evidence that an investigator should analyze is logs from the activities of the system that is related to the incident in question. Indeed, having the logs from all system events during the incident will reduce the process of forensics analysis to event reconstruction. However, log analysis depends largely on the analyst’s skills and experience to effectively decipher and determine what information is pertinent and useful to support the case at hand. Despite the paramount importance of this aspect, not much research effort has been dedicated to the automation of forensic log analysis.

The main intent of this paper is to introduce a formal and automatic log analysis technique. The advocated approach caters for:

  • Modeling of log events and logical representation of properties that should be satisfied by the traces of system events.

  • Formal and automatic analysis of the logs looking for a specific pattern of events or verifying a particular forensic hypothesis.

In spite of the few research results on formal and automatic analysis of forensic and digital evidence, there are some important proposals that we detail hereafter.

Current research efforts on cyber forensic analysis can be categorized into baseline analysis, root cause analysis, common vulnerability analysis, timeline analysis, and semantic integrity check analysis. The baseline analysis, proposed in [15], uses an automated tool that checks for differences between a baseline of the safe state of the system and the state during the incident. The work presented in [24] proposes an approach to post-incident root cause analysis of digital incidents through a separation of the information system into different security domains and modeling the transactions between these domains. Common vulnerability analysis [3] involves searching through a database of common vulnerabilities and investigating the case according to the related past and known vulnerabilities. The timeline analysis approach [11] consists of analyzing logs, scheduling information, and memory to develop a timeline of the events that led to the incident. Finally, the semantic integrity checking approach [22] uses a decision engine that is endowed with a tree to detect semantic incongruities. The decision tree reflects pre-determined invariant relationships between redundant digital objects.
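As an illustration of the baseline analysis idea only, the comparison between a safe state and an incident state can be sketched as a dictionary diff. This is a minimal sketch and not the tool proposed in [15]: the representation of a system state as a mapping from object names to content digests, and the names `BaselineDiff` and `diff_against_baseline`, are assumptions made for this example.

```python
from typing import Dict, List, NamedTuple


class BaselineDiff(NamedTuple):
    """Differences between a known-safe state and an observed state."""
    added: List[str]
    removed: List[str]
    modified: List[str]


def diff_against_baseline(baseline: Dict[str, str],
                          observed: Dict[str, str]) -> BaselineDiff:
    """Compare two snapshots that map object names to content digests.

    The snapshot format is hypothetical; a real tool would also have to
    decide which objects and attributes belong in the baseline.
    """
    added = sorted(name for name in observed if name not in baseline)
    removed = sorted(name for name in baseline if name not in observed)
    modified = sorted(
        name for name in baseline
        if name in observed and baseline[name] != observed[name]
    )
    return BaselineDiff(added, removed, modified)
```

Objects reported as added, removed, or modified are only candidates for further examination; the diff by itself says nothing about who changed them or when.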

In [9], Pavel Gladyshev proposed a formalization of digital evidence and event reconstruction based on finite state machines. In his work, the behavior of the system is modeled as a state machine and a recursive model-checking procedure is proposed to verify the admissibility of a forensic hypothesis. However, in the real world, modeling the behavior of a complex system such as an operating system as a state machine is burdensome and sometimes impossible because of complexity issues. Other research on formalized forensic analysis includes the formalization of event time bounding in digital investigations [10], [14], which proposes an approach to constructing formalized forensic procedures. Nevertheless, the science of digital forensics still lacks a formalized approach to log analysis.
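The flavor of state-machine-based event reconstruction can be conveyed by a toy backward search. The sketch below is not the procedure of [9]; it only assumes a deterministic transition table keyed by (state, event), and the function names `explain` and `consistent_with_some_run` are hypothetical. Given the state in which the system was found, it enumerates every run of a fixed length that ends there and checks whether a hypothesised event occurs in at least one of them.

```python
from typing import Dict, List, Tuple

# (state, event) -> next state; a toy, hand-written machine is assumed.
Transitions = Dict[Tuple[str, str], str]
Step = Tuple[str, str, str]  # (source state, event, target state)


def explain(transitions: Transitions, final_state: str,
            steps: int) -> List[List[Step]]:
    """Return every run of exactly `steps` transitions ending in `final_state`."""
    if steps == 0:
        return [[]]
    runs: List[List[Step]] = []
    for (src, event), dst in sorted(transitions.items()):
        if dst == final_state:
            # Recurse backwards from the source state of this transition.
            for prefix in explain(transitions, src, steps - 1):
                runs.append(prefix + [(src, event, dst)])
    return runs


def consistent_with_some_run(transitions: Transitions, final_state: str,
                             steps: int, hypothesised_event: str) -> bool:
    """True if some run ending in `final_state` contains the event."""
    return any(
        any(event == hypothesised_event for _, event, _ in run)
        for run in explain(transitions, final_state, steps)
    )
```

The number of candidate runs grows quickly with the number of steps and transitions, which gives a small-scale view of the complexity issues mentioned above.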

As for log analysis and correlation, some research has been done on alert correlation, which can be classified into four categories [28]:

  • Similarity based approaches [6], [12], [23], [27], which group the alerts according to the similarity between alert attributes.

  • Predefined attack scenario based approaches [8], [16], which detect attacks according to well defined attack scenarios. However, they cannot discover novel attack scenarios.

  • Pre/post condition based approaches [7], [18], [25] that match the post-condition of an attack to the pre-conditions of another attack.

  • Multiple information sources based approaches [17], [20], [26], which are concerned with distributed attack discovery.

However, these approaches are mainly concerned with correlation and intrusion detection, while formal log analysis and hypothesis verification are of paramount importance to forensic science. As an example, invariant properties of the system cannot be modeled and analyzed through the above approaches. The absence of a satisfactory and general methodology for forensic log analysis has resulted in ad hoc analysis techniques such as log analysis [19] and operating system-specific analysis [13].
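To make the pre/post-condition category of alert correlation concrete, the matching step can be sketched as follows. This is an illustrative sketch and not the algorithm of [7], [18], or [25]: the `Alert` record, the representation of conditions as sets of strings, and the function `correlate` are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Alert:
    """A hypothetical alert annotated with pre- and post-conditions."""
    name: str
    pre: FrozenSet[str]
    post: FrozenSet[str]


def correlate(alerts: List[Alert]) -> List[Tuple[str, str]]:
    """Link an earlier alert to a later one when they share a condition.

    `alerts` is assumed to be in chronological order. A link (a, b) means
    that some post-condition of a is also a pre-condition of b.
    """
    links: List[Tuple[str, str]] = []
    for i, earlier in enumerate(alerts):
        for later in alerts[i + 1:]:
            if earlier.post & later.pre:
                links.append((earlier.name, later.name))
    return links
```

Such a chain links alerts into candidate attack scenarios, but it does not express a system-wide invariant, which is the gap noted in the paragraph above.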

In this paper, we propose a new approach for log analysis that is based on computational logic and formal automatic verification. We start by developing a model of logs based on traces of events. Each event is an abstract view of a piece of information stored in the log. The structure of an event is carefully chosen to convey the information needed for the analysis. To this end, events are represented as terms of a multi-sorted term algebra whose operation symbols are chosen such that they faithfully convey the information stored in the actual log events. For instance, the term DeleteFile(F, U) with operation DeleteFile: File × User → Bool represents the deletion of file F by a user U. Using this approach, we can reason about log events irrespective of the specific syntax of the log, which usually differs from one system to another. Each log in the system is thus modeled as a trace of terms. Moreover, in the presence of several logs to which information is written concurrently, the whole logging system is modeled as a tree that represents the possible interleavings of events from the logs.

To express properties of the model, we resort to a temporal, dynamic, modal and linear logic. This logic is an adapted version of the ADM logic initially proposed in [4]. The motivation behind this choice is that ADM comes with many features that make it very suitable for what we intend to achieve. First, ADM is compact in its syntax, formal in its semantics and highly expressive. It is temporal (through the use of modal operators), dynamic (through the use of patterns as arguments in the modalities) and linear (by allowing model modifications in the logic semantics). Besides, it comes with fixpoint operators à la modal μ-calculus, which allow for the specification of properties that are finite encodings of infinite logical formulas. All this expressiveness is extremely useful in capturing forensic properties, hypotheses and system invariants. Moreover, we present a tableau-based proof system that defines a compositional model-checking algorithm. All these features provide rigorous and provable logical support, which is a necessity for an investigation to be admitted in courts of law.
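The modeling idea can be illustrated with a small sketch, under several assumptions. Events are represented as terms (an operation symbol applied to arguments), each log is a trace of terms, and the paths of the tree of interleavings are enumerated explicitly for two logs. The operation symbols Logon and Logoff, the helper names, and the plain-Python property check are hypothetical; they stand in for the logic and the tableau-based proof system and do not implement them.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Term:
    """An event as a term: an operation symbol applied to arguments."""
    op: str
    args: Tuple[str, ...]


Trace = Tuple[Term, ...]


def interleavings(a: Trace, b: Trace) -> Iterator[Trace]:
    """Yield every merge of two logs that keeps each log's own order."""
    if not a:
        yield b
        return
    if not b:
        yield a
        return
    for rest in interleavings(a[1:], b):
        yield (a[0],) + rest
    for rest in interleavings(a, b[1:]):
        yield (b[0],) + rest


def deletes_only_while_logged_on(trace: Trace, user: str, file: str) -> bool:
    """Example invariant: DeleteFile(file, user) occurs only between
    Logon(user) and Logoff(user)."""
    logged_on = False
    for event in trace:
        if event == Term("Logon", (user,)):
            logged_on = True
        elif event == Term("Logoff", (user,)):
            logged_on = False
        elif event == Term("DeleteFile", (file, user)) and not logged_on:
            return False
    return True


def exists_trace(traces: Iterable[Trace], prop: Callable[[Trace], bool]) -> bool:
    """The property holds on at least one interleaving."""
    return any(prop(t) for t in traces)


def all_traces(traces: Iterable[Trace], prop: Callable[[Trace], bool]) -> bool:
    """The property holds on every interleaving."""
    return all(prop(t) for t in traces)
```

With one log containing Logon and Logoff for a user and a second log containing a single DeleteFile event, there are three interleavings, and the invariant holds on only one of them. Whether a property is required on some path or on all paths of the tree therefore matters when the relative order of events across logs is unknown.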

In Section 2, we present our approach to the formal modeling of logs. Section 3 is devoted to the logic used to express properties of the model. Section 4 contains an application of our approach to the Windows logging system. Finally, conclusions and future work are discussed in Section 5.

Section snippets

Modeling approach

We begin by presenting some definitions, then we explain our model.

Logic for log properties

In this section, we present a new logic for the specification of properties of the log model. The logic is based on ideas from the ADM logic [4], with some basic differences. First, ADM is trace-based while the logic we present is tree-based, therefore we can quantify existentially and universally over traces. Moreover, this gives us the opportunity to express branching-time properties. Second, the actions in ADM are atomic symbols whereas the actions in our logic have a structure since they

Windows logging system

To demonstrate the ideas discussed in the previous sections, we consider the Windows logging system as an example of an operating system that is popular and hence the target of many attacks. First, we present the logging system, then we discuss the modeling process and the properties we are able to express in our logic. The overall functionality of the Windows logging system [21] is depicted in Fig. 1. Logs are created by the audit process, which monitors the behavior of applications. It

Conclusion

We proposed a model checking approach to the problem of formal analysis of logs. We modeled the log as a tree labeled by terms from a term algebra that represents the different actions logged by the logging system. The properties of such a log are expressed through the use of a logic that has temporal, modal, dynamic and computational characteristics. Moreover, the logic is equipped with a sound and complete tableau-based proof system that can be the basis for verification algorithms. The Windows

References (28)

  • K. Adi et al., A new logic for electronic commerce protocols, International Journal of Theoretical Computer Science TCS (2003)
  • Snort – sourcefire inc., <http://www.snort.org>, accessed in April,...
  • Sophos inc., <http://www.sophos.com>, accessed in April,...
  • Tenable Network Security, <http://www.nessus.org/>, accessed in April,...
  • R. Cleaveland, Tableau-based model checking in the propositional mu-calculus, Acta Informatica (1990)
  • F. Cuppens, Managing alerts in a multi-intrusion detection environment, in: Proceedings of the 17th Annual Computer...
  • F. Cuppens, A. Miege, Alert correlation in a cooperative intrusion detection framework, in: Proceedings of the IEEE...
  • H. Debar, A. Wespi, Aggregation and correlation of intrusion-detection alerts, in: Recent Advances in Intrusion...
  • P. Gladyshev et al., Finite state machine approach to digital event reconstruction, Digital Investigation Journal (2004)
  • P. Gladyshev et al., Formalising event time bounding in digital investigations, International Journal of Digital Evidence (2005)
  • C. Hosmer, Time Lining Computer Evidence, available at <http://www.wetstonetech.com/f/timelining.pdf>,...
  • K. Julisch, Clustering intrusion detection alarms to support root cause analysis, ACM Transactions on Information and System Security (2003)
  • W. Kruse et al., Computer Forensics: Incident Response Essentials (2002)
  • R. Leigland et al., A formalization of digital forensics, Digital Investigation Journal (2004)

Cited by (10)

    • Read the digital fingerprints: log analysis for digital forensics and security

      2021, Computer Fraud and Security
      Citation Excerpt :

      The main purpose of log analysis methods is to transform the data generated as a result of log recording into meaningful information. In most cybercrimes committed against information systems, there is a long process that requires the use of time, human resources and information technology to find and define the types of attacks and solve these problems. Both hardware and software support is needed in the process of identifying these attacks.

    • A graph-based approach to detect unexplained sequences in a log

      2021, Expert Systems with Applications
      Citation Excerpt :

      The approach combines data mining and supervised/unsupervised machine learning; logs from Dynamic Host Configuration Protocol (DHCP) servers, authentication servers, and firewall are considered as data sources. A log analysis approach based on model checking is proposed in Saleh, Arasteh, Sakha, and Debbabi (2007). The approach models logs as a tree representing the possible inter-leavings of events extracted from logs, which can be used to detect anomalies.

    • A survey on forensic investigation of operating system logs

      2019, Digital Investigation
      Citation Excerpt :

      To accommodate multiple sources of log files including OS logs, Arasteh et al. (2007) propose a tree-based data structure and analyze correlation using algebraic terms. Another formal and unified verification model for event logs is presented by Saleh et al. (2007). The event logs are modeled based on logic for electronic commerce protocol called ADM logic and use a tree data structure to query the properties.

    • Multi-agent Based Forensic Analysis Framework for Infrastructures Involving Storage Networks

      2019, Proceedings of the National Academy of Sciences India Section A - Physical Sciences
    • Time synchronization: Pivotal element in cloud forensics

      2016, Security and Communication Networks
    • Introducing and analysis of the Windows 8 event log for forensic purposes

      2015, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

    This research is done in collaboration between the Computer Security Laboratory at Concordia University and Bell Canada under a PROMPT-Québec grant.
