The Windows-Users and -Intruder simulations Logs dataset (WUIL): An experimental framework for masquerade detection mechanisms

https://doi.org/10.1016/j.eswa.2013.08.022

Highlights

  • We introduce a new masquerade dataset, called WUIL, that includes faithful masquerade attempts.

  • We argue that how a user navigates her file system structure neatly separates a masquerade from ordinary user behavior.

  • We argue that our approach works at a high level of abstraction for building novel models for masquerade detection.

Abstract

We introduce a new masquerade dataset, called Windows-Users and -Intruder simulations Logs (WUIL), which, unlike existing datasets, involves more faithful masquerade attempts. While building WUIL, we have worked under the hypothesis that the way in which a user navigates her file system structure can neatly separate a masquerade attack from ordinary user behavior. Thus, departing from standard practice, we state that it is not a user action, but the object upon which the action is carried out, that distinguishes user participation. We shall argue that this approach, based on file system navigation, provides a richer means, at a higher level of abstraction, for building novel models for masquerade detection.

We shall devote an important part of this paper to describing WUIL’s content: what information about user activity is stored and how it is represented; prominent characteristics of the participant users; the kinds of masquerade attacks to be detected in a timely manner; and the way they have been simulated. We shall argue that WUIL provides reliable data for experimenting on close-to-real-life instances of masquerade detection, as well as for conducting fair comparisons of rival detection mechanisms, and we hope it will be of use to the research community.

As a side contribution of this paper, we use WUIL to conduct a simple comparison of two masquerade detection methods: one based on SVM, and the other based on KNN. While this comparison experiment is not central to the paper, we expect it to motivate research that explores the masquerade detection problem more deeply, and to spread the use of WUIL. In a similar vein, we provide directions for further research, hinting at how to use the features contained in WUIL, in the hope that others will find them appealing.

Introduction

Information is extremely critical and valuable. As more information is stored in computers, it is paramount to detect in a timely manner whether one’s computer session is being illegally seized by an intruder, a so-called masquerader. Failing to do so may result in countless losses.

Masquerade detection is approached as an anomaly detection task, where the masquerade detection mechanism aims at distinguishing any deviation in the current user activity from a given profile of ordinary user behavior. It has been actively studied since the seminal work of Schonlau et al. (2001), who suggested that, to profile a user, one should use the commands she has typed during a UNIX session. The masquerade dataset of Schonlau et al. (2001), called SEA, has been the de facto standard for developing and comparing rival masquerade detection mechanisms.

SEA, however, presents severe limitations. Most prominently, it does not include a collection of faithful intrusion attempts. Instead, Schonlau et al. (2001) have adopted a one versus the others (OVTO) masquerade approach, where an ordinary session from one user is taken to be an intrusion attempt against another user. Further, masquerade detection based on command usage has proven not to be powerful enough (Razo-Zapata, Mex-Perera, & Monroy, 2012). As a result, research has resorted to profiling a user by considering alternative sources of activity, including device usage (e.g., the keyboard) (Garg et al., 2006; Killourhy & Maxion, 2010), application usage (Sankaranarayanan, Pramanik, & Upadhyaya, 2006), or search behavior (Ben-Salem & Stolfo, 2011). This, in turn, has yielded new masquerade datasets; some of them enable the development and fair comparison of new detection mechanisms, but others do not have a clear working hypothesis for user profiling. What is more, they either follow the OVTO approach for masquerade detection, or consider a restricted masquerade scenario.
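To make the OVTO idea concrete, the following is a minimal sketch of how such a test set could be assembled. The function name `ovto_test_set`, the dictionary layout, and the label strings are illustrative assumptions, not part of SEA or of any published tooling.

```python
def ovto_test_set(sessions_by_user, target):
    """Build an OVTO-style labelled test set for one target user.

    sessions_by_user: dict mapping a user id to a list of sessions
    (hypothetical layout). Sessions of the target are labelled "normal";
    ordinary sessions of every other user are relabelled "masquerade".
    """
    labelled = []
    for user, sessions in sessions_by_user.items():
        label = "normal" if user == target else "masquerade"
        labelled.extend((session, label) for session in sessions)
    return labelled
```

Note that the "masquerade" sessions here are simply ordinary work by other users, which is precisely why OVTO data does not contain faithful intrusion attempts.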

In this paper, we introduce a new masquerade dataset, called Windows-Users and -Intruder simulations Logs (WUIL). WUIL contains information about both user activity, in terms of file system usage, and, unlike rival datasets, faithful masquerade attempts. It has been built under the working hypothesis that to characterize user behavior we should observe the way she navigates her file system structure. Thus, it is the object upon which an action is carried out that distinguishes user participation; this is unlike existing approaches, which consider the user action only. We shall argue that file system navigation provides a richer means, at a higher level of abstraction, for building novel models for masquerade detection; furthermore, it is not device-, platform-, or application-dependent.

We shall devote part of this paper to describing WUIL: what information about user activity is stored and how it is represented; prominent characteristics of the participant users; the kinds of masquerade attacks to be detected in a timely manner; and the way they have been simulated. We shall argue that WUIL provides reliable data for experimenting on close-to-real-life instances of masquerade detection, as well as for conducting fair comparisons of rival detection mechanisms, and we hope it will be of use to the research community.

To give the reader a taste of how to use WUIL to approach masquerade detection, we also report on the results of comparing two masquerade detection mechanisms: one based on Support Vector Machines (SVM), and the other on K-Nearest Neighbors (KNN). While we shall discuss the strengths and limitations of this experiment, we insist it is not central to the paper: we expect it to motivate the use of WUIL among the research community to deepen the general understanding of masquerade detection.

In a similar vein, we provide directions for further research, hinting at how to use the features contained in WUIL, in the hope that others will find them appealing. After signing a non-disclosure agreement, one can freely download WUIL from http://homepage.cem.itesm.mx/raulm/wuil-ds/.

Existing datasets for masquerade detection

This section outlines the strengths and weaknesses of existing datasets for masquerade detection, which prompted the development of WUIL. Since our main interest lies in describing and assessing the most prominent datasets used to detect masquerade attacks, we will focus on the structure and features of each dataset presented. Readers interested in validation details and classification results are referred to the original works.

The WUIL dataset

The design and development of WUIL have been driven by a simple working hypothesis: the way a user navigates the structure of her file system (FS) suffices for masquerade detection. In WUIL, FS navigation comprises two key aspects: first, the FS object upon which a user has carried out an action; and second, information as to how each of these objects is used over a session. This information is captured by means of a navigation structure.
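As an illustration of the two aspects above, the sketch below counts how often each FS object is touched in a session and how often each ancestor folder is traversed. This is only a hedged approximation: the snippet does not specify WUIL's actual navigation structure, so the function names, the `depth` feature, and the sample paths are all hypothetical.

```python
from collections import Counter
from pathlib import PureWindowsPath


def build_navigation_counts(accessed_paths):
    """Count accesses per FS object and per ancestor folder (illustrative only)."""
    object_counts = Counter()
    folder_counts = Counter()
    for raw in accessed_paths:
        path = PureWindowsPath(raw)
        object_counts[str(path)] += 1
        # Reaching an object implies passing through each of its ancestors.
        for parent in path.parents:
            folder_counts[str(parent)] += 1
    return object_counts, folder_counts


def depth(raw):
    """Number of path components below the drive root (hypothetical feature)."""
    return len(PureWindowsPath(raw).parts) - 1


# Made-up session: a list of accessed object paths, in order.
session = [
    r"C:\Users\ana\Documents\report.docx",
    r"C:\Users\ana\Documents\report.docx",
    r"C:\Users\ana\Pictures\trip\img01.jpg",
]
objects, folders = build_navigation_counts(session)
```

Counting folders as well as objects is one way to capture where in the hierarchy a user tends to work, rather than only which action she performs.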

The construction of WUIL: the user dataset

Currently, WUIL comprises records of 20 users. Each user is characterized by a collection of logs. User logs were captured over a period of observation, which ranges from five to ten weeks of ordinary working days. Our logs contain information about the navigation of user objects only; put differently, WUIL involves no records about objects that are part of the OS or of an application.

Attacks and their simulation

One of the main difficulties that the intrusion detection research community faces is the lack of datasets with real, or at least realistic enough, attacks (see Section 2). We have developed WUIL in an attempt to fill this gap. In this section, we report on the way we crafted and simulated our masquerade attempts.

Preliminary experiments

In this section, we shall briefly describe the results obtained through a preliminary experiment on applying WUIL to masquerade detection. The purpose of this experiment, rather than obtaining a fully functional masquerade detection system, is to provide the reader with sufficient evidence to consider WUIL as a working dataset for the development of new masquerade detection mechanisms.

In our experiment, we have adopted a window-based, one-class classification approach. A window comprehends a
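A window-based, one-class setup of this general kind can be sketched as follows. This is not the authors' experimental configuration: the window size, the two features, the nearest-neighbour distance rule, and the threshold are assumptions chosen only to keep the example small, and the paths are made up.

```python
import math


def windows(events, size):
    """Split events into consecutive, non-overlapping windows of a fixed size."""
    return [events[i:i + size] for i in range(0, len(events) - size + 1, size)]


def features(window):
    """Hypothetical features: distinct objects and mean path depth in the window."""
    depths = [path.count("\\") for path in window]
    return (len(set(window)), sum(depths) / len(depths))


def knn_score(x, train, k=1):
    """Distance from x to its k-th nearest training vector (larger = more unusual)."""
    return sorted(math.dist(x, t) for t in train)[k - 1]


def is_anomalous(window, train, threshold, k=1):
    """One-class decision: the model sees only the legitimate user's windows."""
    return knn_score(features(window), train, k) > threshold


# Training data comes from the legitimate user only (one-class setting).
user_log = [r"C:\docs\a.txt", r"C:\docs\b.txt"] * 6
train = [features(w) for w in windows(user_log, 4)]

# A made-up window that touches many distinct, deeper objects.
attack_window = [r"C:\docs\x\y\f%d.txt" % i for i in range(4)]
```

With an assumed threshold of 1.0, `is_anomalous(attack_window, train, 1.0)` flags the made-up window, while a window drawn from `user_log` is accepted. In practice the threshold would have to be tuned on held-out user data.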

Further work

We divide our directions for further work according to two key aspects: first, how to make our dataset better and more complete; and second, what kinds of experiments need to be carried out to validate the working hypothesis behind the construction of WUIL, namely, that profiling a user by how she navigates the structure of her FS suffices to detect masqueraders. We discuss these issues below.

Conclusions

In this work, we argued for the need to widen research in the masquerade detection field to other sources of user activity and behavior. We chose the approach of analyzing the way a user navigates the structure of her file system and introduced WUIL, a dataset that contains information regarding the use of file system objects.

WUIL is made up of data gathered from normal users and a set of different users simulating masqueraders’ attacks. Three kinds of attacks were performed per legitimate

Acknowledgements

We are grateful to the anonymous referees, and to the members of the NetSec group, at Tecnológico de Monterrey, Campus Estado de México, for their comments on an earlier draft of this paper. The research reported here was supported by CONACYT Grant 105698.

References (23)

  • I. Razo-Zapata et al. Masquerade attacks based on user’s profile. Journal of Systems and Software (2012).
  • L. Araujo et al. User authentication through typing biometrics features. IEEE Transactions on Signal Processing (2005).
  • Ben-Salem, M., & Stolfo, S. J. (2011). Modeling user search behavior for masquerade detection. In Sommer, R.,...
  • Bertacchini, M., & Fierens, P. (2008). A survey on masquerader detection approaches. In Proceedings of V Congreso...
  • S. Bleha et al. Computer-access security systems using keystroke dynamics. IEEE Transactions on Pattern Analysis and Machine Intelligence (1990).
  • K.-T. Chen et al. User identification based on game-play activity patterns.
  • R. Chinchani et al. RACOON: Rapidly generating user command data for anomaly detection from customizable templates.
  • S. Cho et al. Web based keystroke dynamics identity verification using neural network. Journal of Organizational Computing and Electronic Commerce (2000).
  • A. Garg et al. Profiling users in GUI based systems masquerade detection.
  • Greenberg, S. (1998). Using unix: Collected traces of 168 users. Research Report 88/333/45, Department of Computer...
  • Haider, S., Abbas, A., & Zaidi, A. (2000). A multi-technique approach for user identification through keystroke...

Cited by (30)

    • Addressing insider attacks via forensic-ready risk management

      2023, Journal of Information Security and Applications
    • Using binary classifiers for one-class classification

      2022, Expert Systems with Applications
      Citation Excerpt :

      This particular scenario is referred to as one-class classification (Perera et al., 2021; Tax, 2001). Many real-world applications correspond to this scenario, such as fault detection, fraud detection, and intrusion detection (Barrera-Animas et al., 2017; Camina et al., 2014; Camiña et al., 2019). Owing to its practical importance, one-class classification has received considerable attention from the research community.

    • Employee profiling via aspect-based sentiment and network for insider threats detection

      2019, Expert Systems with Applications
      Citation Excerpt :

      The profiles encompasses the users’ biometric behaviors and can be used for masquerade and insider threat detection. While not focusing on providing a defense solution to insider threats, Camiña, Hernández-Gracidas, Monroy, and Trejo (2014) proposed a synthetic dataset namely, the Windows-Users and -Intruder simulations Logs (WUIL) dataset, which encompasses faithful masquerade attempts, in hope to motivate the research community to use WUIL and further explore the solution to masquerade detection. Leu, Tsai, Hsiao, and Yang (2017) propose an Internal Intrusion Detection and Protection System (IIDPS) to detect insider attacks at the system call level.

    • Online masquerade detection resistant to mimicry

      2016, Expert Systems with Applications
      Citation Excerpt :

      This is the case of collections such as Are You You? ( RUU) (Salem & Stolfo, 2011b) and Windows-Users and Intruder simulations Logs (WUIL) (Camiña, Hernández-Gracidas, Monroy, & Trejo, 2014). In RRU the collecting of legitimate user data was carried out by the Windows operative system host sensor on personal computers.

    • A benchmark for visual analysis of insider threat detection

      2022, Science China Information Sciences