The Windows-Users and -Intruder simulations Logs dataset (WUIL): An experimental framework for masquerade detection mechanisms
Introduction
Information is critical and valuable. As more information is stored in computers, it is paramount to detect in a timely manner whether one’s computer session is being illegally seized by an intruder, a so-called masquerader. Failing to do so may result in substantial losses.
Masquerade detection is approached as an anomaly detection task, where the masquerade detection mechanism aims at distinguishing any deviation in the current user activity from a given profile of ordinary user behavior. It has been actively studied since the seminal work of Schonlau et al. (2001), who suggested that, to profile a user, one should use the commands she has typed during a UNIX session. The masquerade dataset of Schonlau et al. (2001), called SEA, has been the de facto standard for developing and comparing rival masquerade detection mechanisms.
SEA, however, presents severe limitations. Most prominently, it does not include a collection of faithful intrusion attempts. Instead, Schonlau et al. (2001) adopted a one-versus-the-others (OVTO) masquerade approach, where an ordinary session from one user is taken to be an intrusion attempt against another. Further, masquerade detection based on command usage has proven not to be powerful enough (Razo-Zapata, Mex-Perera, & Monroy, 2012). As a result, research has resorted to profiling a user through alternative sources of activity, including device usage (e.g., the keyboard) (Garg et al., 2006; Killourhy & Maxion, 2010), application usage (Sankaranarayanan, Pramanik, & Upadhyaya, 2006), or search behavior (Ben-Salem & Stolfo, 2011). This, in turn, has yielded new masquerade datasets; some of them enable the development and fair comparison of new detection mechanisms, but others do not have a clear working hypothesis for user profiling. What is more, they either follow the OVTO approach for masquerade detection, or consider a restricted masquerade scenario.
In this paper, we introduce a new masquerade dataset, called Windows-Users and -Intruder simulations Logs (WUIL). WUIL contains information about both user activity, in terms of file system usage, and, unlike rival datasets, faithful masquerade attempts. It has been built under the working hypothesis that, to characterize user behavior, we should observe the way she navigates her file system structure. Thus, it is the object upon which an action is carried out that distinguishes user participation; this is unlike existing approaches, which consider the user action only. We shall argue that file system navigation provides a richer means, at a higher level of abstraction, for building novel models for masquerade detection; furthermore, it is not device-, platform-, or application-dependent.
We shall devote part of this paper to describing WUIL: what information about user activity is stored and how it is represented; prominent characteristics of the participant users; the kinds of masquerade attacks to be detected in a timely manner; and the way they have been simulated. We shall argue that WUIL provides reliable data for experimenting on close to real-life instances of masquerade detection, as well as for conducting fair comparisons of rival detection mechanisms, and we hope it will be of use to the research community.
To give the reader a taste of how to use WUIL to approach masquerade detection, we also report on the results of comparing two masquerade detection mechanisms: one based on Support Vector Machines (SVM), and the other on K-Nearest Neighbors (KNN). While we shall discuss the strengths and limitations of this experiment, we insist it is not central to the paper: we expect it to motivate the use of WUIL among the research community to deepen the general understanding of masquerade detection.
In a similar vein, we provide directions for further research, hinting at how to use the features contained in WUIL, and hoping others will find them appealing. Having signed a non-disclosure agreement, one can freely download WUIL from http://homepage.cem.itesm.mx/raulm/wuil-ds/.
Section snippets
Existing datasets for masquerade detection
This section outlines the strengths and weaknesses of existing datasets for masquerade detection, which prompted the development of WUIL. Since our main interest is in describing and assessing the most prominent datasets used to detect masquerade attacks, we will focus on the structure and features of each dataset presented. The reader is referred to the original work for validation details and classification results.
The WUIL dataset
The design and development of WUIL has been driven by a simple working hypothesis: the way a user navigates the structure of her file system (FS) suffices for masquerade detection. In WUIL, FS navigation comprises two key aspects: first, the FS object upon which a user has carried out an action; and second, information as to how each of these objects is used over a session. This information is captured by means of a navigation structure.
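To make the two aspects of FS navigation concrete, the following minimal Python sketch shows one way such information could be accumulated. It is an illustration only, not the navigation structure defined in the paper: the log format (one Windows path per access), the function names `build_navigation_structure` and `depth`, and the choice of per-node visit counts are all assumptions made for this example.

```python
from collections import Counter
from pathlib import PureWindowsPath


def build_navigation_structure(accessed_paths):
    """Count how often a session touched each file and each enclosing folder.

    Hypothetical representation: an access to C:\\a\\b.txt also counts as a
    visit to C:\\a and to C:\\, so the counters implicitly form a tree
    rooted at the drive.
    """
    visits = Counter()
    for raw in accessed_paths:
        path = PureWindowsPath(raw)
        visits[str(path)] += 1          # the FS object acted upon
        for ancestor in path.parents:   # every folder on the way to it
            visits[str(ancestor)] += 1
    return visits


def depth(raw):
    """Number of path components below the drive root."""
    return len(PureWindowsPath(raw).parts) - 1


# Made-up session log: one accessed path per entry.
session = [
    r"C:\Users\ana\Docs\report.docx",
    r"C:\Users\ana\Docs\report.docx",
    r"C:\Users\ana\Pictures\cat.png",
]
structure = build_navigation_structure(session)
```

Under these assumptions, the counters record which objects were used and how often, and `depth` gives one simple positional attribute of an object that a profile could draw on.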
The construction of WUIL: the user dataset
Currently, WUIL comprises records of 20 users. Each user is characterized by a collection of logs. User logs were captured over a period of observation, which ranges from five to ten weeks of ordinary working days. Our logs contain information about the navigation of user objects only; put differently, WUIL involves no records about objects that are part of the OS or of an application.
Attacks and their simulation
One of the main difficulties that the intrusion detection research community faces is the lack of datasets with real, or at least realistic enough, attacks (see Section 2). We have developed WUIL in an attempt to fill this gap. In this section, we report on the way we crafted and simulated our masquerade attempts.
Preliminary experiments
In this section, we briefly describe the results of a preliminary experiment on applying WUIL to masquerade detection. The purpose of this experiment is not to obtain a fully functional masquerade detection system, but to provide the reader with sufficient evidence to consider WUIL as a working dataset for the development of new masquerade detection mechanisms.
In our experiment, we have adopted a window-based, one-class classification approach. A window comprehends a
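
As a rough illustration of what a window-based, one-class detector can look like, the Python sketch below trains only on a user's own windows and flags a new window whose distance to its nearest training window exceeds a threshold. This is not the authors' experimental setup: the event encoding (each event reduced to a path depth), the window size, the two features, the KNN distance score, and the leave-one-out threshold are all assumptions chosen to keep the example small.

```python
import math


def windows(events, size):
    """Split a list of events into consecutive, non-overlapping windows."""
    return [events[i:i + size] for i in range(0, len(events) - size + 1, size)]


def features(window):
    """Map a window of path depths to (mean depth, number of distinct depths)."""
    return (sum(window) / len(window), float(len(set(window))))


def knn_score(train, point, k):
    """Anomaly score: Euclidean distance to the k-th nearest training vector."""
    dists = sorted(math.dist(point, t) for t in train)
    return dists[k - 1]


def fit_threshold(train, k):
    """Largest leave-one-out score observed among the user's own windows."""
    scores = []
    for i, p in enumerate(train):
        rest = train[:i] + train[i + 1:]
        scores.append(knn_score(rest, p, k))
    return max(scores)


# Made-up data: each event is the depth of the accessed path.
user_events = [3, 4, 3, 4, 4, 4, 3, 4, 3, 3, 3, 4, 4, 3, 5, 4]
intruder_events = [1, 7, 2, 9]

# One-class training: only the legitimate user's windows are used.
train = [features(w) for w in windows(user_events, 4)]
threshold = fit_threshold(train, k=1)
is_masquerade = knn_score(train, features(intruder_events), k=1) > threshold
```

The same skeleton would accept any other one-class scorer in place of `knn_score`, provided it is fitted on the legitimate user's windows alone.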
Further work
We divide our directions for further work according to two key aspects: first, how to make our dataset better and more complete; and second, what kinds of experiments need to be carried out to validate the working hypothesis behind the construction of WUIL, namely, that profiling a user by how she navigates the structure of her FS suffices to detect masqueraders. We discuss these issues below.
Conclusions
In this work, we argued for the need to widen research in the masquerade detection field to other sources of user activity and behavior. We chose the approach of analyzing the way a user navigates the structure of her file system and introduced WUIL, a dataset that contains information regarding the use of file system objects.
WUIL is made up of data gathered from normal users and a set of different users simulating masqueraders’ attacks. Three kinds of attacks were performed per legitimate
Acknowledgements
We are grateful to the anonymous referees, and to the members of the NetSec group, at Tecnológico de Monterrey, Campus Estado de México, for their comments on an earlier draft of this paper. The research reported here was supported by CONACYT Grant 105698.
References (23)
- et al. (2012). Masquerade attacks based on user’s profile. Journal of Systems and Software.
- et al. (2005). User authentication through typing biometrics features. IEEE Transactions on Signal Processing.
- Ben-Salem, M., & Stolfo, S. J. (2011). Modeling user search behavior for masquerade detection. In Sommer, R.,...
- Bertacchini, M., & Fierens, P. (2008). A survey on masquerader detection approaches. In Proceedings of V Congreso...
- et al. (1990). Computer-access security systems using keystroke dynamics. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- et al. User identification based on game-play activity patterns.
- et al. RACOON: Rapidly generating user command data for anomaly detection from customizable templates.
- et al. (2000). Web based keystroke dynamics identity verification using neural network. Journal of Organizational Computing and Electronic Commerce.
- et al. Profiling users in GUI based systems masquerade detection.
- Greenberg, S. (1998). Using unix: Collected traces of 168 users. Research Report 88/333/45, Department of Computer...