1 Introduction

In information security management, the insider threat is one of the most serious threats. Because many factors are involved, it is not clear which factor plays the most significant role in malicious activities. Hence, this study aims to identify the factors behind insider threats from the viewpoint of security management. Hausawi interviewed security experts and classified behaviors into two classes, positive and negative [1]. Such survey-based studies are very useful for understanding insiders’ behaviors and for collecting the possible features of malicious activities. However, survey and interview responses are not always truthful; for example, subjects may pretend to be honest or may unintentionally protect their organization. Moreover, it is not feasible to observe every step that a potential insider takes to perform a malicious action.

To overcome these difficulties in observation, we propose an experiment in which subjects are employed to work on a given task, and we observe the number of malicious activities they perform.

We conducted an experiment in which a total of 198 subjects reviewed a sample web search engine, and we observed how they behaved. Our decision tree analysis reveals the typical characteristics of malicious behavior.

2 Experiment

In our experiment, we focus on the assignment of identities to users. We expect that users who share a common ID such as “administrator” with others tend to be malicious more often than users with individual IDs. Since it is impossible to determine who misbehaved, an ID-sharing user may think that malicious activities will never be exposed. To verify how much malicious activity increases when an ID is shared, we divided the subjects into two groups: the first half were assigned a common ID, and the other half were assigned individual IDs.

The flow of the experiment is as follows. First, all subjects log in to a registration site using their IDs for the crowdsourcing service. At that site, subjects are assigned the list of words to be examined in a trial search service. Second, at the search site, subjects are divided into two groups: individual-ID users and ID-sharing users. The individual-ID users need to enter their crowdsourcing-site IDs before logging in to the search site, whereas the ID-sharing users are allowed to log in without providing any information. At the search site, subjects test the given 50 search words and evaluate the quality of the results as well as the performance of the search function.

Table 1. Malicious subjects with respect to demographic groups

If a subject completes the test with fewer than 50 words, we regard this as a malicious activity.
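The labeling rule above can be sketched in a few lines of Python. This is only an illustration: the function name `is_malicious`, the subject labels, and the word counts are hypothetical; only the threshold of 50 words comes from the text.

```python
REQUIRED_WORDS = 50  # number of search words assigned to each subject (from the text)


def is_malicious(words_evaluated: int, required: int = REQUIRED_WORDS) -> bool:
    """Return True when a subject finished with fewer than the required words."""
    return words_evaluated < required


# Hypothetical subjects: (label, ID type, number of words evaluated).
# These rows are invented for illustration and are not the study's data.
subjects = [
    ("s1", "shared", 50),
    ("s2", "shared", 12),
    ("s3", "individual", 50),
    ("s4", "individual", 49),
]

# Map each subject label to its malicious/honest flag.
labels = {name: is_malicious(count) for name, _, count in subjects}
```

Under this rule a subject who evaluates 49 words is counted as malicious, and one who evaluates all 50 is counted as honest.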

3 Experimental Results

3.1 Malicious Subjects

Table 1 summarizes the experimental results. We show the number of malicious subjects for each demographic attribute, e.g., sex, age, and affiliation. We observed that 20 of the 98 ID-sharing users performed malicious activity. The number of malicious users who shared a common ID is greater than that of individual-ID users. Based on the experimental results, we analyze the set of malicious subjects with three methods: (1) decision tree analysis, (2) association rule mining, and (3) logistic regression analysis.

3.2 Decision Tree Analysis

In the decision tree, the node “Age” is chosen as the best classifier; it sits at the root of the tree and plays a significant role in characterizing malicious insiders.

A decision tree reveals the logical conditions that determine a target attribute. Figure 1 shows the decision tree of malicious users, learned with the R package “rpart”. The target attribute is whether the subject is malicious or not. In this tree, nodes are logical conditions that classify subjects, and the left branch means that the condition is satisfied. The labels “Malicious/Honest” denote the numbers of subjects in each node. For example, if a user’s age is over 55 (the left subtree of the root node), then 7 subjects are malicious and 1 is honest (at Sex = b).
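To illustrate how a tree learner might rank candidate conditions such as “Age”, the following Python sketch computes the decrease in Gini impurity for a binary split. It assumes the Gini criterion, which is rpart's default for classification; the paper does not state which settings were used. The pair (7, 1) echoes the example node above, while all other counts are invented toy values.

```python
def gini(malicious: int, honest: int) -> float:
    """Gini impurity of a node holding the given class counts."""
    total = malicious + honest
    if total == 0:
        return 0.0
    p = malicious / total
    return 2.0 * p * (1.0 - p)


def split_gain(left, right) -> float:
    """Decrease in Gini impurity when a parent node is split in two.

    Each argument is a (malicious, honest) pair of counts.
    """
    lm, lh = left
    rm, rh = right
    n_left, n_right = lm + lh, rm + rh
    n = n_left + n_right
    parent = gini(lm + rm, lh + rh)
    children = (n_left / n) * gini(lm, lh) + (n_right / n) * gini(rm, rh)
    return parent - children


# Toy counts (hypothetical, not the study's data).
gain_informative = split_gain(left=(7, 1), right=(3, 9))
gain_useless = split_gain(left=(5, 5), right=(5, 5))
```

A condition that separates the classes well yields a larger gain than one that leaves both children as mixed as the parent, which is why the most informative attribute ends up at the root.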

Fig. 1. Decision tree of malicious subjects

Table 2. Association rules
Table 3. Logistic regression analysis

3.3 Association Rules Mining

To reveal the typical characteristics of insiders in terms of combinations of attributes, we extract association rules using the R package “arules”. Table 2 shows the selected association rules. Support and Confidence denote the joint probability \(Pr(lhs, rhs)\) and the conditional probability \(Pr(rhs \mid lhs)\), respectively. For example, rule No. 1 means “If users use individual IDs and are self-employed workers, then they are legitimate with 89% confidence”. Rule No. 5 means “ID-sharing users sometimes (20% confidence) perform malicious activity”.

One association rule states “If individual-ID users are in their 30s, then they are legitimate”.
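The two measures used in this section can be computed directly from a list of transactions. The Python sketch below is a minimal illustration and not the arules implementation; the function names, item labels, and the ten toy transactions are hypothetical, chosen so that the rule “shared ID implies malicious” has a confidence of 20%.

```python
def support(transactions, items) -> float:
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions, lhs, rhs) -> float:
    """Estimate Pr(rhs | lhs) as support(lhs and rhs) / support(lhs)."""
    lhs_support = support(transactions, lhs)
    if lhs_support == 0:
        return 0.0
    return support(transactions, set(lhs) | set(rhs)) / lhs_support


# Toy data (hypothetical): 5 ID-sharing subjects, 1 of them malicious,
# and 5 individual-ID subjects, all honest.
transactions = (
    [frozenset({"id=shared", "malicious"})]
    + [frozenset({"id=shared", "honest"})] * 4
    + [frozenset({"id=individual", "honest"})] * 5
)

rule_support = support(transactions, {"id=shared", "malicious"})
rule_confidence = confidence(transactions, {"id=shared"}, {"malicious"})
```

On this toy data the rule is supported by 1 of 10 transactions, and 1 of the 5 transactions matching the left-hand side also matches the right-hand side.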

3.4 Logistic Regression Analysis

Logistic regression is an analysis method that predicts the conditional probability of an event given certain conditions, using a logistic model. We applied logistic regression to the dataset of malicious subjects with several demographic attributes. Our model is of the form

$$\begin{aligned} \log \frac{Pr(malicious \mid x)}{1-Pr(malicious \mid x)} = -1 -0.05x_1 + 0.048 x_2 + \cdots + 0.064x_{10} \end{aligned}$$

where the coefficients of variables are given in Table 3.
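As a reading aid for the model above, the following Python sketch converts log-odds into a probability and a coefficient into an odds ratio. It uses only the intercept and the three coefficients written out in the equation; the coefficients hidden behind the ellipsis are not filled in, so this partial model is an assumption for illustration and not the fitted model of Table 3. The function and variable names are hypothetical.

```python
import math


def probability_from_log_odds(log_odds: float) -> float:
    """Invert the logit: Pr = 1 / (1 + exp(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def log_odds(intercept: float, coefficients: dict, x: dict) -> float:
    """Linear predictor; variables missing from `x` are treated as 0."""
    return intercept + sum(
        coefficients[name] * x.get(name, 0.0) for name in coefficients
    )


# Only the terms printed in the equation; the elided ones are left out.
INTERCEPT = -1.0
PARTIAL_COEFFICIENTS = {"x1": -0.05, "x2": 0.048, "x10": 0.064}

# Probability when every listed variable is 0 (about 0.27).
baseline = probability_from_log_odds(
    log_odds(INTERCEPT, PARTIAL_COEFFICIENTS, {})
)

# Multiplicative change in the odds per unit increase of x2.
odds_ratio_x2 = math.exp(PARTIAL_COEFFICIENTS["x2"])
```

A positive coefficient gives an odds ratio above 1, meaning the odds of being malicious rise as that variable increases, while a negative coefficient gives an odds ratio below 1.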

4 Conclusions

We studied the factors of malicious insiders in an experiment with a total of 198 subjects under several conditions. Our experiment showed that sharing an ID and password could increase the risk of malicious insider activity by a factor of \(\frac{1}{0.68} \approx 1.47\) compared with not sharing.