1 Introduction

With the spread and development of enterprise information management, ERP systems are widely used in enterprises as advanced management systems. While enjoying the convenience of ERP systems, enterprise users face many information security issues. The US Department of Homeland Security (DHS) issued a security alert warning that nation-state hackers and criminals are increasingly attacking ERP systems, and reported evidence that the Dridex Trojan had attacked a bank’s ERP system, causing the bank heavy losses.

One method for securing information in ERP systems is intrusion detection. The anomaly detection approach [1] is a key element of intrusion detection: it evaluates the behavior of a user or system and treats intrusive or irregular activities as deviations from normal patterns. The core of the method is identifying whether the current behavior is abnormal (intrusive or irregular).

Traditional studies on identifying abnormal behavior of network users fall into two categories. The first uses network traffic as the characteristic [2,3,4,5,6]; for example, Jain et al. [2] identify abnormal behavior by detecting abnormal network traffic. The second uses packets as the characteristic [7,8,9,10,11]; for example, Lee et al. [7] identify abnormal behavior by monitoring packets for anomalies. However, the characteristics these methods choose depend heavily on the computer, and the abnormal behavior they identify does not necessarily correspond to abnormal behavior of users in real life. The credibility of the abnormal behavior identified by these methods is therefore open to question.

Trust is typically interpreted as a subjective belief in the reliability, honesty and security of an entity on which we depend for our welfare [12]; such entities include software, hardware, data, people and organizations. Many researchers have conceptualized trust as a behavior, a view that has been validated in work collaboration and social communication [13]. On the one hand, behavior-based trust models are widely used on e-commerce sites to help consumers assess the quality of products. Cao et al. [14] studied the trusted third party (TTP) in Australian business and examined the factors that influence consumers’ trust behavior from the perspective of consumers’ trust in online shopping. Kaur et al. [15] proposed a model to determine how trust factors in the Indian e-commerce marketplace affect customers’ intention to purchase from an e-store. On the other hand, behavior-based trust models are widely used in software systems to ensure their information security. There are two main types of model in this field. The first evaluates the factors affecting trust with continuous or discrete real numbers (trust values) [16,17,18,19]. Hosseini et al. [16] proposed measuring a user’s behavioral credibility by scoring the user’s behavior, and argued that repeated malicious behavior should receive a lower score than the first malicious behavior. Judging credibility by calculating a trust value has a drawback, however: the untrusted interactive behaviors are preset when the evaluation system is established, so untrusted interactive behaviors that were not preset cannot be detected. The second type models trusted users by continually optimizing a characteristic framework that describes trusted behavior [20,21,22]. Yan et al. [20] built a behavioral characteristic framework of trusted users based on computer trust and the intention of human-computer interaction. However, these studies are purely theoretical and have not been further explored with actual data.

In this paper, we study how to identify untrusted interactive behavior in ERP software systems based on human factors. We define trusted interaction as a predictable and controllable process of information transformation over a computer network in which people and computers work together effectively. Trusted interactive behavior is behavior that is consistent with an individual’s behavioral habits as reflected in network operations. We therefore build a behavioral model of a trusted user from characteristics that reflect individual behavioral habits and, on this basis, identify untrusted interactive behavior.

The remainder of this paper is organized as follows. Section 2 describes the background. Section 3 describes the method for identifying untrusted interactive behavior, and Sect. 4 gives an example. Finally, Sect. 5 presents concluding remarks.

2 Background

The hidden Markov model (HMM) is widely used in pattern recognition. This section reviews its concept, parameters and algorithms.

2.1 The Concept of Hidden Markov Model (HMM)

An HMM is a probabilistic model of time series. It describes a process in which a hidden Markov chain randomly generates an unobservable state sequence, and each state then generates an observation, producing an observation sequence.

An HMM is a doubly stochastic process. The first process is a Markov chain that describes the transitions between states; the second describes the statistical relationship between each state and its observations [23].

An HMM makes two basic assumptions:

  (1) The state of the hidden Markov chain at any time t depends only on its state at the previous time, not on the states and observations at any other time, and the transition probabilities do not depend on t.

  (2) The observation at any time depends only on the state of the Markov chain at that time, and is independent of all other observations and states.
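A standard consequence of these two assumptions is that the joint probability of a state sequence (q1, …, qT) and its observation sequence (o1, …, oT) factorizes into one-step terms. The second form below uses the notation of Sect. 2.2, where P(q_1) = π_{q_1}, P(q_t | q_{t−1}) = a_{q_{t−1} q_t} and P(o_t | q_t) = b_{q_t}(o_t):

$$ P(q_1, \dots, q_T, o_1, \dots, o_T) = P(q_1)\, P(o_1 \mid q_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1})\, P(o_t \mid q_t) = \pi_{q_1} b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t) $$

Because of assumption (1), the transition factor is the same function at every t, which is why a single matrix A suffices.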

2.2 The Parameters of HMM

An HMM is characterized by the following:

  (1) Q is the set of all possible states and V is the set of all possible observations;

  (2) I = (q1, q2, …, qT) is a state sequence of length T and O = (o1, o2, …, oT) is the corresponding observation sequence;

  (3) N is the number of possible states and M is the number of possible observations;

  (4) A is the state transition probability distribution:

    $$ A = \left\{ a_{ij} \right\}, \quad a_{ij} = P\left[ q_{t+1} = j \mid q_{t} = i \right], \quad 1 \le i, j \le N \tag{1} $$

  (5) B is the observation symbol probability distribution, where b_j(v_k) is the probability of observing v_k in state j:

    $$ B = \left\{ b_{j}(v_{k}) \right\}, \quad b_{j}(v_{k}) = P\left[ o_{t} = v_{k} \mid q_{t} = j \right], \quad 1 \le j \le N, \ 1 \le k \le M \tag{2} $$

  (6) π is the initial state distribution:

    $$ \pi = \left\{ \pi_{i} \right\}, \quad \pi_{i} = P\left[ q_{1} = i \right], \quad 1 \le i \le N \tag{3} $$

For convenience, the compact notation λ = (A, B, π) denotes the complete parameter set of an HMM.
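A minimal sketch of λ = (A, B, π) as plain Python lists, with a sampler that mirrors the doubly stochastic process of Sect. 2.1: the hidden chain moves according to A, and each state emits a symbol according to B. The class name `HMM`, its methods, the 0-based indexing and the numbers in the usage lines are illustrative assumptions, not part of the original method.

```python
import random
from dataclasses import dataclass


def _check_stochastic(rows: list[list[float]], width: int, name: str) -> None:
    """Raise ValueError unless every row has `width` entries summing to 1."""
    for row in rows:
        if len(row) != width or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError(f"{name} must have rows of length {width} summing to 1")


@dataclass
class HMM:
    """Parameter set lambda = (A, B, pi); states 0..N-1, symbols 0..M-1."""
    A: list[list[float]]  # A[i][j] = P(q_{t+1} = j | q_t = i), N x N
    B: list[list[float]]  # B[j][k] = P(o_t = v_k | q_t = j), N x M
    pi: list[float]       # pi[i] = P(q_1 = i), length N

    def validate(self) -> None:
        n = len(self.pi)
        if len(self.A) != n or len(self.B) != n:
            raise ValueError("A and B must have N rows")
        _check_stochastic(self.A, n, "A")
        _check_stochastic(self.B, len(self.B[0]), "B")
        _check_stochastic([self.pi], n, "pi")

    def sample(self, T: int, rng: random.Random) -> tuple[list[int], list[int]]:
        """Generate a hidden state sequence of length T and its observations."""
        states: list[int] = []
        obs: list[int] = []
        q = rng.choices(range(len(self.pi)), weights=self.pi)[0]
        for _ in range(T):
            states.append(q)
            # the observation depends only on the current state (assumption 2)
            obs.append(rng.choices(range(len(self.B[q])), weights=self.B[q])[0])
            # the next state depends only on the current state (assumption 1)
            q = rng.choices(range(len(self.A[q])), weights=self.A[q])[0]
        return states, obs


# Usage with hypothetical parameters
model = HMM(A=[[0.9, 0.1], [0.5, 0.5]], B=[[0.7, 0.3], [0.2, 0.8]], pi=[0.8, 0.2])
model.validate()
states, obs = model.sample(T=5, rng=random.Random(42))
```

Only the observations would be visible in practice; the state list is returned here just to make the hidden layer explicit.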

2.3 Three Algorithms of HMM

Three classic algorithms solve the basic problems of an HMM:

  (1) Forward or backward algorithm. Given the model λ = (A, B, π) and an observation sequence O = (o1, o2, …, oT), calculate the probability P(O | λ) that O occurs under the model λ.

  (2) Baum-Welch algorithm. Given an observation sequence O = (o1, o2, …, oT), estimate the parameters of the model λ = (A, B, π) so that the observation sequence probability P(O | λ) is (locally) maximized.

  (3) Viterbi algorithm. Given the model λ and an observation sequence O, find the most likely corresponding state sequence.
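A minimal pure-Python sketch of two of these algorithms for a discrete HMM with 0-based state and symbol indices (Baum-Welch is omitted for brevity). `forward` accumulates α_t(j) = P(o1, …, ot, q_t = j | λ) and sums the final values to get P(O | λ); `viterbi` keeps the best path score δ_t(j) together with back-pointers. The function names and the toy parameters are illustrative assumptions.

```python
def forward(A, B, pi, obs):
    """Return P(O | lambda) for observation indices `obs` via the forward algorithm."""
    n = len(pi)
    # initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    # termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)


def viterbi(A, B, pi, obs):
    """Return (most likely state path, its joint probability with O)."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backptrs = []  # backptrs[t][j]: best state at step t given state j at step t + 1
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] * A[i][j])
            ptr.append(best)
            delta.append(prev[best] * A[best][j] * B[j][o])
        backptrs.append(ptr)
    # backtrack from the best final state
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


# Toy model: 3 states, 2 observation symbols (numbers for illustration only)
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O = [0, 1, 0]
print(forward(A, B, pi, O))  # ≈ 0.130218
print(viterbi(A, B, pi, O))  # ([2, 2, 2], ≈ 0.0147)
```

Plain probabilities underflow on long sequences, so practical implementations rescale α at each step or work in log space; for short windows such as the size-3 windows used in Sect. 4 the direct form is adequate.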

3 Method

In our research, we collect characteristic data on each trusted interactive behavior of a user and use a hidden Markov model to establish each user’s network behavior pattern. Each user’s online behavior is then matched against that user’s network behavior pattern, and behavior that fails to match is considered untrusted interactive behavior.

3.1 Data Collection and Preparation

In this paper, the data come from the background log of a publishing company. The log records the operations performed by all users of the company in the ERP software system. All the characteristics of the log record are listed in Table 1.

Table 1. The characteristics of the log record

To preprocess the data, we first filter all operation records of the required user by the operator’s name. Second, trusted interactive behavior is behavior that is consistent with an individual’s behavioral habits as reflected in network operations; a single operation cannot describe these habits correctly, whereas a series of operations usually can. We therefore treat ten consecutive operations of a user as one unit, and obtain each subsequent unit by shifting the previous unit down by one operation. Finally, we determine which characteristics are selected to describe the behavior patterns of trusted users.
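The first two preprocessing steps can be sketched as follows, assuming each log record is a dict with a hypothetical `operator` field and that records are already in chronological order; the real field names are those listed in Table 1, and `op_id` below is likewise hypothetical.

```python
from typing import Iterable, Sequence

UNIT_SIZE = 10  # ten consecutive operations form one unit


def records_for_user(log: Iterable[dict], operator: str, key: str = "operator") -> list[dict]:
    """Step 1: keep only the records of one operator, in log order."""
    return [rec for rec in log if rec.get(key) == operator]


def sliding_units(records: Sequence[dict], size: int = UNIT_SIZE) -> list[list[dict]]:
    """Step 2: overlapping units of `size` consecutive operations, each shifted by one."""
    return [list(records[i:i + size]) for i in range(len(records) - size + 1)]


# Hypothetical log: 12 operations by "A" interleaved with 12 by "B"
log = [{"operator": name, "op_id": i} for i in range(12) for name in ("A", "B")]
units = sliding_units(records_for_user(log, "A"))
print(len(units))                          # 3
print([rec["op_id"] for rec in units[1]])  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

A user with n operations yields n − 9 units, and a user with fewer than ten operations yields none.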

3.2 The Selection of Characteristics

Based on the user’s behavioral habits, six characteristics are chosen to describe the behavior patterns of trusted users (Table 2).

Table 2. Selected characteristics

The number of IPs shows whether the user tends to work from the same IP for a long time or changes IPs frequently. The enter button & function represents the order of operations. The time accumulation of the operations reflects the user’s operating speed. The operating time period reflects the user’s work schedule. The time difference between consecutive operations reflects the user’s attitude towards work (for example, delayed versus timely processing). The combination of operation types reflects the individual’s character.

For example, suppose a unit of a user’s operations is (1, 23, 3, 4, 1, 10). This means that the user performed this group of operations from only one IP, the operation sequence number is 23, the operations took 3 s in total, the accumulated time difference between consecutive operations is 6 min to 8 min, the operating time period is 9:00–9:30, and the operation type combination number is 10 (5 business operations, 2 function operations, 3 business operations).
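To feed these units into an HMM, each unit must become a discrete observation symbol. A minimal sketch, assuming that every distinct six-value tuple counts as one observation symbol (the exact encoding is not spelled out here), is shown below. The field names and the `SymbolTable` helper are hypothetical; the code values themselves (such as 4 for the accumulated time difference and 1 for the time period in the example above) come from Table 2.

```python
from typing import NamedTuple, Optional


class UnitFeatures(NamedTuple):
    """The six characteristics of one unit (hypothetical field names)."""
    ip_count: int       # number of IPs used in the unit
    sequence_no: int    # enter button & function: operation sequence number
    total_seconds: int  # time accumulation of the operations, in seconds
    gap_code: int       # code for the accumulated time difference between operations
    period_code: int    # code for the operating time period
    combo_no: int       # operation type combination number


class SymbolTable:
    """Assign each distinct feature tuple an observation index 0..M-1."""

    def __init__(self) -> None:
        self.index: dict[UnitFeatures, int] = {}

    def encode(self, feats: UnitFeatures, grow: bool = True) -> Optional[int]:
        """Return the symbol for `feats`; add it if new and `grow` is True."""
        if feats not in self.index:
            if not grow:
                return None  # unseen during training
            self.index[feats] = len(self.index)
        return self.index[feats]

    @property
    def M(self) -> int:
        """Number of distinct observation symbols seen so far."""
        return len(self.index)


table = SymbolTable()
example = UnitFeatures(1, 23, 3, 4, 1, 10)  # the unit described in the text
print(table.encode(example), table.M)       # 0 1
```

Building the table with `grow=True` on training data and switching to `grow=False` at test time keeps M fixed once the model is trained.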

Fig. 1 shows the relevant original records, displayed in Chinese.

Fig. 1. The relevant original record

3.3 The Model Parameters of Trusted Users

Untrusted interactive behavior is diverse and cannot be fully enumerated in advance. We therefore model the user’s behavior while the system is running normally, when every behavior of the user is trusted. The hidden Markov model of trusted user behavior has only two states: the trusted state, represented by 0, and the untrusted state, represented by 1. The number of possible observations M is determined by the types of unit operation described in the previous section. Because the model is built while the system is running normally, the state transition matrix is \( A = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} \): the transition probability from the trusted state to the trusted state and from the untrusted state to the trusted state is 1, so regardless of the current state, the next state is the trusted state with probability 1. The observation probability matrix B gives the probability distribution of the trusted user’s unit operations. The initial state probability vector is π = (1, 0). Together, these parameters define the hidden Markov model of trusted user behavior.
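A minimal sketch of these parameters, assuming the trusted row of B is estimated as the empirical frequency of each observation symbol in the trusted user's training data. The untrusted row of B is not specified in the text; it is set to uniform here as a placeholder, which affects no result because, with this A and π, the untrusted state is never reached. For the same reason the forward algorithm collapses to a product: α_t(1) = 0 for every t, so P(O | λ) = b_0(o_1) b_0(o_2) ⋯ b_0(o_T). Function names and training symbols are hypothetical.

```python
import math
from collections import Counter


def trusted_user_hmm(train_symbols: list[int], M: int):
    """Return (A, B, pi) for the two-state model; state 0 = trusted, 1 = untrusted."""
    A = [[1.0, 0.0],   # trusted -> trusted with probability 1
         [1.0, 0.0]]   # untrusted -> trusted with probability 1
    pi = [1.0, 0.0]    # the chain always starts in the trusted state
    counts = Counter(train_symbols)
    b_trusted = [counts[k] / len(train_symbols) for k in range(M)]
    b_untrusted = [1.0 / M] * M  # placeholder (unspecified); state 1 is unreachable
    return A, [b_trusted, b_untrusted], pi


def window_log_prob(B, window) -> float:
    """log P(O | lambda); with the A and pi above this equals sum_t log b_0(o_t)."""
    probs = [B[0][o] for o in window]
    if any(p == 0.0 for p in probs):
        return -math.inf  # contains a symbol never seen in training
    return sum(math.log(p) for p in probs)


A, B, pi = trusted_user_hmm([0, 0, 1, 2, 0, 1, 0, 2], M=4)
print(B[0])                           # [0.5, 0.25, 0.25, 0.0]
print(window_log_prob(B, [0, 1, 0]))  # log(0.0625) ≈ -2.7726
print(window_log_prob(B, [0, 3, 0]))  # -inf
```

Without smoothing, any window containing an unseen symbol scores −∞ and therefore falls below every finite threshold.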

3.4 The Behavior Recognition of Untrusted Users

We set a fixed-size sliding window over the observation sequence, moving it down by one operation at a time. Next, we use the forward algorithm to calculate, under the hidden Markov model of trusted user behavior, the probability of each observation sequence of both trusted and untrusted user behavior. From the resulting probability set of the trusted user’s observation sequences, we take a small value as the decision threshold. An observation sequence whose probability exceeds the threshold is classified as behavior of the trusted user; otherwise, it is classified as behavior of an untrusted user.
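The detection step can be sketched as below. The text only says that a small value from the trusted user's probability set is used as the threshold; taking a low empirical quantile with a tunable `q` is one assumed way to do that, and all function names and scores are hypothetical. Scores are log-probabilities, as in Sect. 4.

```python
import math
from typing import Sequence


def windows(symbols: Sequence[int], size: int = 3) -> list[tuple[int, ...]]:
    """Fixed-size windows over the observation symbols, sliding by one position."""
    return [tuple(symbols[i:i + size]) for i in range(len(symbols) - size + 1)]


def pick_threshold(train_scores: Sequence[float], q: float = 0.05) -> float:
    """Assumed rule: the q-th empirical quantile (low tail) of training log-probabilities."""
    ordered = sorted(train_scores)
    return ordered[int(q * (len(ordered) - 1))]


def classify(score: float, threshold: float) -> str:
    """Scores above the threshold are trusted; all others are untrusted."""
    return "trusted" if score > threshold else "untrusted"


train_scores = [-1.0, -2.0, -3.0, -4.0, -5.0]  # hypothetical log-probabilities
thr = pick_threshold(train_scores, q=0.25)
print(thr)                                            # -4.0
print(classify(-3.5, thr), classify(-math.inf, thr))  # trusted untrusted
```

A smaller `q` gives a lower threshold and accepts more windows; a larger `q` rejects more, which is the lever behind the threshold choice discussed in Sect. 4.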

4 Procedure

Two network users of the ERP system participated in the experiment. Both are from a publishing company in Chongqing, China. User A is defined as the trusted user and user B as the untrusted user.

4.1 Training Phase

User A’s sequences are used as the training set, and user A’s hidden Markov model serves as the trusted user’s hidden Markov model, which was presented in Sect. 3. The sliding window over user A’s observation sequence has a size of three. Using 20,000 of user A’s observation sequences as training data, we obtain the observation sequence probability set of user A. Because these probabilities are very small, we take their logarithms to make them easier to compare. The data set is large, so Fig. 2 shows the probabilities of only 1000 observation sequences. Based on this probability set, the log-probability threshold for user A’s observation sequences is set to −6.389.

Fig. 2. Observation sequence probability set of training data

4.2 Test Phase

User A’s remaining 5000 observation sequences form test data 1, which is used to test the model’s recognition rate. Fig. 3 shows the observation sequence probability set of this test data.

Fig. 3. Observation sequence probability set of test data 1

The test results show that the model correctly recognizes 92.64% of the trusted user’s behavior and wrongly judges the remaining 7.36% as untrusted. That is, of the 4989 behavior sequences of the trusted user, 4622 are judged as trusted behavior and 367 are judged as untrusted behavior.

User B’s 5000 observation sequences form test data 2, which is used to test the model’s false positive rate. Fig. 4 shows the observation sequence probability set of this test data.

Fig. 4. Observation sequence probability set of test data 2

The test results show that the model’s recognition rate for untrusted user behavior is 99.24% and its false positive rate is 0.76%. That is, of the 4989 behavior sequences of the untrusted user, 4951 are judged as untrusted behavior and 38 are wrongly judged as trusted behavior.
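The reported percentages follow directly from the counts. The small helper below (a hypothetical name) recomputes them for both test sets.

```python
def rates(correct: int, wrong: int) -> tuple[float, float]:
    """Return (recognition rate, misjudgment rate) in percent for one test set."""
    total = correct + wrong
    return 100.0 * correct / total, 100.0 * wrong / total


# Test data 1 (user A): 4622 judged trusted, 367 judged untrusted
print([round(r, 2) for r in rates(4622, 367)])  # [92.64, 7.36]
# Test data 2 (user B): 4951 judged untrusted, 38 judged trusted
print([round(r, 2) for r in rates(4951, 38)])   # [99.24, 0.76]
```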

Misidentifying an untrusted user’s behavior as trusted is more harmful than misidentifying a trusted user’s behavior as untrusted. Therefore, when selecting the observation sequence probability threshold, we choose a relatively large value: a higher threshold rejects more sequences, which lowers the false positive rate at the cost of flagging more of the trusted user’s behavior as untrusted.
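The trade-off behind this choice can be checked numerically. In the minimal sketch below, with hypothetical log-probability scores, raising the threshold can only increase the share of trusted windows rejected and can only decrease the share of untrusted windows accepted as trusted.

```python
def error_rates(trusted_scores, untrusted_scores, threshold):
    """Return (% trusted windows rejected, % untrusted windows accepted).

    A window is accepted as trusted only if its score exceeds the threshold.
    """
    rejected = sum(s <= threshold for s in trusted_scores)
    accepted = sum(s > threshold for s in untrusted_scores)
    return (100.0 * rejected / len(trusted_scores),
            100.0 * accepted / len(untrusted_scores))


trusted = [-2.0, -3.0, -4.0, -5.0, -7.0]      # hypothetical scores
untrusted = [-4.5, -6.0, -8.0, -9.0, -10.0]
for thr in (-6.0, -4.0):
    print(thr, error_rates(trusted, untrusted, thr))
# -6.0 (20.0, 20.0)
# -4.0 (60.0, 0.0)
```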

5 Conclusion

In this paper, we propose a method for identifying untrusted interactive behavior in human-computer interaction based on human behavioral habits. First, we analyzed the current information security issues of ERP software systems and reviewed existing methods for addressing them. Second, we argued that solving the information security problem of ERP software systems first requires identifying untrusted interactive behavior in them, and we defined trusted interaction and trusted interactive behavior. Third, we introduced the concepts, parameters and algorithms of the hidden Markov model and used an HMM to model the behavior of trusted users. Fourth, we used the forward algorithm to calculate the observation sequence probability set of trusted user behavior and determined the probability threshold for the observation sequences. Finally, we tested the recognition rate and false positive rate of the model on two test sets.

The experimental results show that the model’s recognition rate is 92.64% and its false positive rate is 0.76%, which indicates that the model is effective for identifying untrusted interactive behavior. Moreover, our research offers a new way to identify untrusted interactive behavior, and the behavior we define as untrusted user behavior is closer to abnormal user behavior in real life.

Much work remains for the future. First, the characteristic framework could be improved, for example by adding computer-related characteristics or characteristics that capture the environment’s influence on interaction behavior in human-computer interaction. Second, we used only the simplest hidden Markov model to model the behavior of trusted users; future research could consider higher-order hidden Markov models. Finally, we did not consider the influence of other factors when selecting the experimental subjects, such as the influence of a subject’s occupation on their operating habits.