
1 Introduction

Generally speaking, software engineering can be considered as a systematic and disciplined approach to developing software. It concerns all the aspects of the production cycle of software systems and requires expertise, in particular, in data management, design and algorithm paradigms, programming languages, and human-computer interfaces. It also demands an understanding and an appreciation for systematic design processes, non-functional properties, and large integrated systems. Thus, when developing complex systems such as Internet and web-based services, it is necessary to apply sound engineering principles in order to economically obtain reliable and efficient software.

In this context, any system is prone to threats, which can be caused either by the failure of an internal element or by internal or external attacks. Being able to react to and defend against these threats is a global necessity. In computer science, conventional fault-tolerance infrastructures tolerate system crashes by applying replication [1]. Complex communication networks display a surprising degree of robustness based on distributed fault-tolerance protocols (e.g., Paxos [2]). However, crashing is not the only way systems fail. While random node crashes do not fragment a whole network, an attack can easily destroy such a network, for example when a malicious client exploits a vulnerability in the application code to take over a node [3] or uses ransomware [4].

In this paper, we propose a practical attack-tolerant methodology for insider attacks, which are known to be difficult to detect. The specificity of our approach is that it integrates software reflection techniques [5, 6] with monitoring based on log files, for efficient detection and mitigation of such attacks. We assume that there is a kernel that is “frozen”, and any modification of this kernel is an unexpected behavior that needs to be analysed. Our technique not only presents a methodology for software design. It also presents a way to detect the modification of the kernel at runtime, to check the modification against an online virus database (DB) using the hash of the code that is about to be executed, and to provide relevant warnings so that the admin user can decide how to proceed in some situations. In addition, we propose a multi-layer software-based technique for building our software, which allows some functionalities to keep working even if some layers are under attack. We consider such systems attack tolerant. In the sequel, we present a multi-layer System Under Test example that will be used to illustrate the methodology. We applied a secure design approach from the beginning of the project until the implementation in order to reduce risky developments. Moreover, using the Montimage Monitoring Tool (MMT) [7] for runtime verification, we analyze the code against a virus DB and detect any change of the software. The approach then allows Web services to continue functioning even in the presence of insider attacks.

We also present the results of the experiments obtained by applying our methodology to the security of a Web service implemented for this purpose. The Web service was developed in Python, and the secure design of the “frozen” parts was decided when designing the API of the system using a UML schema. In order to show its robustness, we performed some attacks by injecting real viruses into the running system. We obtained promising results. The main contributions of the paper are as follows:

  • We propose and describe a new approach exploring the usage of software reflection as a means of detection and mitigation of insider attacks.

  • We describe the approach through a RESTful e-health Web service.

  • We evaluate it with this e-health Web service using a realistic testbed. We show how the attacks are detected and how our methodology is applied. This shows the suitability and the accuracy of the approach for enabling tolerance to these attacks.

The paper is organized as follows. In Sect. 2 a state of the art is introduced. In Sect. 3, we show our secure design methodology and instantiate it through case studies in Sect. 4. The experiments are presented in Sect. 5. Finally, in Sect. 6 we present the conclusions and future work.

2 State of the Art

In this section, we first introduce the notion of reflection and its links with other programming techniques. Then, we define the attack tolerance concept and the related techniques and works. Finally, we compare existing frameworks with our proposed methodology.

Software reflection is a way of implementing metaprogramming techniques (programs manipulating themselves) and was originally envisaged to enable the construction of programs that need to examine or modify their own execution behavior. The foundations of software reflection were first proposed in [8], which introduced additional features of reflection and described an original experiment showing how a reflective architecture can be incorporated in object-oriented languages such as Python [9], C++ [10], Java [11], etc. Initially, reflection techniques were used for adaptability [12], debugging, self-optimization, integrity verification [13], Remote Method Invocation (Java RMI), and so on. In this paper, however, we present how reflection can be used as a secure-by-design technique that improves the security of an application and enables attack tolerance. By definition, an application is attack tolerant if, in the presence of an attack, there is a possible scenario in which the system can work properly with minimal degradation of performance [14,15,16,17]. Several techniques [18,19,20] and architectures have been proposed during the last ten years to enable attack tolerance. In [21, 22] we proposed a model-based attack tolerance technique. The technique presented in that work is based on monitoring and on a formal model approach. The main idea underpinning this approach is the following: from a formal model of a system, we derive a library of equivalent models that serve the same purposes as the first one and are verified to be correct. These models can be used to replace a model that is suffering an attack.

Although all of these methods are interesting from a theoretical point of view, we think they are not sufficient to properly detect internal attacks and malware, for the following reasons: (a) In all existing machine learning methods, detection involves a learning step in which the normal behavior is described. However, in an internal attack, the usurper’s behavior often differs very little from that of a legitimate user. This can lead to several false positives in the final detection. (b) None of these approaches offers remediation sketches that allow applications to continue functioning even in the presence of insider attacks. A new approach to insider attack detection and tolerance is therefore needed. In this paper, we propose to design a generic attack-tolerant methodology for insider attacks. Our approach integrates the reflection techniques mentioned above as well as the monitoring of log files.

3 Framework

In this section, we introduce the assumptions and the definitions of the basic concepts used in our methodology. We make the following assumptions about our environment:

  • We consider that the software of the client is located in a safe environment;

  • Some potential attacks are internal ones, that is, they come from internal hackers. The aim of the attacker is to usurp the actions, i.e., to modify the methods of the \(\mathcal{A\!P\!I}\) of the platform;

  • Even if the environment is safe, we also assume that the unsuspecting behavior of employees (e.g. unknowingly clicking on an email attachment) can lead to malware exposure.

Fig. 1. Attack tolerance framework

Definition 1

An attack is any external or internal interaction with the system that modifies the behavior or changes some parts of the code making them unsafe. An attack can also be the presence of a virus or malware that has the purpose of spreading and infecting the machines of the system.    \(\square \)

Definition 2

An \(\mathcal{A\!P\!I}\) is a set of methods and tools that can be used for building software applications and we can consider them as safe.    \(\square \)

Definition 3

Given an \(\mathcal{A\!P\!I}\) of our system, any change of this \(\mathcal{A\!P\!I}\) can be considered as an internal attack.    \(\square \)

As presented in the introduction we define a layered framework depicted in Fig. 1:

  • Layer 1: Web Firewall Service. This is the entry-point of the framework.

  • Layer 2: Specialized Operations. This layer contains the business operations (for example patients management in the e-health example below) of the running application.

  • Layer 3: Authentication. We use a multi-factor authentication, namely a user-password authentication followed by an SMS authentication. This multi-factor authentication increases security since the attacker needs much more time to access the system.

  • Layer 4: Logs, Monitoring and Reaction. This layer spans the other layers presented above. It is responsible for detecting misbehaviors and reacting to such threats in each layer of the running application. The detection of attacks is based on the monitoring of the log files. Each operation or method invocation is stored in a log located in each part of the system (application, server and cloud). The monitoring tool analyses these logs in order to find misbehaviors or attacks according to some security rules. When the logs or the network probes reveal the presence of an attack or a misbehavior, a classification is made in order to deploy the best countermeasures or remediation against these threats. The classification leverages a vulnerability DB, where the hash codes of known vulnerabilities are stored.

Each software component (Web Service, Specialized Operations, Document Management and Authentication) has its own \(\mathcal{A\!P\!I}\). The whole framework aims at providing attack tolerance by design for any user of the platform. For example, let us consider user 1, who wants to use the Web service. If the Web Service does not work because of an attack, the user can still access the documents using the extranet. In the same manner, users 2 and 3 can still perform some specific operations (Layer 2) even if other operations of Layer 2 stop working. However, every component needs the authentication component (Layer 3). Thus, if user 4 wants to authenticate and the authentication component is not available due to an attack, user 4 is not able to access the database or critical parts of the running application.

Now that the assumptions and the general framework have been described, we present the detection and reaction methodology (Layer 4 above) in detail. We propose a methodology that ensures efficient attack tolerance. To tolerate attacks, they must first be detected. Consequently, detection is an important part of our approach.

The whole detection and reaction framework works as follows:

  • Monitoring and detection: We begin with the detection step. Monitoring and detection are made possible by the MMT tool. The programs are checked at runtime using reflection. Either the system is in normal working conditions, i.e., the security policies are respected, in which case there is nothing to do, or something abnormal is found, in which case there may be a virus attack, an \(\mathcal{A\!P\!I}\) modification or an unknown attack.

  • Check against the database of viruses: We compute the hashed fingerprint of the program and look that value up in the database of attacks. If the attack appears in the vulnerability database, it is either a virus or an attack resulting from the modification of a method of the \(\mathcal{A\!P\!I}\), and we provide a set of countermeasures. If the attack is a malware, the system launches the corresponding patch. If the threat is a modification of one of the methods of the \(\mathcal{A\!P\!I}\), a countermeasure can be to replace a software component/layer or to revert to the initial \(\mathcal{A\!P\!I}\) method. These countermeasures will be described later.

  • Check against the M-DB: If the attack is not known in the vulnerability DB, we check our own DB, called Montimage DB (M-DB). If the attack exists there, we provide the same countermeasures as mentioned above.

  • Save on M-DB: This case corresponds to the situation in which the attack has a hash neither in the vulnerability DB nor in the M-DB. The hash is then stored in the M-DB and we define a new countermeasure (State \(S_{8}\)).
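The lookup order of the last three steps can be sketched as follows. This is only an illustration under stated assumptions: the two databases are modelled as in-memory dictionaries mapping a hash to a countermeasure label, SHA-256 is chosen arbitrarily as the hash function, and the names `vulnerability_db`, `m_db`, `fingerprint` and `classify` are hypothetical and do not come from MMT.

```python
import hashlib

# Hypothetical in-memory stand-ins for the two databases. Their real schema is
# not described here, so plain dicts mapping hash -> countermeasure are assumed.
vulnerability_db = {}
m_db = {}


def fingerprint(code: bytes) -> str:
    """Hash the suspicious code (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(code).hexdigest()


def classify(code: bytes) -> str:
    """Return the countermeasure selected for a piece of abnormal code."""
    digest = fingerprint(code)
    # Known in the vulnerability database?
    if digest in vulnerability_db:
        return vulnerability_db[digest]
    # Otherwise, known in the local M-DB?
    if digest in m_db:
        return m_db[digest]
    # Unknown attack: record the hash so that it is recognised next time.
    m_db[digest] = "define-new-countermeasure"
    return "unknown: stored in M-DB"
```

With this ordering, an attack that is unknown on its first occurrence is found in the M-DB on its second occurrence.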

Fig. 2. \(\mathcal{A\!P\!I}\) of the HealthOperation center.

4 Case Studies

In order to apply the methodology to real use cases, we introduce two examples.

Example 1

We propose an e-health Web application example, namely software for managing the patients and doctors of a hospital. The simplified \(\mathcal{A\!P\!I}\) consists of four methods, which are presented in Fig. 2.    \(\square \)

It is assumed that this \(\mathcal{A\!P\!I}\) is by definition safe, i.e., it is only accessible by authorised people. Let’s suppose the following code is used as the implementation of the updatePatient method (Fig. 3):

Fig. 3. Correct implementation of updatePatient.

Let us note two points: (a) in this scenario, as a protection measure, a user wanting to perform an action must first obtain a token provided by the super-administrator; (b) this token should be validated before updating the database.

Fig. 4. Unexpected implementation of updatePatient function.

The lines of code in Fig. 4 are functionally similar to the first implementation of the updatePatient method, but the length of the code is not the same. The insider attacker got a token from a colleague who has more privileges, and there is no verification of that token. The attack can, for example, modify the \(\mathcal{A\!P\!I}\) and insert fake values into the system, or retrieve confidential data. As a result, the requests of the users of the application do not return correct results. An attack can also be the presence of a malware whose purpose is to spread and infect the machines of the system.

Following our methodology, this attack is detected by using software reflection, which is a metaprogramming technique. In many programming languages it is possible to dynamically obtain the code, and even the execution trace, of a method, class or module. One can also modify a class at runtime. In Python, the inspect module provides functions for learning about live objects, including modules, classes, instances, functions, and methods. Functions in this module can be used to retrieve the original source code of a function, look at the arguments of a method on the stack, and extract the sort of information useful for producing library documentation for source code. Our running system under test shows how such an attack can be detected with these inspection mechanisms.
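As a minimal sketch of this kind of inspection, the example below fingerprints a live method and notices when it has been replaced at runtime. It is written under assumptions: the class body is a hypothetical stand-in for the API of Fig. 2, the helper `fingerprint` is illustrative, and hashing the compiled bytecode (rather than the source text or the stack) is a choice made here so that the check does not depend on a source file being available.

```python
import hashlib
import inspect


def fingerprint(func) -> str:
    """Hash the compiled body of a live function.

    Bytecode, constants and referenced names are hashed instead of the source
    text, so the check also works when no source file is available.
    """
    code = func.__code__
    material = repr((code.co_code, code.co_consts, code.co_names))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


class HealthOperation:
    """Hypothetical stand-in for the API class of Fig. 2."""

    def updatePatient(self, token, patient):
        if token != "valid-token":
            raise PermissionError("invalid token")
        return {"updated": patient}


# Record the reference fingerprint while the API is known to be safe.
reference = fingerprint(HealthOperation.updatePatient)


def tampered(self, token, patient):
    # No token validation: the kind of change an insider might introduce.
    return {"updated": patient}


# Simulated attack: the method is overridden at runtime.
HealthOperation.updatePatient = tampered

# inspect.getmembers lists the functions currently attached to the class.
api_methods = dict(inspect.getmembers(HealthOperation, inspect.isfunction))
modified = fingerprint(api_methods["updatePatient"]) != reference
```

Here `modified` evaluates to `True`, because the replacement no longer contains the token check and therefore compiles to different bytecode.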

We suppose that the \(\mathcal{A\!P\!I}\) is the one described previously. To detect attacks, we use logs located on the two endpoints: on premises and on the server (proxy). In addition to general information such as the date and time, we store in these logs the hash of the stack of any running code. These hashes are computed by the method foo() defined in Fig. 5.

Fig. 5. Implementation of the foo method.

Consequently, any line in all logs has the following format:

Date Hour Operation Hash

We also consider that every request made by a user through the \(\mathcal{A\!P\!I}\) is followed by an answer from the server before the request is performed on the cloud. Any request therefore has two traces in the logs: outbound and inbound. For instance, if the user sends a getConnected request, it will produce the corresponding getConnected outbound entry in the log file of the application. When the server responds to that request, it will produce a getConnected inbound entry in the log file of the application.
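A log line in this format could be parsed as sketched below. The exact layout is an assumption: fields are taken to be separated by whitespace, the operation is followed by its direction (outbound or inbound), and the hash may be absent. The names `LogEntry` and `parse_line` are illustrative only.

```python
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    date: str
    hour: str
    operation: str
    direction: str          # "outbound" or "inbound"
    digest: Optional[str]   # None when the line carries no hash


def parse_line(line: str) -> LogEntry:
    """Parse one whitespace-separated log line (assumed layout, see above)."""
    fields = line.split()
    if len(fields) not in (4, 5):
        raise ValueError(f"malformed log line: {line!r}")
    date, hour, operation, direction = fields[:4]
    if direction not in ("outbound", "inbound"):
        raise ValueError(f"unknown direction: {direction!r}")
    digest = fields[4] if len(fields) == 5 else None
    return LogEntry(date, hour, operation, direction, digest)
```

Keeping a missing hash as `None` instead of rejecting the line is deliberate in this sketch, since an entry without a hash is itself a symptom worth reporting.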

Example 2

Consider now the case of the updatePatient() method, which interests us in particular. Let us detail how the client can use that method correctly, and also the cases where an attack is manifest. As shown in the first implementation of the updatePatient method (Fig. 3), the foo method is called before and after the operation itself. We suppose that the hash of the outbound operation corresponds to the first call of foo, while that of the inbound operation corresponds to the second call of foo. Both hashes should obviously match because they are obtained from the same method. The log file of the application will be (Table 1):

Table 1. Normal entries in the log of the client application

Fig. 6. Communication chart in the normal case

Reciprocally, upon receiving the updatePatient request, the server writes updatePatient inbound in its log file. When the server responds to that request, it writes updatePatient outbound in its log file (Fig. 6). The server’s log will be (Table 2):

Table 2. Normal entries in the log of the server

Let us now consider the case where an attacker has succeeded in launching an attack against the \(\mathcal{A\!P\!I}\). Each of the following cases indicates that there is an attack: someone has modified the \(\mathcal{A\!P\!I}\) and overridden one or several methods:

  • No hash: This is the case where the log contains some information about the methods but no hash. This happens when the attack overrides the method but does not implement it correctly (Fig. 4). It is also possible to get only one hash for the outbound operation and nothing for the inbound operation.

  • Hashes not equal: Here the logs contain relevant information, but the hashes of the outbound and inbound operations are not the same. This can appear on the client side, the server side or both. This happens when the attacker uses the same core algorithm as the correct code but the rest of the code is implemented differently.

  • Inconsistency: Here the logs contain inconsistencies. For instance, the client log (respectively server log) contains an inbound (respectively outbound) operation before the corresponding outbound (respectively inbound) one.

The dates or hours in the log files can also be inconsistent, for instance when the client receives updatePatient(Inbound) before the server sent updatePatient(Inbound). Let us remark that any combination of the inconsistencies mentioned above is possible.    \(\square \)

To detect these attacks, following our methodology in Sect. 3, we applied the following security policies (rules). On the client application:

  • Rule 1: Any update request should have hashes for its operations (outbound and inbound) on the log files and these hashes must correspond.

  • Rule 2: Any outbound operation should be followed by an inbound operation.

  • Rule 3: If the outbound and the inbound operations have the same hashes, the inbound one shouldn’t appear before the outbound one.

The rules on the server side are the same as those of the client seen above. Aggregating the two logs yields the following new rules:

  • Rule 4: For any outbound operation that appears in the log of the application, there must be a corresponding inbound operation in the log of the server coming from the application, and the outbound operation (from the application) must occur (clock indication) before the inbound operation (from the server).

  • Rule 5: For any inbound operation that appears in the log of the server, there must be a corresponding outbound operation in the log of the application whose source is the server, and the outbound operation (from the server) must occur (clock indication) before the inbound operation (from the application).
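The single-log rules (Rules 1 to 3) can be checked with a few lines of code. The sketch below is not the MMT implementation: it assumes that each log entry is available as a tuple (timestamp, operation, direction, hash) in file order, it simplifies Rule 3 by flagging any inbound entry that has no pending outbound entry, and the name `check_log` is hypothetical.

```python
from typing import List, Optional, Tuple

# (timestamp, operation, direction, hash); the tuple layout is an assumption.
Entry = Tuple[str, str, str, Optional[str]]


def check_log(entries: List[Entry]) -> List[str]:
    """Return the violations of Rules 1-3 found in one log, in file order."""
    violations = []
    pending = {}  # operation -> hash of the outbound entry awaiting its inbound
    for _timestamp, operation, direction, digest in entries:
        if digest is None:
            violations.append(f"Rule 1: {operation} {direction} has no hash")
        if direction == "outbound":
            if operation in pending:
                violations.append(f"Rule 2: {operation} outbound without inbound")
            pending[operation] = digest
        else:
            if operation not in pending:
                violations.append(f"Rule 3: {operation} inbound before outbound")
                continue
            expected = pending.pop(operation)
            if digest is not None and expected is not None and digest != expected:
                violations.append(f"Rule 1: {operation} hashes differ")
    # Outbound entries still pending at the end never received an answer.
    for operation in pending:
        violations.append(f"Rule 2: {operation} outbound without inbound")
    return violations
```

Rules 4 and 5 would need the client and server logs to be merged on a common clock, which this sketch leaves out.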

In the case of an attack against the \(\mathcal{A\!P\!I}\), by comparing the hashes of the outbound and inbound operations, one can see that a lambda instruction has been called instead of the right one. Thus the attack is detected. An external attacker can be detected in the same way. To specify and monitor those rules efficiently, we use the MMT tool and investigate how we can enhance the Complex Event Processing engine. Due to space limitations, we only show Rule 1 (any update request should have hashes for its outbound and inbound operations in the log files, and these hashes must correspond), as depicted in Fig. 7.

Fig. 7. Security rule representation in MMT.

This XML document expresses Rule 1 in the formalism of the MMT tool. It is an attack detection rule (type_property= “ATTACK”). An MMT-Security properties XML file can contain as many properties as required. Each property begins with a <property> tag and ends with </property>. A property is a general ordered tree. In general, the left branch (Event 1 (\(e_{1}\))) represents the context and the right branch (Event 2 (\(e_{2}\))) represents the trigger. In this example, if the context and the trigger are verified, then an attack/evasion has been detected. Note that &amp; is equivalent to logical AND and strcmp is the classical string comparison function in C.

In summary, we proposed a rule for the efficient detection of an insider attack using MMT. After the detection, we must react in order to ensure the attack tolerance capability of the framework. The mitigation and the remediation techniques are as follows. To mitigate an insider attack, there are three ways of reacting.

(a) The first way is to dynamically change the implementation of the extended class at runtime. In this way, any further attack leveraging that issue will be thwarted. This is called Behavioral Reflection, i.e., reification of execution. (b) The second way is to disable the overridden function. This is called Structural Reflection, i.e., reification of structure. (c) Finally, we can also change the class at runtime periodically to increase randomness. These reactions are made possible by software reflection in Python. We will illustrate the first method in the next section through experimentation.
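Reactions (a) and (b) can be illustrated with Python's built-in `setattr`, which rebinds a method on a class at runtime. This is a sketch under assumptions: the class body is a hypothetical stand-in for the API of Fig. 2, and the helpers `restore` and `disable` as well as the `TRUSTED` registry are illustrative names, not part of the framework.

```python
class HealthOperation:
    """Hypothetical API class; only the method name comes from the example."""

    def updatePatient(self, token, patient):
        if token != "valid-token":
            raise PermissionError("invalid token")
        return {"updated": patient}


# Keep a reference to the trusted implementation while the API is still safe.
TRUSTED = {"updatePatient": HealthOperation.updatePatient}


def restore(cls, name):
    """Reaction (a): put the trusted implementation back on the class."""
    setattr(cls, name, TRUSTED[name])


def disable(cls, name):
    """Reaction (b): replace the method with a stub that refuses every call."""
    def refused(self, *args, **kwargs):
        raise RuntimeError(f"{name} is disabled after a detected attack")
    setattr(cls, name, refused)
```

Because the method is rebound on the class, instances that already exist pick up the restored or disabled version on their next call.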

5 Experiments and Results

To implement and test our approach, we developed a Python RESTful Web service with the Flask framework (footnote 1). This Web service implements and extends the hospital example seen above. We used a RESTful API for various reasons. As shown in previous sections, to ensure attack tolerance, our framework must be modular in the sense that the different levels should be independent of each other. One of the keys of a service-oriented architecture (SOA) is that interactions take place with modular services (flexible coupling) that operate independently. SOA enables reuse of services, which avoids starting from scratch when upgrades and other changes are needed. This is an undeniable advantage for companies seeking to save time and money. We preferred REST to SOAP because REST is easier to use and more flexible: it is fast and effective, and no expensive tools are needed to interact with the Web service. The service has two main databases: a database of viruses (Virus DB) and a database for MMT (MMT DB), which contains the metadata of the methods (name, module, source code). To make sure that these databases cannot be corrupted, standard database protection techniques have been used: the databases and the data they contain have been encrypted. For the whole framework, the methodology is the one explained in the previous section. The operations related to both the patients and the doctors (creation, list, update, deletion) were implemented as REST requests (POST, GET, PUT, DELETE).

The use case is the following. We assume that one or more operations of the service have been compromised by injecting a known virus into the code. This may be the case, for example, if one of the project partners has clicked on a malicious link received in an e-mail. The requests launched by the different users are intercepted and analyzed with the detection tool (MMT) before the corresponding method is executed. In this step, the hash code of the invoked method is compared with the hash code stored in the MMT DB. If the two hashes are equal, nothing malicious has happened. If they differ, an alarm is issued and, as a countermeasure, the safe code of the operation stored in the DB is dynamically executed so that the corrupted code cannot spread. This also ensures continuity of the service for the users. By doing so, all subsequent attempts by the attackers will fail. We evaluate the framework in the presence of virus samples from VirusShare (footnote 2). Two experiments have been launched. In the first experiment, we investigated the overhead (overall time needed to respond to the requests of the clients) generated by the framework when an attack is detected. The second experiment aims at comparing the accuracy (time to detect an attack) of our detection tool with that of classical commercial off-the-shelf (COTS) detection tools.
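The interception step can be sketched as follows, assuming that the MMT DB record of a method holds its trusted source code, as the metadata list above suggests. Everything else is an assumption of this sketch: the dictionary `MMT_DB`, the helpers `digest` and `guarded_call`, the use of SHA-256, and the use of Python's built-in `exec` to run the stored source.

```python
import hashlib

# Hypothetical MMT DB record: method name -> trusted source code.
MMT_DB = {
    "updatePatient": (
        "def updatePatient(token, patient):\n"
        "    if token != 'valid-token':\n"
        "        raise PermissionError('invalid token')\n"
        "    return {'updated': patient}\n"
    ),
}


def digest(source: str) -> str:
    """Hash a method's source code (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def guarded_call(name, running_source, *args):
    """Run the deployed code only if its hash matches the trusted record."""
    trusted_source = MMT_DB[name]
    alarm = digest(running_source) != digest(trusted_source)
    # On a mismatch, the trusted source from the DB is executed instead.
    namespace = {}
    exec(trusted_source if alarm else running_source, namespace)
    return alarm, namespace[name](*args)
```

In this sketch the caller receives both the alarm flag and the result, so the request is still answered while the mismatch is reported.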

Experiment 1: We measured the average time to serve a client request without an attack and when an attack is detected. The results are recorded in Table 3a. We find that these values are very close. It can be concluded that the approach does not induce much overhead and that it is transparent to the user.

Table 3. Experiments 1 and 2 results

Experiment 2: In this part, we evaluated the ability of the framework to detect virus attacks in comparison with a conventional anti-virus. A virus was injected into the Web service. For security and simplicity reasons, a new virus was written. This virus modifies the code of all the classes, methods or functions of the Python modules of a given directory tree. We established a signature of the new virus. We added this signature to our virus database as well as to that of ClamAV (footnote 3), a well-known anti-virus for all operating systems. We used a logical signature (a logical signature allows combining multiple signatures in extended format using logical operators) as well as a hash-based signature (namely MD5).

Our local database is based on the VirusShare database (footnote 4). This site contains the MD5 hashes of known malware. For fairness, we only launched ClamAV on the folder containing our Web service implementation. We ran the virus and tried to detect it with both tools. The detection times are recorded in Table 3b. From Table 3b we can conclude that our framework is twice as fast as ClamAV. Moreover, we think our framework is more suitable than conventional anti-viruses for the following reasons. MD5-based anti-malware only works against static infections that never change. However, there is also polymorphic malware that continuously changes its source code. So, whether static or dynamic signatures are used, attackers can still carry out zero-day attacks. With our methodology, in contrast, any attack that modifies our modules will be detected, because we base our detection on the sources of our modules and not on the sources of viruses. This is fundamental for attack tolerance. The only condition is to make sure that this database cannot be easily compromised.

6 Conclusion and Future Work

In this paper we have presented a generic secure methodology to detect and remediate insider attacks on Web-based services. We explored and laid the groundwork for attack tolerance using software reflection. We have also presented a multi-layer architecture and shown how we are able to detect, at runtime, changes of the kernel application that might be considered as attacks.

In addition to detecting attacks, our methodology allows enabling or disabling different layers of the system in order to stop the attack and allow the system to continue working. Our goal is to always keep a part of the system running, even in the presence of an attack, and to be able to provide some “critical” services in any situation.

As future work, we plan a more in-depth study of reflection as a means of attack tolerance, with some additional experiments. We also plan to introduce new testing and verification techniques in our methodology, in order to classify the viruses or unexpected running code of our system and to share the reporting of this analysis among several users at runtime. In this way, new attack detection methods will be shared as soon as attacks are detected.