1 Introduction

Problems of computer security is currently one of the most important problems of technical civilization. There are many methods of attacking computers or computer networks, in particular remote methods that do not require direct access.

The emergence of this type of problem resulted in the emergence of a number of defense methods. Some of these methods use technical means, such as firewalls or antivirus programs. Users’ awareness of possible threats is also important for the security of computers or computer networks. Organizational security methods are, for example, software or hardware audits.

Air–gap is a very efficient method of improving the security of individual computers or computer systems [6]. Isolation through the lack of communication interfaces reduces the possibility of attacks. Air–gap breaking by the establishing unconventional communication interfaces is possible by the infection of secured computer or computer system [7]. The methods used for infection are not taken into account in this work.

Establishing one–way or two–way communication is important for both attackers and for prevention that is important for security engineers [8]. The data rate achieved is usually very low: a single bit is sent per second. Such transmission allows attack on PIN codes, passwords and very short confidential data. A typical air–gap attack is a one-way attack where data is sent outside of a protected computer or computer network. This type of attack is a long–term process that requires the attacker to listen in close proximity to the protected area.

1.1 Content and Contribution of the Paper

Air–gap transmission techniques are very active research area and some recent development are related to the application of different physical mediums. Example optical channels are: QR code embedding in image [13] (VisiSploit), network LEDs modulation in routers [18] (xLED), infrared transmission [16] (aIR–Jumper). Radio based transmission attacks are for example: USBee [15], AirHopper [14] and GSMem [17]. These types of transmission channels are not covered by typical protection, in particular audits, and therefore are very dangerous. Intentional data transmission hiding could be used for monitoring devices also [21].

In previous works, we proposed using a computer screen flashing for data transmission [4, 5]. In the absence or very low user activity, very small changes in screen brightness are used that are not noticeable to humans.

It is possible to receive data correctly from a large distance using image observation with a camera and a telescope, as well as digital image processing and demodulation algorithms A typical modulation applied in this type of solutions is BFSK (Binary Frequency–Shift Keying) [10, 19], which allows estimation of transmission settings like symbol and keyed frequencies [5].

Previous work assumed the processing of full–frame video sequences for estimation the transmitted signal [5]. Each frame of the image was transformed by averaging to a single value, i.e. the video sequence was converted into a one-dimensional signal The signal analysis for a single image pixel is not effective due to the low SNR (Signal–to–Noise Ratio) value. Averaging the whole frame of the image allows improved SNR, and the best solution is to average only the selected image area, such as the computer screen image. The problem dealt with in this article is the automatic determination of the area subjected to averaging. Determining the image area where brightness modulation occurs allows the SNR to be improved, because areas where there is no modulation are just a source of undesirable noise. This is particularly important in the case of an indirect attack, where the computer screen is not visible directly, but only the glow in the room is visible (Fig. 1). Very often screens of computers and keyboards, for security reasons, are set so that they are not visible from the outside.

Fig. 1.
figure 1

Scheme of measurement of monitor brightness changes.

What’s more, diffusion foils are used to block the possibility of direct observation of the room through the windows, but diffusion foils does not eliminate the glow of monitors.

The application of the optimization algorithm allows searching in the frame for the image of areas for which the brightness changes are the greatest to reduce the impact of noise. Areas for which the brightness changes are small bring the most noise to the signal and should be omitted in the analysis. Areas that should be omitted are those that are illuminated by other sources (other monitors, sources of light in the room).

The proposed algorithm is an offline algorithm, and the data is processed after recording the video image.

Example scenarios and acquisition configurations are presented in Sect. 2. Details of algorithm and processing method are considered in Sect. 3. Result are presented in Sect. 4 and discussion is provided in Sect. 5. Final conclusions and further work are considered in Sect. 6.

2 Data

The evaluation of the algorithm is based on an empirical test as proof of concept.

The source of the data is the laptop of which the screen is not visible directly. The laptop has software that modulates the brightness of the monitor. Brightness changes are not visible to a human directly, which means that the signal has a relatively small SNR.

The second laptop is positioned so that its screen is visible in the frame (Fig. 2). This laptop is a source of local interference, due to strong changes in screen brightness. Additional sources of interference are the lamps that illuminate the corridor. The purpose of the optimization algorithm is to detect areas from which transmitted data can be obtained, while avoiding interference areas.

Fig. 2.
figure 2

Light disturbance computer from the left and back of observed computer screen from the right.

Data is recorded using the ZWO ASI1600MM–COOL camera with a monochromatic sensor [26] and a resolution of 16 Mpix, and Canon EF 100 mm 1:2.8L IS USM lenses. The camera was placed on a tripod to ensure video recording without shifting the essential elements of the scene (Fig. 3).

ASICAP v.1.3 software for MS Windows and USB 3.0 interface has been used for video sequence acquisition. The camera sensor supports ROI (Region of Interest), that is used to reduce USB bandwidth and storage requirements, so the recorded picture frames have a resolution of \(1024 \times 768\). Measured HFOV (Horizontal Field of View) is about \(3^\circ \) for the selected ROI, and the distance between the camera and the computer screen is about 29 m. This camera has advanced cooling (thermoelectric cooler and fan) for sensor noise reduction, but this option was not used. Recording speed is 100 fps and 10 ms exposure time is selected for the flickering reduction due to fluorescent lamps sources in corridor. Recorded video sequences are of high quality because they are raw frames, which enables to control the influence of noise (Fig. 4).

Fig. 3.
figure 3

Acquisition system.

Fig. 4.
figure 4

Light disturbance computer from the left and back of observed computer screen from the right (camera view).

3 Method

Direct observation of the computer screen is sometimes not possible due to spatial relations. The light emitted by the computer screen can be observed indirectly by reflections from the surrounding objects. Glass, plastic, metal materials are particularly interesting because they can reflect light strongly in a specific direction. The Lambertian surfaces are also interesting because the light is scattered in all directions.

An optimization algorithm, based on local full–frame search, is required to process as many picture frames as possible using the demodulation algorithm. Signal processing part is described in [4] and [5]. The detection using two band–pass filters and rectifiers was proposed in [19] and is not considered in this paper. There are two sources of interfering signals: light sources and camera noise. The input image is reduced to \(16 \times 12\) pixels for reducing picture noise and processing complexity.

The aim of proposed algorithm is the calculation of segmented image S with positive pixel values S(xy) that corresponds to segmented area number. Pixel position (xy) corresponds to the signal from video sequence V(xyk), where x and y are image coordinates and k is the frame number. Two pixels \(S_i\) and \(S_j\) belong to the same segmented area \(S_i(.,.) = s\) and \(S_j(.,.) = s\) if the similarity between corresponding video sequences is found using Euclidean metric:

$$\begin{aligned} d(i,j)=\left( \sum _k \left( V_i(x_i,y_i,k)-V_j(x_j,y_j,k) \right) ^2 \right) < T, \end{aligned}$$
(1)

where T is threshold and d(ij) binary value (1 - similar, 0 - not similar).

The problem is the number of calculations required for d(ij), large image resolutions and long sequences. Local comparisons can be used to reduce the calculation due to spatial similarity and this variant is considered.

Algorithm randomly selects starting position \((x_i,y_i)\) uses spatial neighborhoods:

$$\begin{aligned}&(x_i-1,y_i),\end{aligned}$$
(2)
$$\begin{aligned}&(x_i+1,y_i),\end{aligned}$$
(3)
$$\begin{aligned}&(x_i,y_i-1), \end{aligned}$$
(4)
$$\begin{aligned}&(x_i,y_i+1) \end{aligned}$$
(5)

as a second position \((x_j,y_j)\) if they are not assigned to any s. New positions are marked as possible points for starting new comparisons with own neighborhoods. This algorithm behaves like local fill algorithm [25]. This process is repeated until the assignment to the same s region is possible. New position is randomly selected from not assigned yet positions to any s with new s value and repeated until all positions are not used.

Modified distance criteria (1) should be used due to the problem of direct comparison of values. Adjacent pixels can be illuminated in the same way, but with different average values, so the basic algorithm (1) discards pixels that, for example, belong to different scene objects, even if the lighting changes are similar. Proposed algorithm uses the removal of mean values using the following formulas:

$$\begin{aligned} d^{{}*{}}(i,j)=\left( \sum _k \left( V^{{}*{}}_i\left( x_i,y_i,k\right) -V^{{}*{}}_j\left( x_j,y_j,k\right) \right) ^2 \right) < T, \end{aligned}$$
(6)
$$\begin{aligned} V^{{}*{}}(x,y,k)= V(x,y,k) - mean\left( V(x,y,.)\right) , \end{aligned}$$
(7)

where \(V^{{}*{}}\) is corrected V sequence and \(d^{{}*{}}(i,j)\) is a new similarity metric.

The threshold value is adaptively selected by testing the number of areas achieved, and the number of areas should typically be 5–20, so multiple passes of the segmentation algorithm are necessary to select the threshold value. Demodulation is processed individually for each \(A_s\) area with a common value of s. This area uses the following signal fusion formula:

$$\begin{aligned} A_s(k) = \frac{1}{N_s} \sum _s \left( V(x,y,k) - mean\left( V(x,y,.)\right) \right) , \end{aligned}$$
(8)

where \(N_s\) is the number of pixels with the same \(S(x,y)=s\) assignment.

Each demodulated sequence should be checked using an additional CRC (Cyclic Redundancy Check) code for the final selection of the sequence of bits and this topic is out of the scope of the paper.

4 Results

Basic method of air gap transmission using the computer screen brightness changes assumes BFSK modulation and 0.18 bit/s transmission is used. The speed results from assumed slow changes in the brightness of the monitor screen. The Hamming distance between known and received binary sequences is used as the quality criterion. Binary keying signals that have higher resolution are used instead of comparison of demodulated bits. This approach reduces final result influences in demodulation algorithm.

Additive Gaussian noise images with known and controlled standard deviation are applied to all video frames in order to analysis of the sensitivity. Particular video sequence is tested 100 times for the selected standard deviation value and number of segmented region in 5–20 range.

Box and whisker plots for full frame processing and selective processing are shown in Fig. 5. These Monte Carlo test [22] allows the comparison of algorithm and the determination of properties basing on intensive numerical tests. Example segmentation results are shown in Fig. 6.

5 Discussion

Obtained results (Fig. 5) show importance of proposed solution. Achieved segmentation results influences Hamming errors. Proposed solution gives about ten times less errors comparing to entire image frame processing. Noised video sequences introduces Hamming errors, but proposed algorithm is still superior (Fig. 5). The results presented using the Monte Carlo analysis show how the signal degradation curve behaves for many tests for different standard deviation values of the interfering video noise.

Fig. 5.
figure 5

Monte Carlo analysis of Hamming errors for full frame (black) and selective processing (red) algorithms. (Color figure online)

Fig. 6.
figure 6

Examples of processed image (top) and segmented regions (bottom) after rescalling with additive Gaussian noise with std.dev.: 0 (top), 0.055 (middle) and 0.120 (bottom). (Color figure online)

The computing cost is very large for the proposed algorithm, because the entire video sequence is analyzed. It is possible to reduce it by the application only to a short sequence. Obtained area of interest can be used for the selection of a pixel for the entire video sequence. The largest regions in Fig. 6 are exemplary results of the proposed algorithm (marked using dark red color). This area includes both the background behind the monitor and the monitor housing, so it is not an optimal area, which requires the selection of a different threshold value. The obtained results, however, show how strong the level of interference from the monitor with the directly visible screen (Fig. 5).

The backlight modulation can be used for transmission of data from a room with diffusers in the windows The diffusers are used for protection against direct observation of the computer screen using a telescope. This method also accepts blurred images.

The cost of the calculations depends on the image resolution, so quite small resolution has been used. A serious problem is the necessity of processing quite long video sequences (Eqs. (7) and (8)). Processing using OpenMP [9], MPI [20, 24] and CUDA [12, 23] is possible for the considered algorithm.

6 Conclusions and Further Work

Backlight modulation of computer screen is very import technique for air–gap data transmission and proposed segmentation extends possibilities by the automatic selection of video region. Light reflection from walls and surrounding objects could be used, if no direct visibility of screen is possible.

Optical techniques for the data transmission requires direct visibility of source, but this method shows the possibility of indirect data transmission. Very important property of the optical techniques is the possible large distance observation using telescopes and high sensitivity cameras.

The search task for the best area having the largest SNR can be performed using the optimization algorithm [1,2,3, 11]. In addition, the selection of the threshold T value can also be performed automatically using the optimization algorithm. These tasks will be the subject of further work.

Additional further work will be related to the improving of bit rate also, because it is important limitation for numerous air–gap communication methods. Such improvement gives better utilization of short time slots without user activities that could disturb data transmission.