1 Introduction

Peer-to-peer (P2P) applications on the Internet, such as BitTorrent and Bitcoin, have grown rapidly; meanwhile, potential threats built upon them are also emerging. P2P botnets consist of a large number of compromised machines controlled by a botmaster and are used for various cybercriminal purposes, including phishing, spam, and distributed denial of service (DDoS). For example, the recent Mirai botnet, which can intrude into many off-the-shelf routers and acquire root access, has been reported by anti-virus companies. Mirai also launched DDoS attacks that paralyzed several leading service platforms such as GitHub, Twitter, and Minecraft servers. Typical P2P botnets, like Mirai, use compromised peers to form a P2P network for exchanging commands directly. Other advanced P2P botnets, e.g., Zeus and Kelihos, further improve their robustness by adding rendezvous points to their network structure. Moreover, both typical and advanced P2P botnets avoid the single point of failure (SPOF) problem, which makes them more difficult to detect than traditional centralized command-and-control botnets.

Previous studies [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22] use supervised learning or signature-based approaches to identify malware activities and achieve very high precision on particular botnets. However, these studies have several deficiencies. First, supervised learning can only detect already-known botnets; for mutated versions of malware built on the same codebase, e.g., Kelihos, performance may degrade because the behavior pattern has changed. Second, signature-based approaches require examining packet contents or infected executable files to dig out the malware execution flow; this may require a lot of pre-training time for analyzing malware behaviors and may raise privacy concerns because payloads inevitably have to be inspected in the detection phase. Third, most research evaluates performance on synthetic logs collected from a testbed over a short period. Such approaches are not verified on real traffic and may be impractical, especially against rapidly mutating botnets.

In this paper, we use our BotCluster [1] to discuss the effect of the observation period, short-term versus long-term, on botnet detection. The testing traffic logs, in Netflow format, are collected from two campuses in Taiwan, National Cheng Kung University (NCKU) and National Chung Cheng University (CCU), over TWAREN [24] (Taiwan Advanced Research and Education Network). We analyze the traffic logs of the two campuses for the same duration, April 2nd to April 15th, 2017, under three time-scopes, namely single-day, three-day, and weekly observation periods, to show how the observation period affects the detection of P2P botnet activities. The contributions of our work are as follows:

  • We use BotCluster to analyze the activities of P2P botnets in real traffic.

  • The precision is evaluated using the public blacklist service VirusTotal [23] to ensure that the detection results are feasible and reliable.

  • We evaluate three time-scopes, namely single-day, three-day, and weekly observation periods, to show that the observation period has a significant influence on P2P botnet detection.

  • We also show that even when botnets communicate stealthily, the detection results aggregated by BotCluster can grow dramatically with either long-term observation or combined traffic logs from different campuses, e.g., NCKU and CCU.

The rest of this paper is organized as follows: Sect. 2 presents the related work. Section 3 gives a brief overview of BotCluster. Section 4 presents experiments and discussions. Finally, the conclusion and future work are given in Sect. 5.

2 Related Work

Yan et al. [15] track P2P botnet activities over 6 months to recognize super peers, normal peers, and their statistical characteristics within one AS (Autonomous System). Their main contribution is a proof that the botnet communication pattern eventually appears under long-term observation.

Sun et al. [6] trace the network behavior of compromised hosts to discover malicious domains. Their system consists of two central procedures: feature extraction and domain classification. The first phase runs on MapReduce and constructs a process-domain bipartite graph from the training dataset to capture the connections between domains and processes. Every domain is labeled malicious or benign using a blacklist and a whitelist, and each process is then marked as infected if it connects to at least two different malicious domains, and as good otherwise. After labeling, a process-behavior feature vector is extracted to represent each domain. In the classification phase, they use Spark to build a random forest model for identifying unknown domains as malicious or benign. The experiments use ROC curves to evaluate precision and false positive rate on gigabyte-scale traffic logs from two enterprises, with malicious IPs provided by VirusTotal.
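As a rough illustration of the process-labeling rule described above (a minimal sketch in Python, not the authors' MapReduce implementation), the snippet below marks a process as infected when it contacts at least two distinct blacklisted domains; the process names, domains, and data layout are our own assumptions:

```python
# Hypothetical sketch of the process-labeling rule described for Sun et al. [6].
# process_domains maps each process to the set of domains it contacted.
process_domains = {
    "svchost.exe": {"update.example.com", "evil-a.example", "evil-b.example"},
    "chrome.exe": {"www.google.com", "evil-a.example"},
}
blacklist = {"evil-a.example", "evil-b.example"}

def label_process(domains, blacklist):
    # A process is "infected" if it contacts at least two different
    # blacklisted (malicious) domains; otherwise it is "good".
    malicious_hits = domains & blacklist
    return "infected" if len(malicious_hits) >= 2 else "good"

for proc, domains in process_domains.items():
    print(proc, label_process(domains, blacklist))
```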

Qiu et al. [8] use a Chow-Liu Bayesian network, a Gaussian mixture model, and logistic regression to form an active learning framework with a human in the loop for modeling traffic behavior as a feature set. They extract the first ten packets of each flow after the connection handshake and build a distribution over the sequence of packet directions and sizes. Next, they use the Gaussian mixture model to derive feature vectors, which are then fed to a regression model to build a classifier. Experiments on traffic composed of the LBNL background traffic, VRT Zeus, and ISOT Zeus show that, in terms of ROC and AUC, the proposed framework outperforms the feature sets CSET11 and TNNLS16.
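For intuition only, and not Qiu et al.'s actual pipeline, the following sketch fits a Gaussian mixture over placeholder per-flow packet features and feeds the per-component responsibilities into a logistic regression classifier; the feature layout, labels, and component count are all assumptions:

```python
# Minimal sketch (not the framework in [8]): GMM-derived features feeding
# a logistic regression classifier, using scikit-learn.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Placeholder data: one row per flow, columns stand in for the sizes of the
# first ten packets (sign encoding direction); labels 1 = botnet, 0 = benign.
X = rng.normal(size=(200, 10))
y = rng.integers(0, 2, size=200)

gmm = GaussianMixture(n_components=4, random_state=0).fit(X)
# Use the per-component posterior probabilities as the derived feature vector.
features = gmm.predict_proba(X)

clf = LogisticRegression().fit(features, y)
print("training accuracy:", clf.score(features, y))
```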

Yang et al. [9] implement a two-stage algorithm for detecting P2P botnets in SCADA systems: the first stage removes hosts without P2P network traffic, and the second stage uses Affinity Propagation to identify the bots. A self-organizing map (SOM) introduced in [10] is applied to recognize botnets; it is trained on two datasets, ISOT and CTU13, under three training plans. The experiments show that, given full knowledge, the classifier can achieve up to 99% accuracy on traffic classification. The genetic algorithm (GA) in [11] is used to select the most appropriate features from an initial set of 19 characteristics for training a C4.5 model. Their experiments adopt two datasets, ISOT and ISCX, and use the GA to obtain the best combination of features for detecting a particular botnet activity. The results show that the detection rate can be boosted dramatically with the optimal feature vector, and they conclude that feature selection is an important factor in botnet detection.

Mai and Park [12] use the ISOT dataset to evaluate the performance of three unsupervised clustering algorithms, K-means, DBSCAN, and Mean Shift, and construct decision trees from the clustering results for traffic classification. Their experiments show that when the amount of traffic data is small, K-means performs poorly, but it achieves the best detection rate when the traffic volume is large enough; the other algorithms show no significant improvement as the amount of traffic grows. However, the proper number of clusters K for K-means is hard to determine on real traffic, and a poor choice may lead to low accuracy against botnet variants. Gavrilut et al. [13] present a DGA-based botnet detection approach using the traffic logs between hosts and DNS services. They observe the domain name resolution behavior of 18 botnets to construct their composition rules and build a query distribution. Chebyshev's inequality is then applied to this distribution to detect unusual fractions of DNS traffic and thereby identify botnets. In contrast to [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22], our BotCluster is based on unsupervised learning and can detect P2P botnet activities on real traffic without requiring prior knowledge.
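As a rough illustration of how Chebyshev's inequality can flag unusual DNS-query fractions (not Gavrilut et al.'s exact procedure), the sketch below marks hosts whose query fraction deviates from the mean by more than $k$ standard deviations, since Chebyshev's inequality guarantees $P(|X-\mu| \ge k\sigma) \le 1/k^2$; the host data and the choice $k=2$ are placeholders:

```python
# Hypothetical sketch of a Chebyshev-style outlier test on DNS query fractions.
import statistics

# Fraction of each host's traffic that consists of DNS queries (placeholder data).
dns_fraction = {
    "10.0.0.1": 0.020, "10.0.0.2": 0.030, "10.0.0.3": 0.025,
    "10.0.0.4": 0.028, "10.0.0.5": 0.022, "10.0.0.6": 0.410,
}

mu = statistics.mean(dns_fraction.values())
sigma = statistics.pstdev(dns_fraction.values())
k = 2  # by Chebyshev's inequality, at most 1/k^2 of hosts exceed k*sigma

suspicious = [h for h, f in dns_fraction.items() if abs(f - mu) > k * sigma]
print("suspicious hosts:", suspicious)  # flags 10.0.0.6
```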

3 Introduction to BotCluster

BotCluster [1] is a generic P2P botnet detection system. It identifies P2P botnets by finding similar behaviors in Netflow traces. In this section, we introduce each stage of BotCluster, namely Session Extraction, Filter, and Grouping, to give an overview of our previous work. Figure 1 shows the workflow of BotCluster. The Session Extraction, Filter, and Grouping stages are implemented on Hadoop MapReduce.

Fig. 1. The workflow of BotCluster.

3.1 Session Extraction

The Session Extraction stage constructs sessions and extracts features from the unidirectional flows in the Netflow data. Netflow features include the source and destination endpoints (IP addresses), the communication ports and protocol, and the numbers of incoming and outgoing bytes and packets. To aggregate unidirectional flows between two endpoints into sessions, we collect continuous flows within a timeout threshold into a session; in other words, any two flows whose inter-arrival time is less than the timeout threshold are merged into the same session. Every session is accompanied by a feature vector consisting of the number of packets, the number of bytes, the maximal, minimal, and average packet lengths, the ratio of incoming to outgoing bytes, the session duration, and the flow loss-response ratio (FLR). The FLR is the ratio of requests to their corresponding responses in the reverse direction.
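A minimal sketch of the session-aggregation idea described above (our own simplification, not BotCluster's MapReduce implementation); flows between the same pair of endpoints are merged into one session whenever their inter-arrival time stays below a timeout threshold, whose value here is an assumption:

```python
# Hypothetical sketch of merging unidirectional flows into sessions by
# inter-arrival time; field names and the timeout value are assumptions.
TIMEOUT = 60.0  # seconds

def build_sessions(flows):
    """flows: list of dicts with 'src', 'dst', 'start' (epoch seconds), 'bytes', 'packets'."""
    sessions = {}
    for flow in sorted(flows, key=lambda f: f["start"]):
        key = (flow["src"], flow["dst"])
        current = sessions.setdefault(key, [])
        # Start a new session if the gap since the previous flow exceeds TIMEOUT.
        if not current or flow["start"] - current[-1][-1]["start"] > TIMEOUT:
            current.append([flow])
        else:
            current[-1].append(flow)
    return sessions

flows = [
    {"src": "10.0.0.5", "dst": "140.116.0.1", "start": 0.0,   "bytes": 500, "packets": 4},
    {"src": "10.0.0.5", "dst": "140.116.0.1", "start": 30.0,  "bytes": 200, "packets": 2},
    {"src": "10.0.0.5", "dst": "140.116.0.1", "start": 300.0, "bytes": 700, "packets": 6},
]
print({k: len(v) for k, v in build_sessions(flows).items()})  # two sessions
```

The per-session feature vector (packet counts, byte counts, packet-length statistics, byte ratio, duration, FLR) would then be computed over each merged flow list.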

3.2 Filter

The Filter stage reduces the input size to lower the computational overhead. We use a whitelist to filter out safe network transactions and utilize the FLR to remove unrelated sessions whose endpoints are unlikely to participate in P2P traffic, thereby decreasing the total volume of the dataset. The Filter stage is composed of two phases: the whitelist filter and the FLR filter. The whitelist filter eliminates sessions whose endpoint is on the whitelist, e.g., public DNS servers or validated websites. Next, the remaining sessions with a low FLR are excluded by the FLR filter; a session with a low FLR is unlikely to participate in any P2P network.
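The following sketch illustrates the two-phase filter described above under our own simplifying assumptions; the whitelist contents are placeholders, and the 0.225 FLR threshold is taken from Sect. 4.2:

```python
# Hypothetical sketch of the whitelist filter followed by the FLR filter.
WHITELIST = {"8.8.8.8", "168.95.1.1"}  # e.g., public DNS servers (placeholder)
FLR_THRESHOLD = 0.225                  # threshold adopted from [1]

def keep_session(session):
    # Drop sessions whose destination endpoint is whitelisted,
    # then drop sessions whose FLR is below the threshold.
    if session["dst"] in WHITELIST:
        return False
    return session["flr"] >= FLR_THRESHOLD

sessions = [
    {"src": "10.0.0.5", "dst": "8.8.8.8",      "flr": 0.9},
    {"src": "10.0.0.5", "dst": "203.0.113.7",  "flr": 0.1},
    {"src": "10.0.0.5", "dst": "198.51.100.9", "flr": 0.6},
]
print([s["dst"] for s in sessions if keep_session(s)])  # only 198.51.100.9 remains
```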

3.3 Grouping

In the Grouping stage, a mutated DBSCAN-like algorithm [1] is used to cluster sessions with similar behavior hierarchically. It consists of three levels of subgrouping that aggregate similar behaviors by measuring the similarity of their feature vectors, with a different target at each level, as shown in Fig. 2. The first level collects similar sessions with the same endpoints to form Super-Sessions. Next, similar Super-Sessions of the same source endpoint are merged to construct Session-Groups. Finally, we examine the behavioral similarity among Session-Groups to consolidate them into Behavior-Groups. Hosts inside a Behavior-Group can be considered to belong to the same P2P network, since they exhibit highly similar behaviors. The similarity measurement is based on Euclidean distance. After the three-level grouping, we use a Python script to reverse-look-up the correlated IP addresses within each Behavior-Group and generate a suspicious IP list.

Fig. 2. 3-level grouping workflow in BotCluster.
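As a rough, simplified illustration of the Euclidean-distance similarity check underlying the grouping (not the mutated DBSCAN-like, three-level algorithm of [1] itself), the sketch below greedily merges feature vectors whose distance to a group's first member is within a threshold; the threshold echoes the 1.5 setting in Sect. 4.2, and the feature values are invented:

```python
# Hypothetical sketch: greedy single-level grouping of feature vectors by
# Euclidean distance. BotCluster's real algorithm is a 3-level, DBSCAN-like
# clustering on MapReduce; this only illustrates the distance criterion.
import math

DIST_THRESHOLD = 1.5  # distance threshold used in the grouping stage [1]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def group(vectors):
    groups = []
    for vec in vectors:
        for g in groups:
            if euclidean(vec, g[0]) <= DIST_THRESHOLD:  # compare with group's first member
                g.append(vec)
                break
        else:
            groups.append([vec])
    return groups

features = [(10, 2, 0.9), (10.5, 2.2, 0.8), (40, 9, 0.1)]
print(len(group(features)))  # 2 groups: the first two vectors are merged
```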

4 Experiments

4.1 Environment

Our experiments run on Braavos [25] at the National Center for High-Performance Computing (NCHC) in Taiwan. It has 256 nodes with a total of 4096 cores and 16.38 TB of memory, and its Hadoop version is YARN 2.7.2 with 1.5 PB of storage space in HDFS. Each node runs CentOS 6.7 and is equipped with dual eight-core Xeon E5-2630 CPUs at up to 2.4 GHz and 128 GB of DDR3 memory.

4.2 Netflow Dataset and Parameter Setting

The Netflow traffic logs, provided by NCHC (National Center for High-Performance Computing), were collected from April 2nd to April 15th, 2017 at two campuses in Taiwan, National Cheng Kung University (NCKU) and National Chung Cheng University (CCU), over TWAREN (Taiwan Advanced Research and Education Network). The statistics of the collected logs are shown in Table 1. The column "IPs" is the number of IP addresses, "Size" is the file size in gigabytes, and "Flows" is the number of records in the Netflow log. The total numbers of unique IP addresses are about 1.29 million and 0.67 million for NCKU and CCU, respectively. Moreover, the total sizes are about 39.5 GB and 23.8 GB, and the total flows about 342 million and 206 million, for NCKU and CCU, respectively. The FLR threshold is set to 0.225, adopted from our previous work [1]; the minimal points (MinPts) and the distance threshold in the grouping stage of BotCluster are set to 5 and 1.5, respectively.

Table 1. Netflow statistics for NCKU and CCU between April 2nd and 15th, 2017.

4.3 Precision

Due to the stealthy and evasive communication of P2P botnets, using only a blacklist cannot reveal the complete set of malicious activities. Therefore, we use the blacklist from VirusTotal together with our detection results to infer infected IPs, applying the inference rule in [1]: if more than 5 IPs in a group, or over 50% of them, are directly recorded on VirusTotal, then all IPs in that group are considered malicious, because only sessions with strongly associated, similar behaviors are aggregated into the same group. The precision metric is also adopted from BotCluster as the ratio of CIP to DIP. "Detected IPs" (DIP) is the total number of IPs detected by BotCluster. "Detected Mal. IPs" (DMIP) is the total number of malicious IPs found by BotCluster and directly reported on VirusTotal. "Correct IPs" (CIP) is the number of correct IPs, as inferred by the verification rule, and "Wrong IPs" (WIP) is the number of improperly identified IPs, i.e., WIP = DIP − CIP.
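A minimal sketch of the verification step described above, under the assumption that each detected group is available as a set of IPs and the VirusTotal blacklist as another set (the data layout and example values are ours, not BotCluster's):

```python
# Hypothetical sketch of the group-level inference rule and precision metric.
def infer_correct_ips(groups, virustotal_blacklist):
    correct = set()
    for group in groups:
        reported = group & virustotal_blacklist
        # If more than 5 IPs in the group, or over 50% of them, are on
        # VirusTotal, treat every IP in the group as malicious (CIP).
        if len(reported) > 5 or len(reported) / len(group) > 0.5:
            correct |= group
    return correct

groups = [{"1.2.3.4", "1.2.3.5", "1.2.3.6"}, {"5.6.7.8", "5.6.7.9"}]
blacklist = {"1.2.3.4", "1.2.3.5"}

cip = infer_correct_ips(groups, blacklist)
dip = set().union(*groups)
precision = len(cip) / len(dip)   # precision = CIP / DIP
wip = len(dip) - len(cip)         # WIP = DIP - CIP
print(len(cip), len(dip), round(precision, 2), wip)
```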

4.4 Experiments

Experiment 1: The Impact of the Observation Period on the Traffic in NCKU

In this experiment, we analyze the Netflow logs of NCKU from April 2nd to April 15th, 2017; the traffic logs were merged according to three time-scopes, namely the single-day (1D), three-day (3D), and weekly (1W) observation periods. The purpose of this experiment is to observe the influence of different time-scopes on the precision. All duplicated IPs are removed to ensure the accuracy of the detection results. The averaged detection results are shown in Table 2. The results show that the number of detected IPs (DIP) for NCKU under the single-day observation period is only 6809 at the beginning. However, as more daily traffic logs are merged, the detection results grow to 8277 and 8826 for the three-day (3D) and weekly (1W) observation periods, respectively. The number of detected malicious IPs on VirusTotal (DMIP) also increases from 313 to 377, and the precision climbs from 88% to 92% compared to the single-day observation period.

Table 2. Detection results for NCKU with three observation periods.

Experiment 2: The Impact of the Observation Period on the Traffic in CCU

We also analyze the P2P botnet activities of CCU between April 2nd and April 15th, 2017. The experimental results are shown in Table 3. The botnet activities at CCU appear more silent than at NCKU, since the number of detected malicious IPs (DMIP) is far smaller: the single-day (1D) observation period finds only 61 DMIP at CCU compared to 313 DMIP at NCKU. The precisions for the three time-scopes, single-day (1D), three-day (3D), and weekly (1W), are about 60%, 77%, and 83%, respectively. However, even though CCU has more silent botnet activities than NCKU, the number of detected malicious IPs on VirusTotal (DMIP) still increases by about 42% as the time-scope extends from the single-day (1D) to the weekly (1W) observation period. This also shows that as the time-scope extends and more traffic logs are merged, more invisible sessions can be dug out by BotCluster.

Table 3. Detection results for CCU with three observation periods.

Experiment 3: The Impact of the Observation Period on the Combined Traffic

Finally, in Experiment 3 we merge the Netflow traffic logs of NCKU and CCU to observe the influence of combined logs on botnet detection. The results are summarized in Table 4. The number of detected IPs is not equal to the sum of the individual DIPs because some IPs appear on both campuses; e.g., the DIP for the single-day (1D) observation period is 9136, which is higher than the individual counts of 6809 at NCKU and 2356 at CCU. Moreover, the precision increases significantly from 85% to 95%. The number of detected malicious IPs on VirusTotal (DMIP) also grows from 378 in the single-day (1D) observation period to 490 in the weekly (1W) observation period (an increase of 29%). The results indicate that some P2P botnet connections may exist between NCKU and CCU, and that using combined logs can provide a more comprehensive viewpoint for botnet detection.

Table 4. Detection results for the combined logs of NCKU and CCU with three observation periods.

In a P2P botnet, rendezvous points or super-peers are used to exchange the botmaster's commands and to ensure connectivity; meanwhile, normal peers communicate with super-peers periodically. Table 5 shows the numbers of detected intersection IPs and detected malicious intersection IPs for NCKU and CCU. There were 24 DMIP that existed at both NCKU and CCU under the single-day observation period; we believe these DMIP can be treated as super-peers of the P2P botnets. Moreover, the number of detected malicious intersection IPs grew by about 58% (an increase of 14), from 24 DMIP under the single-day observation period to 38 DMIP under the weekly observation period, and the number of detected intersection IPs also rose as the observation period expanded. This result is evidence that some rendezvous points have been utilized to exchange messages with peers on both the NCKU and CCU campuses.

Table 5. Numbers of intersection DIP and DMIP for the traffic logs of NCKU and CCU with three observation periods.

Table 6 shows the newly detected DIP and DMIP in the combined traffic logs. A new additional DIP or DMIP is a detected IP, or a detected malicious IP, that was never found in the individual traffic logs of NCKU or CCU but is detected in the combined traffic logs. The results show that the additional DMIP rise from 48 under the single-day observation period to 79 under the weekly observation period (an increase of 64%), and the new additional DIP grow from 933 under the 1D time-scope to 1256 under the 1W time-scope (an increase of 35%). We further examined the new additional DMIP and found that some of them may act as OpenCandy servers for installing PUPs (potentially unwanted programs). This result also shows that analyzing the combined logs can provide more insight into botnet activities.
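To make the two set-based quantities above concrete, the following sketch computes the intersection IPs of Table 5 and the new additional IPs of Table 6 from hypothetical per-campus and combined detection results (all IP values are placeholders):

```python
# Hypothetical sketch of the set operations behind Tables 5 and 6.
ncku_detected = {"1.1.1.1", "2.2.2.2", "3.3.3.3"}
ccu_detected = {"2.2.2.2", "4.4.4.4"}
combined_detected = {"1.1.1.1", "2.2.2.2", "3.3.3.3", "4.4.4.4", "5.5.5.5"}

# Intersection IPs (Table 5): detected on both campuses individually.
intersection = ncku_detected & ccu_detected

# New additional IPs (Table 6): only detected when the logs are combined.
new_additional = combined_detected - (ncku_detected | ccu_detected)

print(intersection)     # {'2.2.2.2'}
print(new_additional)   # {'5.5.5.5'}
```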

Table 6. Numbers of new additional DIP and DMIP for the combined logs of NCKU and CCU with three observation periods.

The above experiments verify that even when bots communicate with each other stealthily, extending the time-scope improves the chance of detecting malicious behaviors. We also demonstrate that the detection results become more accurate as the observation period over the same traffic logs is extended. Furthermore, these results tell us that botnets may be invisible over a short-term period but can be caught with long-term observation.

5 Conclusion

In this paper, we use BotCluster [1] on the traffic of two campuses, NCKU and CCU, to analyze the impact of the observation period on P2P botnet detection in real traffic. As the time-scope extends, more malicious sessions are revealed, and the precision of long-term observation is better than that of short-term observation. Moreover, using combined traffic logs and the weekly observation period improves the precision from 84% to 94% and yields more malicious IPs compared to the single-day and three-day observation periods. The experiments also show that long-term observation is necessary for discovering botnet activities. Our future work will focus on integrating traffic logs from more campuses to dig out long-term P2P botnet activities.