DOI: 10.1145/1413140.1413185
research-article

Peer to peer botnet detection for cyber-security: a data mining approach

Published: 12 May 2008

ABSTRACT

A botnet is a network of compromised hosts, or bots, under the control of a human attacker known as the botmaster [7, 8]. Botnets are used to perform malicious actions such as launching DDoS attacks and sending spam or phishing emails. Botnets have thus emerged as a threat to the Internet community. Peer-to-peer (P2P) is a relatively new botnet architecture [4]. P2P botnets are distributed and small, so they are difficult to locate and destroy. Most recent work on P2P botnets is in the analysis phase [4, 5, 6]. In contrast, our work aims to detect P2P botnets by mining network traffic.

Network traffic can be considered an infinite data stream, so our data mining approach is specialized for mining stream data. Stream data classification poses two major problems. First, it is impractical to store and use all the historical data for training, since doing so would require infinite storage and running time. Second, there may be concept drift in the data. For example, in the context of botnets, the botmaster usually updates the bot software frequently, which may change the characteristics of botnet traffic and result in concept drift. When concept drift occurs, we need to refine our hypothesis to accommodate the new concept, so most of the old data must be discarded from the training set. There are two mainstream techniques for stream data classification: the single-classifier approach [1] and the ensemble-classifier approach [10, 9]. Of these, the ensemble classifier is often more robust in handling concept drift. We also propose an ensemble classification approach, one that solves both problems related to stream data classification.

A common approach to classifying stream data is to divide the stream into equal-sized chunks [2, 10, 9, 3]. We also follow this approach. However, instead of storing historical data, we store the trained classifiers. We always store an ensemble A of the best K classifiers {A_1, ..., A_K}. The ensemble A is a two-level ensemble: each classifier A_i in A is itself a collection (ensemble) of v classifiers. Thus, we build a hierarchy of ensembles, where A is at the top level of the hierarchy and each of its children A_i is at the middle level. The lowest level (the leaves) contains the actual classifiers.
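
The chunking step and the three-level hierarchy described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `chunks`, `MiddleEnsemble`, and `TopEnsemble` are hypothetical, and combining predictions by majority vote at both levels is an assumption, since the abstract does not say how votes are aggregated.

```python
from collections import Counter, deque
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence

# A leaf classifier maps a feature vector to a class label.
Classifier = Callable[[Sequence[float]], int]


def chunks(stream: Iterable, size: int) -> Iterator[list]:
    """Yield consecutive equal-sized chunks; a short trailing chunk is dropped."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if len(chunk) < size:
            return
        yield chunk


def majority(labels: List[int]) -> int:
    """Most common label (assumed voting rule); ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


class MiddleEnsemble:
    """One A_i: a collection of v leaf classifiers plus its estimated error."""

    def __init__(self, leaves: List[Classifier], error: float):
        self.leaves = leaves
        self.error = error

    def predict(self, x: Sequence[float]) -> int:
        return majority([leaf(x) for leaf in self.leaves])


class TopEnsemble:
    """A: holds at most K middle-level ensembles."""

    def __init__(self, k: int):
        self.k = k
        self.members: List[MiddleEnsemble] = []

    def predict(self, x: Sequence[float]) -> int:
        return majority([m.predict(x) for m in self.members])


# Only the most recent chunks are retained; older raw data is discarded.
window = deque(maxlen=3)
for chunk in chunks(range(10), size=2):
    window.append(chunk)
```

Storing only a bounded window of chunks alongside the trained ensembles is what keeps memory use constant on an unbounded stream.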

Each middle-level ensemble A_i is trained with r consecutive data chunks. As soon as a new data chunk D_n appears, we train a new middle-level ensemble A_n. Let D = {D_n, D_{n-1}, ..., D_{n-r+1}}, i.e., the most recent r data chunks including D_n. We randomly divide D into v equal parts {d_1, ..., d_v}, such that all parts have roughly the same number of positive and negative examples. We then build A_n with v classifiers {A_n(1), A_n(2), ..., A_n(v)}, where each classifier A_n(j) is trained with the dataset D - {d_j}. We compute the expected error of the ensemble A_n by testing each classifier A_n(j) on d_j and averaging the errors. Finally, we update the top-level ensemble A by replacing a middle-level ensemble A_i (1 ≤ i ≤ K) with the new ensemble A_n if A_n has a lower error rate than A_i. By introducing this multi-chunk, multi-level ensemble, we reduce the expected error by a factor of rv over the single-chunk, single-level ensemble method (e.g., [10]). We prove the effectiveness of our approach both theoretically and empirically.
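
The training and update steps above can be sketched as follows. This is a hedged illustration under several assumptions that the abstract leaves open: `train_threshold` is a toy stand-in for the base learner, the top-level ensemble is filled until it holds K members, the member replaced is the one with the highest stored error, and each existing member is compared by the error estimate stored when it was built.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]            # (feature vector, label in {0, 1})
Classifier = Callable[[Sequence[float]], int]
Trainer = Callable[[List[Example]], Classifier]  # hypothetical base learner
Middle = Tuple[List[Classifier], float]          # (classifiers of A_i, error)


def stratified_parts(data: List[Example], v: int, rng: random.Random) -> List[List[Example]]:
    """Randomly split data into v parts with roughly equal class counts."""
    parts: List[List[Example]] = [[] for _ in range(v)]
    for label in (0, 1):
        group = [ex for ex in data if ex[1] == label]
        rng.shuffle(group)
        for i, ex in enumerate(group):
            parts[i % v].append(ex)
    return parts


def build_middle_ensemble(recent_chunks: List[List[Example]], v: int,
                          train: Trainer, rng: random.Random) -> Middle:
    """Train A_n on the r most recent chunks and estimate its expected error."""
    data = [ex for chunk in recent_chunks for ex in chunk]   # D
    parts = stratified_parts(data, v, rng)                   # d_1 .. d_v
    classifiers, errors = [], []
    for j, held_out in enumerate(parts):
        training = [ex for i, p in enumerate(parts) if i != j for ex in p]
        clf = train(training)                                # A_n(j) on D - {d_j}
        wrong = sum(clf(x) != y for x, y in held_out)        # tested on d_j
        classifiers.append(clf)
        errors.append(wrong / max(len(held_out), 1))
    return classifiers, sum(errors) / v


def update_top(top: List[Middle], new: Middle, k: int) -> List[Middle]:
    """Add A_n while there is room; otherwise replace the worst member if A_n is better."""
    if len(top) < k:                                         # assumption: fill up first
        return top + [new]
    worst = max(range(len(top)), key=lambda i: top[i][1])    # assumption: replace worst
    if new[1] < top[worst][1]:
        return top[:worst] + [new] + top[worst + 1:]
    return top


def train_threshold(examples: List[Example]) -> Classifier:
    """Toy learner (assumption): threshold feature 0 midway between class means."""
    pos = [x[0] for x, y in examples if y == 1]
    neg = [x[0] for x, y in examples if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x[0] > cut)
```

Because every example in D is held out exactly once, the averaged error is a v-fold cross-validation estimate computed over all r chunks rather than over a single chunk.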

We make several contributions. First, we propose a novel multi-chunk, multi-level ensemble technique for stream data classification, which generalizes the existing single-chunk, single-level ensemble techniques. Second, we prove the effectiveness of our technique theoretically. Finally, we apply our technique to detecting P2P botnet traffic and achieve better detection accuracy than other stream data classification techniques. No botnet detection technique has so far applied the stream classification approach. We believe that the proposed ensemble technique provides a powerful tool for network security and that it will encourage the future use of stream data classification in botnet detection.

References

  1. P. Domingos and G. Hulten. Mining high-speed data streams. In Proc. SIGKDD, pages 71-80, 2000.
  2. W. Fan. Systematic data selection to mine concept-drifting data streams. In Proc. KDD, pages 128-137, 2004.
  3. J. Gao, W. Fan, and J. Han. On appropriate assumptions to mine data streams. In Proc. ICDM, 2007.
  4. J. B. Grizzard, V. Sharma, C. Nunnery, B. B. Kang, and D. Dagon. Peer-to-peer botnets: Overview and case study. In Usenix/Hotbots '07 Workshop, 2007.
  5. L. T. I. Group. Sinit p2p trojan analysis. lurhq. http://www.lurhq.com/sinit.html, 2004.
  6. R. Lemos. Bot software looks to improve peerage. http://www.securityfocus.com/news/11390, 2006.
  7. M. A. Rajab, J. Zarfoss, F. Monrose, and A. Terzis. A multifaceted approach to understanding the botnet phenomenon. In Proc. of the 6th ACM SIGCOMM on Internet Measurement Conference (IMC), 2006.
  8. B. Saha and A. Gairola. Botnet: An overview. CERT-In White Paper CIWP-2005-05, 2005.
  9. M. Scholz and R. Klinkenberg. An ensemble classifier for drifting concepts. In Proc. ICML/PKDD Workshop in Knowledge Discovery in Data Streams, 2005.
  10. H. Wang, W. Fan, P. Yu, and J. Han. Mining concept-drifting data streams using ensemble classifiers. In Proc. KDD, 2003.

      Published in

      CSIIRW '08: Proceedings of the 4th annual workshop on Cyber security and information intelligence research: developing strategies to meet the cyber security and information intelligence challenges ahead
      May 2008
      470 pages
      ISBN: 9781605580982
      DOI: 10.1145/1413140

      Copyright © 2008 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States
