Peer to peer botnet detection for cyber-security: a data mining approach

Authors:

Bhavani Thuraisingham

CSIIRW '08: Proceedings of the 4th annual workshop on Cyber security and information intelligence research: developing strategies to meet the cyber security and information intelligence challenges ahead

Article No.: 39, Pages 1 - 2

https://doi.org/10.1145/1413140.1413185

Published: 12 May 2008

Abstract

A botnet is a network of compromised hosts, or bots, under the control of a human attacker known as the botmaster [7, 8]. Botnets are used to perform malicious actions such as launching DDoS attacks and sending spam or phishing emails. Botnets have thus emerged as a threat to the Internet community. Peer-to-peer (P2P) is a relatively new botnet architecture [4]. P2P botnets are distributed and small, which makes them difficult to locate and destroy. Most recent work on P2P botnets is in the analysis phase [4, 5, 6]. In contrast, our work aims at detecting P2P botnets by mining network traffic.

Network traffic can be considered an infinite data stream, so our data mining approach is specialized for stream data. Stream data classification poses two major problems. First, it is impractical to store and use all the historical data for training, since that would require unbounded storage and running time. Second, there may be concept drift in the data. For example, a botmaster usually updates the bot software frequently, which may change the characteristics of botnet traffic and thereby cause concept drift. When concept drift occurs, the hypothesis must be refined to accommodate the new concept, so most of the old data must be discarded from the training set. There are two mainstream techniques for stream data classification: the single-classifier approach [1] and the ensemble-classifier approach [10, 9]. Of these, the ensemble classifier is often more robust in handling concept drift. We propose an ensemble classification approach that addresses both problems.

A common approach to classifying stream data is to divide the stream into equal-sized chunks [2, 10, 9, 3]. We follow this approach as well. However, instead of storing historical data, we store the trained classifiers. We always keep an ensemble A of the best K classifiers {A₁, ..., A_K}. The ensemble A is a two-level ensemble: each classifier A_i in A is itself a collection (ensemble) of v classifiers. We thus build a hierarchy of ensembles in which A is at the top level and each of its children A_i is at the middle level. The lowest level (the leaves) contains the actual classifiers.
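The chunking step and the three-level hierarchy described above can be sketched roughly as follows. This is only an illustrative sketch: the names `chunks`, `MiddleEnsemble`, and `TopEnsemble` are hypothetical, and combining predictions by unweighted majority vote is an assumption, since the abstract does not say how the votes are combined.

```python
from collections import Counter
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]          # (feature vector, label)
Classifier = Callable[[Sequence[float]], int]  # a trained leaf classifier


def chunks(stream: Iterable[Example], size: int) -> Iterator[List[Example]]:
    """Yield consecutive equal-sized chunks; a short trailing chunk is dropped."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if len(chunk) < size:
            return
        yield chunk


def majority(votes: Iterable[int]) -> int:
    """Return the most common vote (assumed combination rule)."""
    return Counter(votes).most_common(1)[0][0]


class MiddleEnsemble:
    """One A_i: a collection of v leaf classifiers plus its estimated error."""

    def __init__(self, leaves: List[Classifier], error: float) -> None:
        self.leaves = leaves
        self.error = error

    def predict(self, x: Sequence[float]) -> int:
        return majority(leaf(x) for leaf in self.leaves)


class TopEnsemble:
    """A: holds at most K middle-level ensembles instead of historical data."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.members: List[MiddleEnsemble] = []

    def predict(self, x: Sequence[float]) -> int:
        return majority(m.predict(x) for m in self.members)
```

Only the trained classifiers and one error value per middle-level ensemble are retained here, which is what keeps storage bounded as the stream grows.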

Each middle-level ensemble A_i is trained with r consecutive data chunks. As soon as a new data chunk D_n appears, we train a new middle-level ensemble A_n. Let D = {D_n, D_{n-1}, ..., D_{n-r+1}}, i.e., the most recent r data chunks including D_n. We randomly divide D into v equal parts {d₁, ..., d_v} such that all parts have roughly the same number of positive and negative examples. We then build A_n with v classifiers {A_n(1), A_n(2), ..., A_n(v)}, where each classifier A_n(j) is trained on the dataset D - {d_j}. We compute the expected error of the ensemble A_n by testing each classifier A_n(j) on d_j and averaging the errors. Finally, we update the top-level ensemble A by replacing a middle-level ensemble A_i (1 ≤ i ≤ K) with the new ensemble A_n if A_n has a lower error rate than A_i. By introducing this multi-chunk, multi-level ensemble, we reduce the expected error by a factor of rv over the single-chunk, single-level ensemble method (e.g., [10]). We demonstrate the effectiveness of our approach both theoretically and empirically.
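The training and update steps above might be sketched as below, under several assumptions that are not stated in the abstract: the base learner `train_centroid` is a toy nearest-class-mean classifier chosen only to keep the example self-contained, the ensemble being replaced is taken to be the one with the highest stored error, and stored errors are compared directly rather than re-evaluated on the newest chunk. All function names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]            # (feature vector, label)
Classifier = Callable[[Sequence[float]], int]
Trainer = Callable[[List[Example]], Classifier]
Middle = Tuple[List[Classifier], float]          # (leaf classifiers, estimated error)


def stratified_parts(data: List[Example], v: int, rng: random.Random) -> List[List[Example]]:
    """Randomly split data into v parts with roughly equal class counts per part."""
    by_label: Dict[int, List[Example]] = {}
    for ex in data:
        by_label.setdefault(ex[1], []).append(ex)
    parts: List[List[Example]] = [[] for _ in range(v)]
    offset = 0
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        for i, ex in enumerate(group):
            parts[(offset + i) % v].append(ex)   # deal examples out round-robin
        offset += len(group)
    return parts


def build_middle_ensemble(recent_chunks: List[List[Example]], v: int,
                          train: Trainer, rng: random.Random) -> Middle:
    """Build A_n: classifier j is trained on D - d_j and tested on d_j."""
    data = [ex for chunk in recent_chunks for ex in chunk]   # D = the r most recent chunks
    if len(data) < v:
        raise ValueError("need at least v examples")
    parts = stratified_parts(data, v, rng)
    leaves: List[Classifier] = []
    errors: List[float] = []
    for j, held_out in enumerate(parts):
        train_set = [ex for i, part in enumerate(parts) if i != j for ex in part]
        clf = train(train_set)
        wrong = sum(1 for x, y in held_out if clf(x) != y)
        leaves.append(clf)
        errors.append(wrong / len(held_out))
    return leaves, sum(errors) / v               # average held-out error


def update_top(top: List[Middle], new: Middle, k: int) -> None:
    """Keep at most k middle-level ensembles; swap out the worst if `new` beats it."""
    if len(top) < k:
        top.append(new)
        return
    worst = max(range(len(top)), key=lambda i: top[i][1])
    if new[1] < top[worst][1]:
        top[worst] = new


def train_centroid(train_set: List[Example]) -> Classifier:
    """Toy base learner (an assumption, not from the paper): nearest class mean."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in train_set:
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[y] = counts.get(y, 0) + 1
    means = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x: Sequence[float]) -> int:
        return min(means, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, means[y])))

    return predict
```

A caller would pass the r most recent chunks as `recent_chunks`, for example from a `collections.deque(maxlen=r)`, and then call `update_top` with the returned pair each time a new chunk arrives.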

We make several contributions. First, we propose a novel multi-chunk, multi-level ensemble technique for stream data classification, which generalizes the existing single-chunk, single-level ensemble techniques. Second, we prove the effectiveness of our technique theoretically. Finally, we apply our technique to detecting P2P botnet traffic and achieve better detection accuracy than other stream data classification techniques. No botnet detection technique has so far applied the stream classification approach. We believe that the proposed ensemble technique provides a powerful tool for network security and that it will encourage the future use of stream data classification in botnet detection.

References

[1] P. Domingos and G. Hulten. Mining high-speed data streams. In Proc. SIGKDD, pages 71–80, 2000.

[2] W. Fan. Systematic data selection to mine concept-drifting data streams. In Proc. KDD, pages 128–137, 2004.

[3] J. Gao, W. Fan, and J. Han. On appropriate assumptions to mine data streams. In Proc. ICDM, 2007.

[4] J. B. Grizzard, V. Sharma, C. Nunnery, B. B. Kang, and D. Dagon. Peer-to-peer botnets: Overview and case study. In Usenix/Hotbots '07 Workshop, 2007.

[5] L. T. I. Group. Sinit p2p trojan analysis. lurhq. http://www.lurhq.com/sinit.html, 2004.

[6] R. Lemos. Bot software looks to improve peerage. http://www.securityfocus.com/news/11390, 2006.

[7] M. A. Rajab, J. Zarfoss, F. Monrose, and A. Terzis. A multifaceted approach to understanding the botnet phenomenon. In Proc. of the 6th ACM SIGCOMM on Internet Measurement Conference (IMC), 2006.

[8] B. Saha and A. Gairola. Botnet: An overview. CERT-In White Paper CIWP-2005-05, 2005.

[9] M. Scholz and R. Klinkenberg. An ensemble classifier for drifting concepts. In Proc. ICML/PKDD Workshop in Knowledge Discovery in Data Streams, 2005.

[10] H. Wang, W. Fan, P. Yu, and J. Han. Mining concept-drifting data streams using ensemble classifiers. In Proc. KDD, 2003.

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Information

Published In

May 2008

470 pages

ISBN:9781605580982

DOI:10.1145/1413140

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CSIIRW '08

CSIIRW '08: Cyber Security and Information Intelligence Research Workshop

May 12 - 14, 2008

Oak Ridge, Tennessee, USA
