A mathematical exploitation of simulated uniform scanning botnet propagation dynamics for early stage detection and management

  • Original Paper
  • Journal of Computer Virology and Hacking Techniques

Abstract

The contribution of this paper is two-fold. First, we propose a botnet detection approach that is timely enough to enable containment of a botnet outbreak in a supervised network. Second, we show that mathematical models of botnet propagation dynamics are a viable means of achieving that level of defense against bot infections in a supervised network. Our approach processes network traffic so as to localize a weakly connected subgraph within a graph that models network communications between hosts, and treats that subgraph as representative of a suspected botnet. We devise applied statistics to infer the propagation dynamics that would characterize the suspected botnet if it were indeed a botnet. The inferred dynamics are materialized into a model graph. A subgraph isomorphism search determines whether there is an approximate match between the model graph and any subgraph of the weakly connected subgraph. An approximate match between the two leads to a timely identification of infected hosts. We have implemented this research in the Matlab and Perl programming languages, and have validated it in practice in the Emulab network testbed. In the paper, we describe our approach in detail, and discuss experiments along with experimental data that are indicative of the effectiveness of our approach.
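The first step named in the abstract, localizing a weakly connected subgraph in the host communication graph, can be illustrated with a minimal Python sketch. It is not the authors' Matlab/Perl implementation: the function name `weakly_connected_components`, the edge representation, and the sample addresses are assumptions made for illustration only.

```python
from collections import defaultdict


def weakly_connected_components(edges):
    """Group hosts into weakly connected components.

    `edges` is an iterable of (source_host, destination_host) pairs.
    Edge direction is ignored, which is what "weakly connected" means
    for a directed graph.
    """
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        # Iterative depth-first search from a host not visited yet.
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            host = stack.pop()
            component.add(host)
            for other in neighbours[host]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        components.append(component)
    return components


# Hypothetical flows (initiator, responder); not data from the paper.
flows = [
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.3", "10.0.0.2"),
    ("10.0.0.8", "10.0.0.9"),
]
components = weakly_connected_components(flows)
```

In this toy input, hosts 10.0.0.1 to 10.0.0.3 end up in one component even though no directed path links 10.0.0.1 to 10.0.0.3, which is the property that distinguishes weak from strong connectivity.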

Acknowledgments

This work was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) through grant STPGP 381091 - 09 to Dr. Ali A. Ghorbani.

Author information

Corresponding author

Correspondence to Julian L. Rrushi.

Appendix

1.1 Overview of the Variant tool

The Variant tool is a working prototype implementation of the research discussed in this paper. The Variant tool produces consistent results each time it is run. We chose the name Variant because the botnet dynamics targeted by this research vary among network infections. Even the same botnet launched in the same network several times may exhibit different dynamics at each launch. That phenomenon is due to the stochastic nature of the infection process of botnets that belong to the category of malware considered in this research. As we wrote earlier in this paper, the Variant tool takes as input Pcap files generated by Tcpdump or any derivative network sniffing tool.

In the case of a positive botnet interception, the output produced by the Variant tool consists of a textual description of the final form of the weakly connected subgraph formed by the botnet infection process within the data graph up to the moment of intervention. Typical forms of that output are shown in the bottom half of Fig. 5 and in the bottom half of Fig. 10. If no weakly connected subgraphs form within the data graph, or if there is no isomorphism between the model graph and the weakly connected subgraphs within the data graph, the Variant tool simply reports that no botnets were detected. The Variant tool can function without any human intervention, and thus can conduct automatic early stage botnet detection and containment tasks.
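The match-or-report decision described above can be sketched in Python as follows. This is only an illustration under stated assumptions: it performs an exact, edge-preserving backtracking search, whereas the paper describes an approximate match, and it is not the search implemented in the Variant tool. The function name `find_embedding` and the node labels are hypothetical.

```python
def find_embedding(model_edges, data_edges):
    """Return a mapping from model nodes to distinct data nodes that
    preserves every directed model edge, or None if no mapping exists."""
    model_edges = set(model_edges)
    data_edges = set(data_edges)
    model_nodes = sorted({n for edge in model_edges for n in edge})
    data_nodes = sorted({n for edge in data_edges for n in edge})

    def consistent(mapping):
        # Every model edge with both endpoints mapped must exist in the data graph.
        return all(
            (mapping[u], mapping[v]) in data_edges
            for u, v in model_edges
            if u in mapping and v in mapping
        )

    def extend(mapping):
        if len(mapping) == len(model_nodes):
            return dict(mapping)
        node = model_nodes[len(mapping)]
        used = set(mapping.values())
        for candidate in data_nodes:
            if candidate in used:
                continue
            mapping[node] = candidate
            if consistent(mapping):
                result = extend(mapping)
                if result is not None:
                    return result
            del mapping[node]  # backtrack and try the next candidate
        return None

    return extend({})


# Hypothetical model graph (bot b1 infects b2 and b3) and data graph.
model = [("b1", "b2"), ("b1", "b3")]
data = [("h1", "h2"), ("h2", "h3"), ("h2", "h4")]
match = find_embedding(model, data)
print(match if match is not None else "no botnets were detected")
```

The search is exponential in the worst case, which is acceptable for a toy example but would need pruning or a dedicated algorithm on graphs of realistic size.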

Fig. 5. Command line interface of the Variant tool along with a typical botnet search session

Nevertheless, we decided to equip the Variant tool with an interactive interface so that a network security analyst can explore the internals of each processing session. That interactive interface also helped us debug the overall approach as it progressed through a processing session. The Variant tool provides a command line interface to the network security analyst. The available commands and their respective options are shown in the top half of Fig. 5. The Variant tool is designed to process several Pcap files simultaneously, created by network sniffers deployed at various points in a supervised network. The netpackets command sorts entries from those Pcap files and places them in a unified file, which is then processed by other commands.
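The kind of consolidation performed by the netpackets command can be illustrated with a short Python sketch. It assumes that the entries are ordered by capture timestamp and that each per-sniffer capture is already in time order; the paper does not state the sort key or the layout of the unified file, so the tuples below are hypothetical stand-ins and Pcap parsing is omitted.

```python
import heapq


def merge_captures(*captures):
    """Merge per-sniffer packet records into one list ordered by timestamp.

    Each capture is an iterable of (timestamp, source, destination) tuples
    that is already sorted by timestamp (an assumption of this sketch).
    """
    return list(heapq.merge(*captures, key=lambda record: record[0]))


# Hypothetical records from two sniffers; not data from the paper.
sniffer_a = [(1.0, "h1", "h2"), (4.0, "h2", "h3")]
sniffer_b = [(2.5, "h5", "h6"), (3.0, "h1", "h4")]
unified = merge_captures(sniffer_a, sniffer_b)
```

Because `heapq.merge` consumes its inputs lazily, the same idea extends to captures that are too large to hold in memory if the result is streamed to a file instead of collected into a list.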

The logit model that we developed in this research can be solved via the logit command, as shown in Figs. 6, 7, and 8. The botnet virulence can be estimated via the virulence command, which accepts three options, namely -i, -v, and -b. Those options estimate the infection rate only, the vulnerability rate only, and both the infection rate and the vulnerability rate, respectively. A typical session is shown in Fig. 9. The various plots of statistical data that we used in this research to characterize the effectiveness of the components of our approach can be drawn via the plot command. That command is not required for conducting real-world botnet containment tasks. Nevertheless, we developed the functionality behind it because we also use the Variant tool as a research tool.

Fig. 6. Estimation of the intercept terms and coefficient terms of the logit model

Fig. 7. Probability distribution of the optimal number of pools

Fig. 8. Reporting the optimal number of pools

Fig. 9. A session of botnet virulence estimation

The propagation command displays data on the botnet propagation process in a supervised network. It accepts two options, namely -a and -d. The -a option displays propagation data based on reports generated by bots and gathered by the collector program during an experiment in the Emulab network testbed. That option is only needed when the Variant tool is employed as a research tool. The -d option displays propagation data that characterize botnet propagation dynamics as inferred by the Variant tool. A typical session of displaying propagation data is shown in Fig. 10. A detailed insight into the inferred dynamics can be obtained via the dynamics command. As we wrote earlier in this paper, our overall approach is equipped with an incremental auto-corrective feature.

Fig. 10. Botnet propagation characteristics as reported by bots during experimental network infections (top), and as detected by the Variant tool (bottom)

The Variant tool keeps intermediate findings as a table that it stores in a file named IntermediateResults.dat. Each entry in that table represents a specific phase or step within the inference process. The dynamics command takes the whole table into account, and thus the data displayed by that command are generated on the basis that the findings represented by the table entries hold. In other words, the data displayed by the dynamics command correspond to the degree of progress made by the inference process, which is denoted by the last entry in the table. If we need to display inferred dynamics at the beginning of the inference process, we have to remove all entries from the table.

Similarly, if we need to display inferred dynamics at the end of the inference process, all table entries have to be in place at the moment of running the dynamics command. The -d option gives the infection time of each infected host as labeled by a bot number. A typical session is shown at the top of Fig. 11. The -o option shows the estimated number of offspring for each infected host along with an estimate of the botnet infectious time and an estimated period of time during which the infected host can cause other infections in the enterprise network. The -o option also shows the average number of infection attempts that are predicted to originate from each infected host as labeled by a bot number.

Fig. 11. A portion of botnet dynamics inferred statistically by the Variant tool

A typical use of the -o option is shown in the middle of Fig. 11. The variability time windows for each infected host can be displayed via the -w option, as shown at the bottom of Fig. 11. The option formed by - and a bot number shows the estimated probability distribution of the number of offspring of the infected host labeled with that bot number. A typical session involving that option is shown at the top of Fig. 12. The -f option displays the possible offspring of each infected host in terms of bot numbers, as shown at the bottom of Fig. 12. Mutually exclusive concurrent edges, i.e. concurrent parent-child relationships between infected hosts, can be viewed via the -c option, as shown in Fig. 13.

Fig. 12. Probability distribution of the number of offspring of an infected host (top), and possible offspring of each infected host in terms of bot numbers (bottom)

Fig. 13. Mutually exclusive concurrent edges in the model graph

As we wrote earlier in this paper, botnet dynamics are inferred by the Matlab code within the Variant tool. The inferred dynamics are written to a file named ModelGraph.dat. The Perl code within the Variant tool reads the content of that file and uses it to generate the model graph. The content in question can be viewed via the -m option, as shown in Fig. 14. Each record of the file ModelGraph.dat is formatted as follows. The first field indicates the infection time of a host labeled by a bot number, which is given in the fifth field. The third field indicates the bot number of the host that initiated the infection. The presence of an X instead of a bot number denotes that the host that initiated the infection is not located within the perimeter of the enterprise network.

Fig. 14. File content that serves as basis for building the model graph

The sixth and the seventh fields indicate the boundaries of the variability time window that applies to the infected host. The last field indicates the estimated number of offspring that the infected host is predicted to have after exhaustion of the vulnerable host population. The subgraph isomorphism search is conducted via the -s option, as shown at the bottom of Fig. 5. Finally, the network security analyst can issue the storedb command to store in a MySQL database all of the data discussed in this appendix. Those data can then be visualized via a web application, which lies outside the scope of this paper.
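The ModelGraph.dat record layout described in this appendix can be read with a parser along the following lines. This Python sketch rests on several assumptions that the text does not confirm: fields are whitespace-separated, times and counts are numeric, a record has at least eight fields, and the second and fourth fields (which the text does not describe) can be ignored. The names `ModelRecord` and `parse_model_record` and the sample line are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelRecord:
    infection_time: float
    parent_bot: Optional[str]  # None when the record shows X (infector outside the network)
    bot: str
    window_start: float
    window_end: float
    expected_offspring: float


def parse_model_record(line: str) -> ModelRecord:
    """Parse one record, using the field positions given in the text."""
    fields = line.split()  # assumption: whitespace-separated fields
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(fields)}")
    parent = fields[2]  # third field: bot number of the infecting host, or X
    return ModelRecord(
        infection_time=float(fields[0]),          # first field
        parent_bot=None if parent == "X" else parent,
        bot=fields[4],                            # fifth field
        window_start=float(fields[5]),            # sixth field
        window_end=float(fields[6]),              # seventh field
        expected_offspring=float(fields[-1]),     # last field
    )


# Made-up sample line; the "-" tokens stand in for the undescribed fields.
record = parse_model_record("12.5 - X - 1 10.0 15.0 3")
```

Mapping X to `None` keeps externally initiated infections distinguishable from internal parent-child edges when the records are later turned into the edges of a model graph.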

About this article

Cite this article

Rrushi, J.L., Ghorbani, A.A. A mathematical exploitation of simulated uniform scanning botnet propagation dynamics for early stage detection and management. J Comput Virol Hack Tech 10, 29–51 (2014). https://doi.org/10.1007/s11416-013-0190-7

  • DOI: https://doi.org/10.1007/s11416-013-0190-7
