Skip to main content

Abstract

Verdicts assignment is a recurring problem in malware research and it involves deciding if a given program is clean or infected (if it contains malicious logic). Since the general problem of identifying malicious logic is undecidable, a certain amount of manual analysis is required. As the collections of both clean and malicious samples are continuously increasing, we would like to reduce the manual work to a minimum, by using information extracted by automated analysis systems and the similarity between some programs in the collection.

Based on the assumption that similar programs are likely to share the same verdict, we have designed a system that selects a subset from a given collection of program samples for manual analysis. The selected subset should be as small as possible, given the constraint that the other verdicts must be inferable from the manually-assigned ones. The system was tested on a collection of more than 200000 clusters built using the single linkage approach on a collection of over 20 million samples.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Abou-Assaleh, T., Cercone, N., Kešelj, V., Sweidan, R.: N-gram-based detection of new malicious code. In: Proceedings of the 28th Annual International Computer Software and Applications Conference, COMPSAC 2004, vol. 2, pp. 41–42. IEEE (2004)

    Google Scholar 

  2. Bayer, U., Comparetti, P.M., Hlauschek, C., Kruegel, C., Kirda, E.: Scalable, behavior-based malware clustering. In: NDSS, vol. 9, pp. 8–11. Citeseer (2009)

    Google Scholar 

  3. Bilar, D.: Opcodes as predictor for malware. Int. J. Electr. Secur. Digit. Forensics 1(2), 156–168 (2007)

    Article  Google Scholar 

  4. Bishop, M.: Computer Security: Art and Science. Addison-Wesley, Reading (2002)

    Google Scholar 

  5. Cohen, F.: Computational aspects of computer viruses. Comput. Secur. 8(4), 297–298 (1989)

    Article  Google Scholar 

  6. Colesa, A.: Fast creation of short-living virtual machines using copy-on-write ram-disks. In: 2014 IEEE International Conference on Automation, Quality and Testing, Robotics, pp. 1–6. IEEE (2014)

    Google Scholar 

  7. Feige, U.: A threshold of ln n for approximating set cover. J. ACM (JACM) 45(4), 634–652 (1998)

    Article  MATH  Google Scholar 

  8. Hedetniemi, S.T., Laskar, R.C.: Bibliography on domination in graphs and some basic definitions of domination parameters. Discrete Math. 86(1), 257–277 (1990)

    Article  MATH  MathSciNet  Google Scholar 

  9. Johnson, D.S.: Approximation algorithms for combinatorial problems. In: Proceedings of the Fifth Annual ACM Symposium on Theory of Computing, pp. 38–49. ACM (1973)

    Google Scholar 

  10. Kann, V.: On the approximability of NP-complete optimization problems. Ph.d. thesis, Royal Institute of Technology Stockholm (1992)

    Google Scholar 

  11. MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, California, USA, pp. 281–297 (1967)

    Google Scholar 

  12. Oprisa, C., Cabau, G., Colesa, A.: From plagiarism to malware detection. In: 2013 15th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), pp. 227–234. IEEE (2013)

    Google Scholar 

  13. Oprisa, C., Checiches, M., Nandrean, A.: Locality-sensitive hashing optimizations for fast malware clustering. In: 2014 IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), pp. 97–104. IEEE (2014)

    Google Scholar 

  14. van Rooij, J.M., Nederlof, J., van Dijk, T.C.: Inclusion/exclusion meets measure and conquer. In: Algorithms-ESA 2009, pp. 554–565. Springer (2009)

    Google Scholar 

  15. Shabtai, A., Moskovitch, R., Feher, C., Dolev, S., Elovici, Y.: Detecting unknown malicious code by applying classification techniques on opcode patterns. Secur. Inf. 1(1), 1–22 (2012)

    Article  Google Scholar 

  16. Sibson, R.: Slink: an optimally efficient algorithm for the single-link cluster method. Comput. J. 16(1), 30–34 (1973)

    Article  MathSciNet  Google Scholar 

  17. Turing, A.M.: On computable numbers, with an application to the entscheidungsproblem. J. Math. 58(345–363), 5 (1936)

    MATH  Google Scholar 

  18. Vatamanu, C., Gavriluţ, D., Benchea, R.: A practical approach on clustering malicious pdf documents. J. Comput. Virol. 8(4), 151–163 (2012)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ciprian Oprişa .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Oprişa, C., Cabău, G., Sebestyen Pal, G. (2018). Multi-centroid Cluster Analysis in Malware Research. In: Tantar, AA., Tantar, E., Emmerich, M., Legrand, P., Alboaie, L., Luchian, H. (eds) EVOLVE - A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation VI. Advances in Intelligent Systems and Computing, vol 674. Springer, Cham. https://doi.org/10.1007/978-3-319-69710-9_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-69710-9_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69708-6

  • Online ISBN: 978-3-319-69710-9

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics