Multi-centroid Cluster Analysis in Malware Research

Oprişa, Ciprian; Cabău, George; Sebestyen Pal, Gheorghe

doi:10.1007/978-3-319-69710-9_7


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 674)


Abstract

Verdict assignment is a recurring problem in malware research: deciding whether a given program is clean or infected (that is, whether it contains malicious logic). Since the general problem of identifying malicious logic is undecidable, a certain amount of manual analysis is required. As the collections of both clean and malicious samples grow continuously, we would like to reduce the manual work to a minimum by using information extracted by automated analysis systems and the similarity between programs in the collection.

Based on the assumption that similar programs are likely to share the same verdict, we have designed a system that selects a subset of a given collection of program samples for manual analysis. The selected subset should be as small as possible, subject to the constraint that all other verdicts can be inferred from the manually assigned ones. The system was tested on more than 200,000 clusters built using the single-linkage approach on a collection of over 20 million samples.
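The abstract does not say how the subset is chosen, so the following is only an illustrative sketch and not the authors' algorithm. It assumes the selection can be modelled as covering a similarity graph: a sample's verdict is inferable if the sample itself, or one of its similar neighbours, was analysed manually. Under that assumption a standard greedy heuristic picks, at each step, the sample that covers the most samples still lacking a verdict. The function name `select_for_manual_analysis` and the `similar_pairs` input are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def select_for_manual_analysis(
    samples: Iterable[Hashable],
    similar_pairs: Iterable[Tuple[Hashable, Hashable]],
) -> List[Hashable]:
    """Greedily pick samples so that every sample is either picked
    or similar to a picked one (hypothetical helper, see lead-in)."""
    # Each sample covers itself plus every sample it is similar to.
    covers: Dict[Hashable, Set[Hashable]] = {s: {s} for s in samples}
    for a, b in similar_pairs:
        covers.setdefault(a, {a}).add(b)
        covers.setdefault(b, {b}).add(a)

    uncovered: Set[Hashable] = set(covers)
    selected: List[Hashable] = []
    while uncovered:
        # Pick the sample that covers the most still-uncovered samples.
        best = max(covers, key=lambda s: len(covers[s] & uncovered))
        selected.append(best)
        uncovered -= covers[best]
    return selected


if __name__ == "__main__":
    picked = select_for_manual_analysis(
        ["a", "b", "c", "d", "e"],
        [("a", "b"), ("a", "c"), ("d", "e")],
    )
    print(picked)  # ['a', 'd']
```

The greedy choice does not guarantee a minimum-size subset; it is a common heuristic for covering problems of this kind, which are hard to solve exactly in general.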



Author information

Authors and Affiliations

Bitdefender, 1, Cuza Vodă Street, City Business Center, 400107, Cluj-Napoca, Romania
Ciprian Oprişa & George Cabău
Technical University of Cluj-Napoca, 28, Gh. Bariţiu Street, Room M01A, 400027, Cluj-Napoca, Romania
Ciprian Oprişa, George Cabău & Gheorghe Sebestyen Pal


Corresponding author

Correspondence to Ciprian Oprişa.

Editor information

Editors and Affiliations

Computer Science and Communications Research Unit, University of Luxembourg E009 (CSC) Kirchberg Campus, Luxembourg, Luxembourg
Alexandru-Adrian Tantar
Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, Luxembourg, Luxembourg
Emilia Tantar
Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
Michael Emmerich
Bâtiment Leyteire, URF Sciences et Modelisation, Université Bordeaux, Bordeaux, France
Pierrick Legrand
Faculty of Computer Science, Alexandru Ioan Cuza University of Iași, Iași, Romania
Lenuta Alboaie
Faculty of Computer Science, Alexandru Ioan Cuza University, Iasi, Romania
Henri Luchian


Cite this paper

Oprişa, C., Cabău, G., Sebestyen Pal, G. (2018). Multi-centroid Cluster Analysis in Malware Research. In: Tantar, AA., Tantar, E., Emmerich, M., Legrand, P., Alboaie, L., Luchian, H. (eds) EVOLVE - A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation VI. Advances in Intelligent Systems and Computing, vol 674. Springer, Cham. https://doi.org/10.1007/978-3-319-69710-9_7


DOI: https://doi.org/10.1007/978-3-319-69710-9_7
Published: 11 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69708-6
Online ISBN: 978-3-319-69710-9
eBook Packages: Engineering, Engineering (R0)
