Correlating High- and Low-Level Features: Increased Understanding of Malware Classification

  • Conference paper
Advances in Information and Computer Security (IWSEC 2019)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11689)

Abstract

Malware poses a constant threat to the services and facilities used by modern society. To perform and improve anti-malware defense, methods capable of categorizing malware are needed. Since malware is grouped into categories according to its functionality, dynamic malware analysis is a reliable source of features that are useful for malware classification. Different types of dynamic features are described in the literature [5, 6, 13]. These features can be divided into two main groups: high-level features (API calls, file activity, network activity, etc.) and low-level features (memory access patterns, hardware performance counters, etc.). Low-level features are of special interest to malware analysts: regardless of the anti-detection mechanisms a malware sample uses, it cannot avoid executing on hardware. As hardware-based security solutions are constantly being developed by hardware manufacturers and prototyped by researchers, research on low-level features for malware analysis is a promising topic. The biggest problem with low-level features is that they carry little information that a human analyst can interpret. In this paper, we analyze the potential correlation between the low- and high-level features used for malware classification. In particular, we analyze the n-grams of memory access operations found in [6] and try to find their relationship with n-grams of API calls. We also compare the performance of API call n-grams and memory access n-grams on the same dataset as used in [6]. Finally, we analyze their combined performance for malware classification and explain the findings on the correlation between high- and low-level features.
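The n-gram features mentioned in the abstract can be illustrated with a short Python sketch. It assumes that each sample has already been reduced to a sequence of memory access operations (here `"R"` for a read and `"W"` for a write) and a sequence of called API function names. The toy traces, the n-gram sizes, and the choice of the phi coefficient (the Pearson correlation of two 0/1 presence vectors) as the correlation measure are illustrative assumptions, not the procedure used in the paper.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def ngrams(seq: Sequence[Hashable], n: int) -> Counter:
    """Count contiguous n-grams (as tuples) in a sequence of tokens."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def phi(x: Sequence[int], y: Sequence[int]) -> float:
    """Phi coefficient: Pearson correlation of two 0/1 vectors.

    Returns 0.0 when either vector is constant (correlation undefined).
    """
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


# Hypothetical traces: per sample, a memory-operation sequence and an
# API-call sequence. Real traces would come from dynamic instrumentation.
samples = {
    "s1": (["R", "R", "W", "R", "W"], ["CreateFileW", "WriteFile", "CloseHandle"]),
    "s2": (["R", "W", "W", "R"], ["RegOpenKeyExW", "RegSetValueExW"]),
    "s3": (["R", "R", "W", "R"], ["CreateFileW", "WriteFile"]),
}

mem = {k: ngrams(m, 3) for k, (m, _) in samples.items()}  # low-level 3-grams
api = {k: ngrams(a, 2) for k, (_, a) in samples.items()}  # high-level 2-grams

names = sorted(samples)
mem_feature = ("R", "R", "W")
api_feature = ("CreateFileW", "WriteFile")
# Presence (1) or absence (0) of each feature in each sample.
x = [int(mem[k][mem_feature] > 0) for k in names]
y = [int(api[k][api_feature] > 0) for k in names]
print(x, y, phi(x, y))  # prints: [1, 0, 1] [1, 0, 1] 1.0
```

With only three toy samples, the perfect correlation printed here carries no statistical meaning; in practice such a coefficient would be computed over a whole dataset, for each pair of low- and high-level n-gram features of interest.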

The research leading to these results has received funding from the Center for Cyber and Information Security, under budget allocation from the Ministry of Justice and Public Security.

References

  1. Types of malware. https://usa.kaspersky.com/resource-center/threats/types-of-malware. Accessed 17 Mar 2019

  2. Virusshare.com. https://virusshare.com/. Accessed 12 Mar 2019

  3. Weka: Data mining software in Java (2019). http://www.cs.waikato.ac.nz/ml/weka/. Accessed 12 Mar 2019

  4. Alazab, M., Layton, R., Venkataraman, S., Watters, P.: Malware detection based on structural and behavioural features of API calls (2010)

  5. Bahador, M.B., Abadi, M., Tajoddin, A.: HPCMalHunter: behavioral malware detection using hardware performance counters and singular value decomposition. In: 2014 4th International eConference on Computer and Knowledge Engineering (ICCKE), pp. 703–708. IEEE (2014). https://doi.org/10.1109/iccke.2014.6993402

  6. Banin, S., Dyrkolbotn, G.O.: Multinomial malware classification via low-level features. Digit. Invest. 26, S107–S117 (2018). https://doi.org/10.1016/j.diin.2018.04.019

  7. Banin, S., Shalaginov, A., Franke, K.: Memory access patterns for malware detection. In: Norsk informasjonssikkerhetskonferanse (NISK), pp. 96–107 (2016)

  8. Cole, E.: Advanced Persistent Threat: Understanding the Danger and How to Protect Your Organization. Newnes, Amsterdam (2012)

  9. Hoglund, G.: What APT Means To Your Enterprise (2011). https://pdfs.semanticscholar.org/d0a0/47c6b19fc3645973f8f300b507886b54196a.pdf

  10. Testimon Research Group (2017). https://testimon.ccis.no/

  11. Hall, M.A.: Correlation-based feature subset selection for machine learning. Ph.D. thesis, University of Waikato, Hamilton, New Zealand (1998)

  12. Intel Pin: a dynamic binary instrumentation tool (2019)

  13. Islam, R., Tian, R., Batten, L.M., Versteeg, S.: Classification of malware based on integrated static and dynamic features. J. Netw. Comput. Appl. 36(2), 646–656 (2013). https://doi.org/10.1016/j.jnca.2012.10.004

  14. Khasawneh, K.N., Ozsoy, M., Donovick, C., Abu-Ghazaleh, N., Ponomarev, D.: Ensemble learning for low-level hardware-supported malware detection. In: Bos, H., Monrose, F., Blanc, G. (eds.) RAID 2015. LNCS, vol. 9404, pp. 3–25. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26362-5_1

  15. Kononenko, I., Kukar, M.: Machine Learning and Data Mining: Introduction to Principles and Algorithms. Horwood Publishing, Cambridge (2007)

  16. Lim, H.I.: Detecting malicious behaviors of software through analysis of API sequence k-grams (2016). https://doi.org/10.13189/csit.2016.040301

  17. Ozsoy, M., Donovick, C., Gorelik, I., Abu-Ghazaleh, N., Ponomarev, D.: Malware-aware processors: a framework for efficient online malware detection. In: 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pp. 651–661. IEEE (2015). https://doi.org/10.1109/hpca.2015.7056070

  18. Ozsoy, M., Khasawneh, K.N., Donovick, C., Gorelik, I., Abu-Ghazaleh, N., Ponomarev, D.: Hardware-based malware detection using low-level architectural features. IEEE Trans. Comput. 65(11), 3332–3344 (2016). https://doi.org/10.1109/tc.2016.2540634

  19. Reuters: Ukraine’s power outage was a cyber attack: Ukrenergo (2017). https://www.reuters.com/article/us-ukraine-cyber-attack-energy/ukraines-power-outage-was-a-cyber-attack-ukrenergo-idUSKBN1521BA

  20. Shalaginov, A., Grini, L.S., Franke, K.: Understanding neuro-fuzzy on a class of multinomial malware detection problems. In: 2016 International Joint Conference on Neural Networks (IJCNN), pp. 684–691. IEEE (2016). https://doi.org/10.1109/ijcnn.2016.7727266

  21. Shijo, P., Salim, A.: Integrated static and dynamic analysis for malware detection. Procedia Comput. Sci. 46, 804–811 (2015). https://doi.org/10.1016/j.procs.2015.02.149

  22. Tang, A., Sethumadhavan, S., Stolfo, S.J.: Unsupervised anomaly-based malware detection using hardware features. In: Stavrou, A., Bos, H., Portokalidis, G. (eds.) RAID 2014. LNCS, vol. 8688, pp. 109–129. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11379-1_6

  23. The Verge: The Petya ransomware is starting to look like a cyberattack in disguise (2017). https://www.theverge.com/2017/6/28/15888632/petya-goldeneye-ransomware-cyberattack-ukraine-russia

Author information

Correspondence to Sergii Banin.

Appendix A. Raw Data Sample

In this appendix, we present a sample of the raw data gathered during our experiments. We also explain each field included in the data.

  1. Opcode id: each opcode is given a unique identifier. If this opcode is executed again (e.g. in a loop), it receives the same id.

  2. Module name: the name of the module in which the current instruction is executed. It can be the name of a library or of the executable itself.

  3. Section name: the name of the section of the executable file or library in which the current instruction is executed. Often this is .text or CODE; however, in some cases (especially with malware), the name of an executable section can differ from the standard ones.

  4. Current function name: if the name of the function containing the current instruction can be found, we record it to understand which function performed a certain part of the logic.

  5. Opcode: the text representation of an assembly instruction together with its arguments, but without the argument values.

  6. Type of module: whether the instruction is executed from the main module of the executable under analysis or from an external library.

  7. Memory operations: the memory operations performed by the instruction, recorded only as reads or writes, without addresses or values.

  8. Name of the function being called: if the current instruction is a call, the name of the called function is stored.

A real example of raw data is presented in Listing 2. The first line is the header: the field names appear in the same order as in the list above.

[Listing 2: raw data sample]
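The field list above does not fix a delimiter or a value encoding, so the following Python sketch makes hypothetical choices: tab-separated columns, a header row first, memory operations written as a string of `R`/`W` characters, and an empty last column for instructions that are not calls. Under those assumptions, it parses a trace into records and derives the two sequences (memory operations and called functions) from which low- and high-level n-grams could be built. The field names, helper functions, and sample rows are invented for illustration and are not taken from the authors' tooling.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

# Field order follows the appendix list; the names themselves are hypothetical.
FIELDS = [
    "opcode_id", "module_name", "section_name", "function_name",
    "opcode", "module_type", "memory_ops", "called_function",
]


@dataclass
class TraceRecord:
    opcode_id: int
    module_name: str
    section_name: str
    function_name: str
    opcode: str
    module_type: str
    memory_ops: str
    called_function: str


def parse_trace(text: str, delimiter: str = "\t") -> List[TraceRecord]:
    """Parse a raw trace whose first line is a header row (skipped)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader, None)  # skip header; column order is assumed to match FIELDS
    records = []
    for row in reader:
        if not row:
            continue
        row = row + [""] * (len(FIELDS) - len(row))  # pad missing trailing fields
        values = dict(zip(FIELDS, row))
        records.append(TraceRecord(opcode_id=int(values.pop("opcode_id")), **values))
    return records


def memory_sequence(records: List[TraceRecord]) -> List[str]:
    """Flatten per-instruction memory operations into one R/W token stream
    (assumes the hypothetical 'R'/'W' character encoding)."""
    return [op for r in records for op in r.memory_ops if op in "RW"]


def api_sequence(records: List[TraceRecord]) -> List[str]:
    """Names of called functions in execution order, skipping non-call rows."""
    return [r.called_function for r in records if r.called_function]


# Invented sample rows in the assumed format.
raw = (
    "id\tmodule\tsection\tfunction\topcode\ttype\tmem\tcall\n"
    "1\tsample.exe\t.text\tmain\tpush reg\tmain\tW\t\n"
    "2\tsample.exe\t.text\tmain\tmov reg, mem\tmain\tR\t\n"
    "3\tsample.exe\t.text\tmain\tcall mem\tmain\tRW\tCreateFileW\n"
)
records = parse_trace(raw)
print(memory_sequence(records), api_sequence(records))  # ['W', 'R', 'R', 'W'] ['CreateFileW']
```

The delimiter is a parameter, so traces that use a different separator only require changing that argument; the record type keeps every field from the list, even though only the memory operations and called-function names are needed to build the two n-gram streams.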

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Banin, S., Dyrkolbotn, G.O. (2019). Correlating High- and Low-Level Features: Increased Understanding of Malware Classification. In: Attrapadung, N., Yagi, T. (eds) Advances in Information and Computer Security. IWSEC 2019. Lecture Notes in Computer Science, vol 11689. Springer, Cham. https://doi.org/10.1007/978-3-030-26834-3_9

  • DOI: https://doi.org/10.1007/978-3-030-26834-3_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26833-6

  • Online ISBN: 978-3-030-26834-3

  • eBook Packages: Computer Science, Computer Science (R0)
