Correlating High- and Low-Level Features:

Banin, Sergii; Dyrkolbotn, Geir Olav

doi:10.1007/978-3-030-26834-3_9

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11689)

Included in the following conference series:

International Workshop on Security

Abstract

Malware poses a constant threat to the services and facilities used by modern society. To perform and improve anti-malware defense, methods capable of malware categorization are needed. As malware is grouped into categories according to its functionality, dynamic malware analysis is a reliable source of features that are useful for malware classification. Different types of dynamic features are described in the literature [5, 6, 13]. These features can be divided into two main groups: high-level features (API calls, file activity, network activity, etc.) and low-level features (memory access patterns, hardware performance counters, etc.). Low-level features are of special interest to malware analysts: regardless of the anti-detection mechanisms used by malware, it cannot avoid execution on hardware. As hardware-based security solutions are constantly developed by hardware manufacturers and prototyped by researchers, research on low-level features for malware analysis is a promising topic. The biggest problem with low-level features is that they convey little information to a human analyst. In this paper, we analyze the potential correlation between the low- and high-level features used for malware classification. In particular, we analyze the n-grams of memory access operations found in [6] and try to find their relationship with n-grams of API calls. We also compare the performance of API call n-grams and memory access n-grams on the same dataset as used in [6]. Finally, we analyze their combined performance for malware classification and explain our findings on the correlation between high- and low-level features.
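
To make the notion of n-grams concrete, the following minimal sketch extracts overlapping n-grams from a sequence of memory access operations and from a sequence of API calls. The `R`/`W` encoding, the example sequences and the helper name `ngrams` are assumptions made for illustration only; they are not taken from [6] or from the experiments described in this paper.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(seq: Sequence, n: int) -> List[Tuple]:
    """Return all overlapping n-grams of seq as tuples (empty if seq is shorter than n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


# Hypothetical low-level trace: one symbol per memory operation (R = read, W = write).
memory_ops = list("RRWRRW")

# Hypothetical high-level trace: API calls in the order they were made.
api_calls = ["CreateFileW", "ReadFile", "CloseHandle"]

# Count how often each memory access 3-gram occurs in the trace.
mem_counts = Counter(ngrams(memory_ops, 3))

# API call 2-grams for the same (hypothetical) execution.
api_bigrams = ngrams(api_calls, 2)
```

N-gram counts of this kind, or simply their presence or absence, could then serve as features for a classifier; which representation and which value of n are appropriate depends on the experiment.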

The research leading to these results has received funding from the Center for Cyber and Information Security, under budget allocation from the Ministry of Justice and Public Security.

References

1. Types of malware. https://usa.kaspersky.com/resource-center/threats/types-of-malware. Accessed 17 Mar 2019
2. Virusshare.com. https://virusshare.com/. Accessed 12 Mar 2019
3. Weka: Data mining software in Java (2019). http://www.cs.waikato.ac.nz/ml/weka/. Accessed 12 Mar 2019
4. Alazab, M., Layton, R., Venkataraman, S., Watters, P.: Malware detection based on structural and behavioural features of api calls (2010)
5. Bahador, M.B., Abadi, M., Tajoddin, A.: HPCMalHunter: behavioral malware detection using hardware performance counters and singular value decomposition. In: 2014 4th International eConference on Computer and Knowledge Engineering (ICCKE), pp. 703–708. IEEE (2014). https://doi.org/10.1109/iccke.2014.6993402
6. Banin, S., Dyrkolbotn, G.O.: Multinomial malware classification via low-level features. Digit. Invest. 26, S107–S117 (2018). https://doi.org/10.1016/j.diin.2018.04.019
7. Banin, S., Shalaginov, A., Franke, K.: Memory access patterns for malware detection. In: Norsk informasjonssikkerhetskonferanse (NISK), pp. 96–107 (2016)
8. Cole, E.: Advanced Persistent Threat: Understanding the Danger and How to Protect Your Organization. Newnes, Amsterdam (2012)
9. Hoglund, G.: What APT Means To Your Enterprise (2011). https://pdfs.semanticscholar.org/d0a0/47c6b19fc3645973f8f300b507886b54196a.pdf
10. Group, T.R.: Testimon research group (2017). https://testimon.ccis.no/
11. Hall, M.A.: Correlation-based feature subset selection for machine learning. Ph.D. thesis, University of Waikato, Hamilton, New Zealand (1998)
12. IntelPin: A dynamic binary instrumentation tool (2019)
13. Islam, R., Tian, R., Batten, L.M., Versteeg, S.: Classification of malware based on integrated static and dynamic features. J. Netw. Comput. Appl. 36(2), 646–656 (2013). https://doi.org/10.1016/j.jnca.2012.10.004
14. Khasawneh, K.N., Ozsoy, M., Donovick, C., Abu-Ghazaleh, N., Ponomarev, D.: Ensemble learning for low-level hardware-supported malware detection. In: Bos, H., Monrose, F., Blanc, G. (eds.) RAID 2015. LNCS, vol. 9404, pp. 3–25. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26362-5_1
15. Kononenko, I., Kukar, M.: Machine Learning and Data Mining: Introduction to Principles and Algorithms. Horwood Publishing, Cambridge (2007)
16. Lim, H.I.: Detecting malicious behaviors of software through analysis of api sequence k-grams (2016). https://doi.org/10.13189/csit.2016.040301
17. Ozsoy, M., Donovick, C., Gorelik, I., Abu-Ghazaleh, N., Ponomarev, D.: Malware-aware processors: a framework for efficient online malware detection. In: 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pp. 651–661. IEEE (2015). https://doi.org/10.1109/hpca.2015.7056070
18. Ozsoy, M., Khasawneh, K.N., Donovick, C., Gorelik, I., Abu-Ghazaleh, N., Ponomarev, D.: Hardware-based malware detection using low-level architectural features. IEEE Trans. Comput. 65(11), 3332–3344 (2016). https://doi.org/10.1109/tc.2016.2540634
19. Reuters: Ukraine’s power outage was a cyber attack: Ukrenergo (2017). https://www.reuters.com/article/us-ukraine-cyber-attack-energy/ukraines-power-outage-was-a-cyber-attack-ukrenergo-idUSKBN1521BA
20. Shalaginov, A., Grini, L.S., Franke, K.: Understanding neuro-fuzzy on a class of multinomial malware detection problems. In: 2016 International Joint Conference on Neural Networks (IJCNN), pp. 684–691. IEEE (2016). https://doi.org/10.1109/ijcnn.2016.7727266
21. Shijo, P., Salim, A.: Integrated static and dynamic analysis for malware detection. Procedia Comput. Sci. 46, 804–811 (2015). https://doi.org/10.1016/j.procs.2015.02.149
22. Tang, A., Sethumadhavan, S., Stolfo, S.J.: Unsupervised anomaly-based malware detection using hardware features. In: Stavrou, A., Bos, H., Portokalidis, G. (eds.) RAID 2014. LNCS, vol. 8688, pp. 109–129. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11379-1_6
23. The Verge: the petya ransomware is starting to look like a cyberattack in disguise (2017). https://www.theverge.com/2017/6/28/15888632/petya-goldeneye-ransomware-cyberattack-ukraine-russia

Author information

Authors and Affiliations

Department of Information Security and Communication Technology, NTNU, Gjøvik, Norway
Sergii Banin & Geir Olav Dyrkolbotn

Corresponding author

Correspondence to Sergii Banin.

Editor information

Editors and Affiliations

National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
Nuttapong Attrapadung
NTT Security (Japan) KK, Tokyo, Japan
Takeshi Yagi

Appendix A. Raw Data Sample

In this appendix we present a sample of the raw data gathered during our experiments and explain each field it contains.

1. Opcode id: each opcode is given a unique identifier. If the same opcode is executed again (e.g. in a loop), it receives the same id.
2. Module name: the name of the module in which the current instruction is executed. It can be the name of a library or of the executable itself.
3. Section name: the name of the section of the executable file or library in which the current instruction is executed. Often it is .text or CODE; however, in some cases (especially with malware) the name of an executable section can differ from the standard ones.
4. Current function name: if the name of the function containing the current instruction can be found, we record it to understand which function performed a certain part of the logic.
5. Opcode: the text representation of an assembly instruction together with its arguments, but without the argument values.
6. Type of module: whether the instruction is executed from the main module of the executable under analysis or from an external library.
7. Memory operations: the memory operations performed by the instruction. Only the operation type (read or write) is recorded, without addresses or values.
8. Name of the function being called: if the current instruction is a call, the name of the called function is stored.

A real example of raw data is presented in Listing 2. The first line is the header: the field names appear in the same order as in the list above.
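
The eight fields listed above map naturally onto a small record type. The Python sketch below is illustrative only: the semicolon delimiter, the field and function names, the `R`/`W` encoding of memory operations and the sample rows are all assumptions invented for this example and do not reproduce the format or content of Listing 2.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class TraceRecord:
    """One line of raw data; attribute names are hypothetical, order follows the field list."""
    opcode_id: int
    module_name: str
    section_name: str
    function_name: str
    opcode: str
    module_type: str
    memory_ops: str        # assumed encoding: "R", "W", "RW" or empty
    called_function: str   # empty unless the instruction is a call


def parse_trace(text: str, delimiter: str = ";") -> Iterator[TraceRecord]:
    """Parse delimited raw data whose first line is a header (delimiter is an assumption)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader, None)  # skip the header line
    for row in reader:
        if len(row) != 8:
            continue  # ignore blank or malformed lines
        yield TraceRecord(int(row[0]), *row[1:])


def memory_sequence(records: Iterable[TraceRecord]) -> List[str]:
    """Flatten the memory operations of all instructions into one sequence of R/W symbols."""
    return [op for rec in records for op in rec.memory_ops]


def called_functions(records: Iterable[TraceRecord]) -> List[str]:
    """Collect the names of called functions in execution order."""
    return [rec.called_function for rec in records if rec.called_function]


# Invented sample rows, for demonstration only.
SAMPLE = """opcode_id;module;section;function;opcode;module_type;memory_ops;called_function
1;sample.exe;.text;main;push ebp;main;W;
2;sample.exe;.text;main;mov eax, dword ptr [ebp+arg];main;R;
3;sample.exe;.text;main;call dword ptr [addr];main;RW;CreateFileW
"""

records = list(parse_trace(SAMPLE))
mem_seq = memory_sequence(records)
calls = called_functions(records)
```

Under these assumptions, field 7 yields the low-level memory access sequence and field 8 yields the high-level sequence of called functions for the same execution, which is what allows the two kinds of features to be related to each other.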

Cite this paper

Banin, S., Dyrkolbotn, G.O. (2019). Correlating High- and Low-Level Features:. In: Attrapadung, N., Yagi, T. (eds) Advances in Information and Computer Security. IWSEC 2019. Lecture Notes in Computer Science, vol 11689. Springer, Cham. https://doi.org/10.1007/978-3-030-26834-3_9

DOI: https://doi.org/10.1007/978-3-030-26834-3_9
Published: 24 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26833-6
Online ISBN: 978-3-030-26834-3
eBook Packages: Computer Science, Computer Science (R0)
