Abstract
Malware constitutes a long-term challenge to the operation of contemporary information technology systems. A large amount of realistic and current training data is needed to train digital forensic professionals in the use of forensic tools and to keep their skills up to date. Unfortunately, very few training images are available, especially images containing recent malware, for reasons such as privacy, competitive advantage, intellectual property rights and secrecy. A promising solution is to provide recent, realistic corpora produced by dataset synthesis frameworks. However, none of the publicly available frameworks currently supports the creation of realistic malware traces in a customizable manner, in which the synthesis of relevant traces can be configured to meet individual needs.
This chapter presents the concept, implementation and validation of a synthesis framework that generates malware traces for Windows operating systems. The framework generates coherent malware traces at three levels: random-access memory, network and hard drive. A typical malware infection with data exfiltration is demonstrated as a proof of concept.
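To illustrate what "coherent traces at three levels" can mean in practice, the following is a minimal Python sketch under our own assumptions. `ExfiltrationScenario`, `plan_traces` and `is_coherent` are hypothetical names, not the framework's actual configuration format or API, and the paths and domain are made-up examples. The sketch derives the artifacts that the hard drive, memory and network levels should each show for an infection with data exfiltration, then checks that all three levels reference the same indicators (the dropped binary, the command-and-control domain and the stolen files).

```python
import ntpath  # Windows-style path handling that works on any host OS
from dataclasses import dataclass, field


@dataclass
class ExfiltrationScenario:
    """Hypothetical description of one infection-with-exfiltration scenario."""
    dropper_path: str    # where the malware binary lands on the hard drive
    c2_domain: str       # command-and-control host receiving the stolen data
    stolen_files: list[str] = field(default_factory=list)


def plan_traces(s: ExfiltrationScenario) -> dict[str, list[str]]:
    """Derive the artifacts each level should contain (illustrative only)."""
    exe_name = ntpath.basename(s.dropper_path)
    return {
        # Hard drive: the dropped binary plus the documents it reads.
        "disk": [s.dropper_path, *s.stolen_files],
        # RAM: the running process and the C2 domain string in its memory.
        "memory": [f"process {exe_name}", f"string {s.c2_domain}"],
        # Network: a DNS lookup of the C2 domain, then one upload per file.
        "network": [f"dns {s.c2_domain}"]
        + [f"upload {s.c2_domain} {ntpath.basename(f)}" for f in s.stolen_files],
    }


def is_coherent(s: ExfiltrationScenario, traces: dict[str, list[str]]) -> bool:
    """Check that every level refers to the same shared indicators."""
    exe_name = ntpath.basename(s.dropper_path)
    checks = [
        s.dropper_path in traces["disk"],
        any(exe_name in t for t in traces["memory"]),
        any(s.c2_domain in t for t in traces["memory"]),
        any(s.c2_domain in t for t in traces["network"]),
    ]
    for f in s.stolen_files:
        checks.append(f in traces["disk"])                      # file exists on disk
        checks.append(any(ntpath.basename(f) in t              # and leaves the host
                          for t in traces["network"]))
    return all(checks)


scenario = ExfiltrationScenario(
    dropper_path="C:\\Users\\alice\\AppData\\Local\\Temp\\update.exe",
    c2_domain="c2.example.org",
    stolen_files=["C:\\Users\\alice\\Documents\\report.docx"],
)
traces = plan_traces(scenario)
print(is_coherent(scenario, traces))  # True

# Dropping the upload from the network level breaks coherence.
broken = dict(traces, network=["dns c2.example.org"])
print(is_coherent(scenario, broken))  # False
```

Deriving all three levels from a single scenario object is one simple way to keep them consistent: a change to the C2 domain or the stolen files propagates to every level instead of being edited separately per level.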
© 2022 IFIP International Federation for Information Processing
About this paper
Cite this paper
Lukner, M., Göbel, T., Baier, H. (2022). Realistic and Configurable Synthesis of Malware Traces in Windows Systems. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics XVIII. DigitalForensics 2022. IFIP Advances in Information and Communication Technology, vol 653. Springer, Cham. https://doi.org/10.1007/978-3-031-10078-9_2
DOI: https://doi.org/10.1007/978-3-031-10078-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10077-2
Online ISBN: 978-3-031-10078-9
eBook Packages: Computer Science, Computer Science (R0)