ABSTRACT
In this paper, we discuss the role that machine learning can play in computer forensics. We begin by considering the role that machine learning has gained in computer security applications, so that the computer forensics community can learn from the experience of the computer security community. We then present a brief literature review illustrating the areas of computer forensics where machine learning techniques have been used so far. Next, we outline the technical requirements that tools for computer security and computer forensics applications should meet, with the goal of illustrating how machine learning algorithms can be of practical help. We intend this paper to foster applications of machine learning in computer forensics, and we hope that the ideas presented here represent promising directions in the quest for more efficient and effective computer forensics tools.
Index Terms
- Machine learning in computer forensics (and the lessons learned from machine learning in computer security)