ABSTRACT
Signature-based anti-virus systems are inherently limited in their ability to detect new and previously unknown types of malicious attacks. Researchers are therefore searching for methodologies to address this limitation. One potential method is static application analysis, in which applications are examined without being executed to determine whether they contain malicious functionality. This paper presents a static application analysis methodology that uses the information retrieval technique of n-gram analysis and the dimensionality reduction techniques of randomized projection and mutual information to create a malicious application detection model. For this effort, a data set was extracted from Microsoft Windows applications that were either benign or possessed malicious functionality. Dimensionality reduction and prediction methods were then applied. Initial results show promise when the predictions are compared with expected outcomes. In one performance evaluation, the Boosted J48 algorithm achieved an accuracy of 99.08%.
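The three building blocks named in the abstract (n-gram analysis, randomized projection, and mutual information) can be illustrated with a minimal sketch. It is not the authors' implementation: the byte-level n-grams, the Gaussian projection matrix, the binary presence/absence form of mutual information, and all function names and toy byte strings are assumptions made for illustration only. The classification step (Boosted J48 in the paper) is not shown.

```python
import math
import random
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams in a binary blob (no execution needed)."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def to_vector(counts: Counter, vocabulary: list) -> list:
    """Map n-gram counts onto a fixed vocabulary order."""
    return [float(counts.get(gram, 0)) for gram in vocabulary]


def random_projection_matrix(d: int, k: int, seed: int = 0) -> list:
    """Build a k x d Gaussian matrix scaled by 1/sqrt(k) (assumed variant)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix: list, vector: list) -> list:
    """Project a d-dimensional vector down to k dimensions."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def mutual_information(feature: list, labels: list) -> float:
    """Mutual information (in bits) between a discrete feature and a label."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    p_feature = Counter(feature)
    p_label = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (count_x * count_y)
        ratio = count * n / (p_feature[x] * p_label[y])
        mi += (count / n) * math.log2(ratio)
    return mi


if __name__ == "__main__":
    # Hypothetical toy "binaries"; label 1 = malicious, 0 = benign.
    samples = [b"\x90\x90\xeb\xfe", b"\x90\x90\xcc\xcc", b"MZ\x00\x00", b"MZ\x00\x01"]
    labels = [1, 1, 0, 0]

    counts = [byte_ngrams(s, n=2) for s in samples]
    vocabulary = sorted(set().union(*counts))
    vectors = [to_vector(c, vocabulary) for c in counts]

    # Rank n-grams by how informative their presence is about the label.
    scores = {
        gram: mutual_information([int(c[gram] > 0) for c in counts], labels)
        for gram in vocabulary
    }
    best = max(scores, key=scores.get)
    print("most informative n-gram:", best.hex(), "score:", scores[best])

    # Reduce every sample vector to 3 dimensions.
    matrix = random_projection_matrix(len(vocabulary), k=3)
    for vec in vectors:
        print(project(matrix, vec))
```

In this sketch, mutual information acts as a feature selection step (keeping n-grams whose presence says the most about the class), while randomized projection compresses the full count vector without inspecting the labels. How the paper combines or orders the two techniques is not stated in the abstract.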
- Applying static analysis to high-dimensional malicious application detection