DOI: 10.1145/2498328.2500079
research-article

Applying static analysis to high-dimensional malicious application detection

Published: 04 April 2013

ABSTRACT

Signature-based anti-virus systems are inherently limited in detecting new and previously unknown types of malicious attacks, and researchers are searching for methodologies to address this problem. One potential method is static application analysis, in which applications are not executed to determine whether they contain malicious functionality. This paper presents a static application analysis methodology that uses the information retrieval technique of n-gram analysis and the dimensionality reduction techniques of randomized projection and mutual information to create a malicious application detection model. For this effort, a data set was extracted from Microsoft Windows applications that were either benign or possessed malicious functionality. Dimensionality reduction and prediction methods were then applied. Initial results show promise when the predictions are compared to expected outcomes. In one performance evaluation, the Boosted J48 algorithm achieved an accuracy of 99.08%.
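
The three techniques named in the abstract (n-gram analysis, mutual information, and randomized projection) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the n-gram length, how the stages are combined, or any parameters. The 2-byte n-grams, the toy byte strings, the random ±1 projection matrix, and all function names below are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def byte_ngrams(data, n=2):
    """Count overlapping byte n-grams in a binary blob (n is an assumed value)."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def mutual_information(present, labels):
    """Mutual information (in bits) between a binary feature and a class label."""
    total = len(labels)
    joint = Counter(zip(present, labels))
    feat_counts = Counter(present)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, c), count in joint.items():
        p_fc = count / total
        p_f = feat_counts[f] / total
        p_c = label_counts[c] / total
        mi += p_fc * math.log2(p_fc / (p_f * p_c))
    return mi


def random_projection(vectors, k, seed=0):
    """Project d-dimensional rows onto k dimensions with a random +/-1 matrix."""
    rng = random.Random(seed)
    d = len(vectors[0])
    proj = [[rng.choice((-1.0, 1.0)) for _ in range(k)] for _ in range(d)]
    scale = 1.0 / math.sqrt(k)
    return [
        [scale * sum(row[i] * proj[i][j] for i in range(d)) for j in range(k)]
        for row in vectors
    ]


# Hypothetical toy data: label 1 = malicious, 0 = benign.
samples = [b"\x90\x90\xeb\xfe\x90\x90", b"\x90\x90\xcc\xcc",
           b"hello world", b"hello there"]
labels = [1, 1, 0, 0]

# Step 1: n-gram counts per sample, arranged as a samples-by-vocabulary matrix.
counts = [byte_ngrams(s, n=2) for s in samples]
vocab = sorted(set().union(*counts))
feature_matrix = [[c[g] for g in vocab] for c in counts]

# Step 2: score each n-gram by mutual information with the class label.
mi_scores = {g: mutual_information([c[g] > 0 for c in counts], labels)
             for g in vocab}

# Step 3: reduce the full count matrix to k dimensions by random projection.
reduced = random_projection(feature_matrix, k=3)
```

In a setting like the one the abstract describes, the reduced vectors (or the n-grams with the highest mutual information scores) would then be passed to a classifier. The Boosted J48 learner mentioned in the abstract is not reproduced here.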

      • Published in

        ACMSE '13: Proceedings of the 51st ACM Southeast Conference
        April 2013
        224 pages
        ISBN: 9781450319010
        DOI: 10.1145/2498328
        General Chair: Ashraf Saad

        Copyright © 2013 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 178 of 377 submissions, 47%