DOI: 10.1145/2811411.2811549
research-article

Machine learning-based software classification scheme for efficient program similarity analysis

Published: 09 October 2015

ABSTRACT

To keep software ecosystems healthy, pirated and counterfeit software must be detected and filtered out on Web sites and peer-to-peer (P2P) networks. Whenever a suspicious program is found on the Internet or in a software market, a software filtering system can determine whether the program is legal by comparing it with all the programs maintained in the market. That is, we need to measure the similarity between the suspicious program and each program in the market to determine whether the suspicious program is a pirated or hacked version of an original. Because the market holds so many programs, the number of programs to be compared must be reduced. This paper proposes a machine learning-based software classification scheme that reduces the number of comparisons needed to measure software similarity. The scheme extracts API call frequencies from a suspicious program and classifies the program automatically using a machine learning technique such as random forests. Experimental results show that the proposed scheme can effectively classify a program into one of nine categories and can reduce the time needed to determine whether the program is an illegal version.
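The pipeline described in the abstract (extract API call frequencies, classify the program, then compare it only against programs in the predicted category) can be illustrated with a minimal sketch. This is not the authors' implementation: to stay dependency-free it swaps in a nearest-centroid classifier where the paper uses a technique such as random forests, and every function name, API name, category, and program name below is a hypothetical chosen for illustration.

```python
from collections import Counter
import math


def api_frequency_vector(api_calls, vocabulary):
    """Relative frequency of each vocabulary API among a program's calls."""
    counts = Counter(api_calls)
    total = sum(counts[name] for name in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[name] / total for name in vocabulary]


def train_centroids(labelled_vectors):
    """Average the feature vectors per category.

    Assumption: a nearest-centroid model stands in for the random forest
    named in the abstract, only to keep this sketch self-contained.
    """
    sums, sizes = {}, Counter()
    for category, vec in labelled_vectors:
        acc = sums.setdefault(category, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        sizes[category] += 1
    return {c: [v / sizes[c] for v in acc] for c, acc in sums.items()}


def classify(vec, centroids):
    """Return the category whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda c: math.dist(vec, centroids[c]))


def candidates_for(suspicious_calls, market, vocabulary):
    """Predict a category, then keep only market programs in that category.

    `market` is a list of (program_name, category, api_calls) tuples.
    """
    labelled = [(cat, api_frequency_vector(calls, vocabulary))
                for _, cat, calls in market]
    centroids = train_centroids(labelled)
    predicted = classify(api_frequency_vector(suspicious_calls, vocabulary),
                         centroids)
    return predicted, [name for name, cat, _ in market if cat == predicted]


# Hypothetical toy data: four API names and four market programs.
vocabulary = ["socket", "send", "drawBitmap", "playSound"]
market = [
    ("chat_a", "network", ["socket", "send", "send"]),
    ("ftp_b", "network", ["socket", "socket", "send"]),
    ("game_c", "game", ["drawBitmap", "playSound", "drawBitmap"]),
    ("game_d", "game", ["drawBitmap", "drawBitmap", "playSound", "socket"]),
]
suspicious = ["socket", "send", "send", "send", "drawBitmap"]

predicted, to_compare = candidates_for(suspicious, market, vocabulary)
print(predicted, to_compare)
```

With this toy data the suspicious program is assigned to the hypothetical "network" category, so the expensive similarity measurement would run against two market programs instead of four. The saving grows with the size of the market and the number of categories, which is the effect the abstract reports for nine categories.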

    Published in

      RACS '15: Proceedings of the 2015 Conference on Research in Adaptive and Convergent Systems
      October 2015
      540 pages
      ISBN: 9781450337380
      DOI: 10.1145/2811411

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      RACS '15 paper acceptance rate: 75 of 309 submissions, 24%
      Overall acceptance rate: 393 of 1,581 submissions, 25%