research-article

Machine learning-based software classification scheme for efficient program similarity analysis

Authors:
Yesol Kim (Dankook University, Yongin, Korea)
Jonghyuk Park (Dankook University, Yongin, Korea)
Seong-je Cho (Dankook University, Yongin, Korea)
Yunmook Nah (Dankook University, Yongin, Korea)
Sangchul Han (Konkuk University, Chungbuk, Korea)
Minkyu Park (Konkuk University, Chungbuk, Korea)

RACS '15: Proceedings of the 2015 Conference on Research in Adaptive and Convergent Systems
October 2015, Pages 114–118
https://doi.org/10.1145/2811411.2811549

Published: 09 October 2015

ABSTRACT

To keep software ecosystems healthy, pirated and counterfeit software should be detected and filtered out of Web sites and peer-to-peer (P2P) networks. Whenever a suspicious program is found on the Internet or in a software market, a software filtering system can determine whether the program is legitimate by comparing it with all the programs maintained in the market. That is, we need to measure the similarity between the suspicious program and the programs in the market to determine whether the suspicious program is a pirated or hacked version of an original. Because the market contains so many programs, the number of programs to be compared must be reduced. This paper proposes a machine learning-based software classification scheme that reduces the number of comparisons needed to measure software similarity. The scheme extracts API call frequencies from a suspicious program and classifies the program automatically with a machine learning technique such as random forests. Experimental results show that the proposed scheme can effectively classify a program into one of nine categories and can reduce the time needed to determine whether the program is an illegal version.
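
The filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the API vocabulary, category names, program names, and training data are all hypothetical, and a simple nearest-centroid classifier stands in for the random forest so that the example needs only the Python standard library. It shows how API call frequencies become a feature vector, and how the predicted category limits the set of market programs that must go through the costly similarity comparison.

```python
from collections import Counter
import math

# Hypothetical API vocabulary; the abstract does not list the actual features.
API_VOCAB = ["CreateFileW", "ReadFile", "send", "recv", "BitBlt"]


def api_frequency_vector(api_calls, vocab=API_VOCAB):
    """Turn a list of observed API calls into relative frequencies over vocab."""
    counts = Counter(api_calls)
    total = sum(counts[name] for name in vocab)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[name] / total for name in vocab]


def centroid(vectors):
    """Component-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def train_centroids(labeled_programs):
    """Map each category to the mean frequency vector of its training programs.

    Stand-in for training a random forest (an assumption made for brevity).
    """
    return {
        category: centroid([api_frequency_vector(calls) for calls in programs])
        for category, programs in labeled_programs.items()
    }


def classify(api_calls, centroids):
    """Predict the category whose centroid is closest in Euclidean distance."""
    vector = api_frequency_vector(api_calls)
    return min(centroids, key=lambda cat: math.dist(vector, centroids[cat]))


def candidates_to_compare(api_calls, centroids, market):
    """Keep only market programs in the predicted category.

    Only these candidates would then be passed to the similarity measurement.
    """
    category = classify(api_calls, centroids)
    return [name for name, cat in market.items() if cat == category]


# Hypothetical training data: category -> API call traces of known programs.
training = {
    "network": [["send", "recv", "send", "ReadFile"], ["recv", "send", "recv"]],
    "graphics": [["BitBlt", "BitBlt", "CreateFileW"], ["BitBlt", "ReadFile", "BitBlt"]],
}
# Hypothetical market: program name -> category.
market = {"ftp_client": "network", "chat_app": "network", "paint_tool": "graphics"}
suspicious = ["send", "recv", "recv", "CreateFileW"]

centroids = train_centroids(training)
print(classify(suspicious, centroids))
print(candidates_to_compare(suspicious, centroids, market))
```

In this toy setting the suspicious program is compared against two market programs instead of three. With nine categories and a large market, the same filtering step is what would cut the number of similarity comparisons.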

Index Terms

1. Social and professional topics
  1. Computing / technology policy
    1. Intellectual property
      1. Copyrights


Published in
RACS '15: Proceedings of the 2015 Conference on Research in Adaptive and Convergent Systems
October 2015, 540 pages
ISBN: 9781450337380
DOI: 10.1145/2811411
Conference Chairs: Esmaeil S. Nadimi (University of Southern Denmark, Denmark), Tomas Cerny (Czech Technical University, Czech Republic)
Program Chairs: Sung-Ryul Kim (Konkuk University, Korea), Wei Wang (San Diego State University)
Copyright © 2015 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
artificial neural network
machine learning
random forest
software classification
software filtering
Acceptance Rates
RACS '15 Paper Acceptance Rate: 75 of 309 submissions, 24%
Overall Acceptance Rate: 393 of 1,581 submissions, 25%