ABSTRACT
To keep software ecosystems healthy, we should detect and filter out pirated and counterfeit software on Web sites and peer-to-peer (P2P) networks. When a suspicious program is found on the Internet or in a software market, a software filtering system can determine whether the program is legal by comparing it with all the programs maintained in the market. That is, the system measures the similarity between the suspicious program and each program in the market to decide whether the suspicious program is a pirated or hacked version of an original. Because the market contains so many programs, the number of programs to be compared must be reduced. This paper proposes a machine learning-based software classification scheme that reduces the number of comparisons needed to measure software similarity. The scheme extracts API call frequencies from a suspicious program and classifies the program automatically with a machine learning technique such as random forests. Experimental results show that the proposed scheme can effectively classify a program into one of nine categories and can reduce the time needed to determine whether the program is an illegal version.
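The idea in the abstract, classifying a suspicious program first and then comparing it only against market programs in the same category, can be illustrated with a minimal sketch. This is not the authors' implementation. As an assumption made for brevity, a nearest-centroid classifier with cosine similarity stands in for the random forest named in the abstract. The API vocabulary, program names, categories, and call traces below are all hypothetical toy data.

```python
import math
from collections import Counter

# Hypothetical API vocabulary and toy market; none of this is from the paper.
VOCAB = ["CreateFile", "ReadFile", "send", "recv", "BitBlt"]
MARKET = {
    "editor_a": ("editor", ["CreateFile", "ReadFile", "ReadFile", "BitBlt"]),
    "editor_b": ("editor", ["CreateFile", "CreateFile", "ReadFile"]),
    "chat_a": ("network", ["send", "recv", "send", "CreateFile"]),
    "chat_b": ("network", ["recv", "recv", "send"]),
}


def api_frequency(api_calls, vocabulary):
    """Return the relative frequency of each vocabulary API in a call list."""
    counts = Counter(api_calls)
    total = sum(counts[name] for name in vocabulary) or 1
    return [counts[name] / total for name in vocabulary]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train(market, vocabulary):
    """Compute one mean frequency vector (centroid) per category."""
    by_category = {}
    for category, calls in market.values():
        by_category.setdefault(category, []).append(
            api_frequency(calls, vocabulary)
        )
    return {
        category: [sum(col) / len(vectors) for col in zip(*vectors)]
        for category, vectors in by_category.items()
    }


def classify(api_calls, centroids, vocabulary):
    """Assign the category whose centroid is most similar.

    Stand-in for the random forest classifier, used here as an assumption.
    """
    vector = api_frequency(api_calls, vocabulary)
    return max(centroids, key=lambda c: cosine(vector, centroids[c]))


def candidates(api_calls, market, centroids, vocabulary):
    """Return only the market programs in the predicted category."""
    category = classify(api_calls, centroids, vocabulary)
    return sorted(name for name, (c, _) in market.items() if c == category)


CENTROIDS = train(MARKET, VOCAB)
SUSPICIOUS = ["send", "recv", "recv", "send", "ReadFile"]  # hypothetical trace
SHORTLIST = candidates(SUSPICIOUS, MARKET, CENTROIDS, VOCAB)
print(classify(SUSPICIOUS, CENTROIDS, VOCAB), SHORTLIST)
```

In this toy run the expensive similarity measurement would be applied to two programs instead of four. With nine categories, as in the abstract, the saving depends on how evenly the market programs are spread across categories and on how often the classifier picks the right one.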
Index Terms
- Machine learning-based software classification scheme for efficient program similarity analysis