Abstract
Malicious web pages that use drive-by-download attacks or social engineering techniques have become a popular means of compromising hosts on the Internet. To find such pages, researchers have developed a number of systems that analyze web pages for the presence of malicious code. Most of these systems use dynamic analysis. Although these tools are quite precise, the analysis process is costly, so applying it to a large set of web pages can be prohibitive. In this paper, we present JSPRE, an approach to search the web more efficiently for pages that are likely to be malicious. JSPRE proposes a malicious page collection algorithm based on guided crawling, which starts from an initial set of URLs of known malicious web pages. In addition, JSPRE uses static analysis techniques to quickly examine a web page for malicious content. We have implemented our approach and evaluated it on a large-scale dataset. The results show that JSPRE identifies malicious web pages more efficiently than crawler-based approaches.
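The combination of guided crawling and static pre-filtering described above can be illustrated with a minimal sketch. It is not the JSPRE implementation: the abstract does not specify the features, the scoring, or the crawl ordering, so the pattern list, the `static_score` and `guided_crawl` functions, the `threshold` and `budget` parameters, and the in-memory `TOY_WEB` are all hypothetical choices made for illustration. The sketch assumes a best-first crawl in which links found on suspicious pages are visited earlier, and a regex-based check that inspects page source without executing it.

```python
import heapq
import re

# Hypothetical static indicators (assumption); the abstract does not list
# the features JSPRE actually uses.
SUSPICIOUS_PATTERNS = [
    r"\beval\s*\(",
    r"\bunescape\s*\(",
    r"document\.write\s*\(",
    r"<iframe[^>]*\b(?:width|height)\s*=\s*\W?0",
]


def static_score(html):
    """Count how many suspicious patterns occur in the page source.

    The page is only pattern-matched, never executed."""
    return sum(
        1 for pattern in SUSPICIOUS_PATTERNS
        if re.search(pattern, html, re.IGNORECASE)
    )


def guided_crawl(seeds, fetch, budget=100, threshold=2):
    """Best-first crawl starting from known-malicious seed URLs.

    fetch(url) must return (html, links) or None if the page is unavailable.
    Links inherit the score of the page they were found on, so pages linked
    from suspicious pages are fetched first. Returns the URLs whose static
    score reaches the threshold, in visit order."""
    # heapq is a min-heap, so scores are negated to pop the highest first.
    frontier = [(-threshold, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    candidates = []
    visited = 0
    while frontier and visited < budget:
        _, url = heapq.heappop(frontier)
        visited += 1
        page = fetch(url)
        if page is None:
            continue
        html, links = page
        score = static_score(html)
        if score >= threshold:
            candidates.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return candidates


# Toy in-memory "web" standing in for real HTTP fetches (assumption).
TOY_WEB = {
    "http://seed.example/": (
        '<script>eval(unescape("%61"))</script>',
        ["http://a.example/", "http://b.example/"],
    ),
    "http://a.example/": (
        '<iframe width=0 src="x"></iframe>'
        '<script>document.write(unescape("%61"))</script>',
        [],
    ),
    "http://b.example/": ("<p>hello</p>", ["http://c.example/"]),
    "http://c.example/": ("<p>benign</p>", []),
}

if __name__ == "__main__":
    print(guided_crawl(["http://seed.example/"], TOY_WEB.get))
```

In a pipeline of this shape, only the returned candidates would be passed on to a costly dynamic analyzer, which is where the efficiency gain claimed in the abstract would come from.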
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Hou, B., Yu, J., Liu, B., Cai, Z. (2018). JSPRE: A Large-Scale Detection of Malicious JavaScript Code Based on Pre-filter. In: Sun, X., Pan, Z., Bertino, E. (eds) Cloud Computing and Security. ICCCS 2018. Lecture Notes in Computer Science, vol 11068. Springer, Cham. https://doi.org/10.1007/978-3-030-00021-9_52
DOI: https://doi.org/10.1007/978-3-030-00021-9_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00020-2
Online ISBN: 978-3-030-00021-9
eBook Packages: Computer Science, Computer Science (R0)