ABSTRACT
The steady evolution of the Web has paved the way for miscreants to exploit vulnerabilities and embed malicious content in web pages. Upon a visit, malicious web pages steal sensitive data, redirect victims to other malicious targets, or seize control of the victim's system to mount future attacks. Approaches to detecting malicious web pages have been effective, albeit reactively, against specific classes of attacks such as drive-by downloads. However, the prevalence and complexity of attacks by malicious web pages remain worrisome. The main challenges in this problem domain are (1) fine-grained capturing and characterization of attack payloads, (2) evolution of web page artifacts, and (3) flexibility and scalability of detection techniques in a fast-changing threat landscape. To this end, we propose a holistic approach that leverages static analysis, dynamic analysis, machine learning, and evolutionary searching and optimization to effectively analyze and detect malicious web pages. We do so by introducing novel features that capture a fine-grained snapshot of malicious web pages, characterizing malicious web pages holistically, and applying evolutionary techniques to fine-tune learning-based detection models as attack payloads evolve. In this paper, we present the key intuition and details of our approach, the results obtained so far, and future work.
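To make the idea of using evolutionary search to tune a learning-based detector more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a genetic algorithm (one common evolutionary technique; the abstract does not say which is used), a toy nearest-centroid classifier, and entirely hypothetical feature names and data. The algorithm evolves a bit mask that selects which page features the classifier uses, scoring each mask by leave-one-out accuracy.

```python
import random

# Hypothetical toy data: (feature vector, label), where label 1 = malicious.
# Feature names and values are illustrative assumptions, not the paper's features.
FEATURES = ["url_length", "num_iframes", "num_eval_calls", "page_word_count"]
DATA = [
    ([120, 4, 9, 300], 1), ([95, 3, 7, 900], 1), ([110, 5, 8, 150], 1),
    ([30, 0, 0, 800], 0), ([25, 1, 1, 200], 0), ([40, 0, 1, 950], 0),
]


def fitness(mask, data=DATA):
    """Leave-one-out accuracy of a nearest-centroid classifier on the masked features."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # a mask that selects no features cannot classify anything
    correct = 0
    for k, (x, y) in enumerate(data):
        train = data[:k] + data[k + 1:]
        dists = {}
        for label in (0, 1):
            rows = [v for v, lab in train if lab == label]
            centroid = [sum(r[i] for r in rows) / len(rows) for i in cols]
            dists[label] = sum((x[i] - c) ** 2 for i, c in zip(cols, centroid))
        correct += min(dists, key=dists.get) == y
    return correct / len(data)


def evolve(generations=20, pop_size=8, mutation_rate=0.2, seed=0):
    """Evolve a feature mask with elitism, one-point crossover, and bit-flip mutation."""
    rng = random.Random(seed)
    n = len(FEATURES)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # the fitter half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    mask, score = evolve()
    selected = [name for name, bit in zip(FEATURES, mask) if bit]
    print("selected features:", selected, "accuracy:", score)
```

Because the fitter half of each generation is carried over unchanged, the best score never decreases between generations. In a realistic setting the fitness function would be the validation performance of the actual detection model, re-evaluated as newly observed pages arrive.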
Index Terms
- Effective analysis, characterization, and detection of malicious web pages