WebHound: a data-driven intrusion detection from real-world web access logs

Wei, Te-En; Lee, Hahn-Ming; Jeng, Albert B.; Lamba, Hemank; Faloutsos, Christos

doi:10.1007/s00500-018-03750-1

Methodologies and Application
Published: 18 January 2019

Soft Computing, Volume 23, pages 11947–11965 (2019)

Te-En Wei (ORCID: orcid.org/0000-0001-6914-5046)^1,2,
Hahn-Ming Lee^1,3,
Albert B. Jeng^1,
Hemank Lamba^4 &
Christos Faloutsos^4

Abstract

Hackers usually discover and exploit vulnerabilities in an entry point before invading a corporate environment. Web server exploration and spam are two popular means hackers use to gain access to enterprise computer systems. In this paper, we focus on protecting web servers against such intrusion threats. During the discovery stage, hackers use web vulnerability scanners (e.g., SQLMap, Nmap, and the tools bundled with Kali Linux) to learn the web server version and related vulnerabilities. Then, in the exploitation stage, they develop a customized intrusion method that exploits the previously learned vulnerabilities to launch a subsequent attack. Currently, the most popular defenses (e.g., IDS, WAF) detect web server intrusion events through domain-expert rules and anomaly pattern matching. For example, ModSecurity, an open-source WAF, detects only known attack signatures defined by domain-expert rules. Such approaches are therefore effective against intrusions in the first (discovery) stage. However, they are ineffective against customized intrusions in the second (exploitation) stage, because no rules or signatures exist for detecting them. To resolve this problem, we propose WebHound, an unsupervised, data-driven anomaly detection approach. By analyzing large-scale web access logs, it not only identifies hackers' reconnaissance but also detects the customized intrusion methods they deploy. Moreover, WebHound provides intrusion evidence as a storyline for recovering the intrusion procedure. Among numerous experiments and case studies, we applied WebHound to a special government case to investigate intrusion evidence and compared our results with the findings of computer forensic experts. The results showed that WebHound could discover more intrusion evidence than the human experts.
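The abstract does not spell out WebHound's features or scoring model, so the following is only a minimal sketch of the general idea it describes: deciding what is unusual from the access logs themselves rather than from expert rules, then replaying a suspicious client's requests as a timeline. The Common Log Format input, the three per-client features (request volume, 4xx ratio, distinct paths), the median/MAD robust z-score, and all function names and sample data are assumptions for illustration, not the authors' method.

```python
import re
from collections import defaultdict
from datetime import datetime
from statistics import median

# Assumed input: Apache/Nginx Common Log Format (CLF) lines.
CLF = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse(lines):
    """Yield (ip, timestamp, method, path, status) for each well-formed CLF line."""
    for line in lines:
        m = CLF.match(line)
        if m is None:
            continue  # skip malformed entries rather than aborting the whole log
        ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        yield m["ip"], ts, m["method"], m["path"], int(m["status"])


def client_features(records):
    """Group requests by client IP and compute hypothetical per-client features:
    (request count, fraction of 4xx responses, number of distinct paths)."""
    reqs = defaultdict(list)
    for ip, ts, method, path, status in records:
        reqs[ip].append((ts, method, path, status))
    feats = {}
    for ip, rows in reqs.items():
        errors = sum(1 for _, _, _, status in rows if 400 <= status < 500)
        feats[ip] = (len(rows), errors / len(rows), len({path for _, _, path, _ in rows}))
    return feats, reqs


def anomaly_scores(feats):
    """Unsupervised score: sum over features of |x - median| / MAD (robust z-score).
    No labels or signatures are used; larger scores mean more unusual clients."""
    if not feats:
        return {}
    ips = list(feats)
    scores = dict.fromkeys(ips, 0.0)
    for k in range(len(feats[ips[0]])):
        column = [feats[ip][k] for ip in ips]
        med = median(column)
        # Fall back to a unit scale when most clients share the median value (MAD == 0).
        mad = median(abs(x - med) for x in column) or 1.0
        for ip, x in zip(ips, column):
            scores[ip] += abs(x - med) / mad
    return scores


def storyline(reqs, ip):
    """Replay one client's requests in time order as a simple evidence timeline."""
    return [f"{ts.isoformat()} {method} {path} {status}"
            for ts, method, path, status in sorted(reqs[ip])]


# Hypothetical sample: three ordinary visitors and one scanner probing common paths.
def _clf(ip, second, path, status):
    return (f'{ip} - - [10/Oct/2018:13:55:{second:02d} +0800] '
            f'"GET {path} HTTP/1.1" {status} 128')


SAMPLE = [_clf(ip, s, p, 200)
          for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3")
          for s, p in ((1, "/index.html"), (2, "/about.html"))]
SAMPLE += [_clf("203.0.113.9", 10 + i, p, 404) for i, p in enumerate(
    ["/admin", "/phpmyadmin", "/wp-login.php", "/.git/config", "/backup.zip", "/index.php?id=1'"])]

if __name__ == "__main__":
    feats, reqs = client_features(parse(SAMPLE))
    scores = anomaly_scores(feats)
    suspect = max(scores, key=scores.get)
    print(suspect, scores[suspect])
    print("\n".join(storyline(reqs, suspect)))
```

Run as a script, the scanner IP should receive the highest score because it departs from the median on all three features. The median/MAD scale is used instead of mean/standard deviation so that a single extreme client does not inflate the very scale used to judge it.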
We also compared WebHound with ModSecurity, updated with the newest domain-expert rules, in a virtualized corporate environment. The experimental results show that WebHound achieves higher accuracy than ModSecurity. In summary, WebHound reduces the heavy demand for expert knowledge and human effort in detecting cyber-attacks on web servers, while also improving detection accuracy and recall. Moreover, WebHound can provide more evidence for forensic experts to trace the original entry points.
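The abstract reports the ModSecurity comparison in terms of accuracy and recall without restating their definitions. Under the standard definitions, accuracy = (TP + TN) / (TP + FP + TN + FN) and recall = TP / (TP + FN). The sketch below computes both for two detectors on the same labeled request set. The `Confusion` class, the `confusion` helper, and the request IDs are hypothetical and illustrate only the arithmetic; they are not the paper's data or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for one detector on one labeled request set."""
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def accuracy(self) -> float:
        # (TP + TN) / all requests
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total if total else 0.0

    @property
    def recall(self) -> float:
        # TP / (TP + FN): share of real attacks that were flagged
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0


def confusion(attacks, flagged, all_requests):
    """Count outcomes; assumes `attacks` and `flagged` are subsets of `all_requests`."""
    attacks, flagged = set(attacks), set(flagged)
    tp = len(attacks & flagged)
    fp = len(flagged - attacks)
    fn = len(attacks - flagged)
    tn = len(set(all_requests)) - tp - fp - fn
    return Confusion(tp, fp, tn, fn)


if __name__ == "__main__":
    # Hypothetical labels: requests 0-9, of which 1-4 are attacks.
    requests, attacks = range(10), {1, 2, 3, 4}
    for name, flagged in (("signature rules", {1, 2, 9}), ("anomaly detector", {1, 2, 3, 9})):
        c = confusion(attacks, flagged, requests)
        print(f"{name}: accuracy={c.accuracy:.2f} recall={c.recall:.2f}")
```

Recall is the more telling figure for the exploitation-stage scenario the abstract describes: on a mostly benign log, a detector that matches only known signatures can keep accuracy respectable while still missing many novel attacks.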

Similar content being viewed by others

An Analysis of Key Tools for Detecting Cross-Site Scripting Attacks on Web-Based Systems

A Novel Semantic-Aware Approach for Detecting Malicious Web Traffic

GuruWS: A Hybrid Platform for Detecting Malicious Web Shells and Web Application Vulnerabilities

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
Te-En Wei, Hahn-Ming Lee & Albert B. Jeng
CyberTrust Technology Institute, Institute for Information Industry, Taipei, Taiwan
Te-En Wei
Institute of Information Science, Academia Sinica, Taipei, Taiwan
Hahn-Ming Lee
Department of Computer Science, Carnegie Mellon University, Pittsburgh, USA
Hemank Lamba & Christos Faloutsos

Corresponding author

Correspondence to Te-En Wei.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wei, TE., Lee, HM., Jeng, A.B. et al. WebHound: a data-driven intrusion detection from real-world web access logs. Soft Comput 23, 11947–11965 (2019). https://doi.org/10.1007/s00500-018-03750-1

Published: 18 January 2019
Issue Date: November 2019
DOI: https://doi.org/10.1007/s00500-018-03750-1
