
Cyber Threat Hunting Through the Use of an Isolation Forest

Authors:
Dimitar Karev (Harvard University), Christopher McCubbin (SQRRL Data Inc.), Ruslan Vaulin (SQRRL Data Inc.)

CompSysTech '17: Proceedings of the 18th International Conference on Computer Systems and Technologies, June 2017, Pages 163–170. https://doi.org/10.1145/3134302.3134319

Published: 23 June 2017


ABSTRACT

Most intrusion detection systems use supervised machine learning algorithms, which allow them to detect only previously recorded types of malicious attacks. This paper takes a fundamentally different approach to the problem, applying the Isolation Forest, an unsupervised machine learning algorithm, in a new context. One of the most important advantages of the algorithm is that it can identify and record novel intrusion models. We conduct experiments using HTTP log data to explore the algorithm's accuracy under various conditions. We empirically determine the optimal values for the algorithm's parameters and show that the originally suggested standard Isolation Forest parameters do not always produce optimal results. Furthermore, we run a genetic algorithm to explore which HTTP features best differentiate between malicious and normal data. After applying these results, we achieve an approximately 300% increase in accuracy and reduce the algorithm's required running time by nearly 50%.
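To make the idea behind the abstract concrete, the following is a minimal, standard-library-only sketch of an isolation forest. It is an illustration under stated assumptions, not the authors' implementation: the function names (`c_factor`, `build_tree`, `fit_forest`, `anomaly_score`) are hypothetical, and the two-dimensional data is synthetic rather than taken from HTTP logs. The defaults of 100 trees and a subsample size of 256 are the standard values suggested in the original Isolation Forest paper, which the abstract notes are not always optimal.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively partition rows with random axis-aligned splits."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:  # nothing left to split on this feature
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(row, node, depth=0):
    """Depth at which row lands, plus an adjustment for unsplit leaf points."""
    if "feature" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if row[node["feature"]] < node["split"] else node["right"]
    return path_length(row, child, depth + 1)


def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees, each on a random subsample of the data."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, psi


def anomaly_score(row, trees, psi):
    """Score in (0, 1]; values near 1 mean the point is isolated quickly."""
    mean_depth = sum(path_length(row, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(psi))


# Synthetic data (an assumption for illustration): a dense cluster plus one far point.
data_rng = random.Random(42)
normal = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(300)]
data = normal + [(8.0, 8.0)]

trees, psi = fit_forest(data)
scores = [anomaly_score(row, trees, psi) for row in data]
print("outlier score:", round(scores[-1], 3))
print("mean normal score:", round(sum(scores[:-1]) / len(scores[:-1]), 3))
```

No labels are used at any point, which is what lets the method flag behavior that has never been recorded as an attack. The `n_trees` and `sample_size` arguments are the kind of parameters the paper tunes empirically; in a sketch like this one, the feature columns passed in would be the candidates for selection by a genetic algorithm.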



Published in
CompSysTech '17: Proceedings of the 18th International Conference on Computer Systems and Technologies
June 2017
358 pages
ISBN: 9781450352345
DOI: 10.1145/3134302
Editors: Boris Rachev, Angel Smrikarov
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cyber security
genetic algorithm
isolation forest
unsupervised machine learning
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
- CompSysTech '17 Paper Acceptance Rate: 42 of 107 submissions, 39%
- Overall Acceptance Rate: 241 of 492 submissions, 49%
Article Metrics
- Total Citations: 8
- Total Downloads: 539
- Downloads (Last 12 months): 43
- Downloads (Last 6 weeks): 5