DOI: 10.1145/3134302.3134319
research-article

Cyber Threat Hunting Through the Use of an Isolation Forest

Published: 23 June 2017

ABSTRACT

Most intrusion detection systems use supervised machine learning algorithms, which limits them to detecting only previously recorded types of malicious attacks. This paper takes a fundamentally different approach to the problem by applying Isolation Forests, an unsupervised machine learning algorithm, in a new context. One of the algorithm's most important advantages is that it can identify and record novel intrusion models. We conduct experiments using HTTP log data to explore the algorithm's accuracy under various conditions. We empirically determine the optimal values for the algorithm's parameters and show that the standard Isolation Forest parameters originally suggested do not always produce optimal results. Furthermore, we run a genetic algorithm to explore which HTTP features best differentiate malicious from normal data. By applying these results, we achieve an approximately 300% increase in accuracy and reduce the time the algorithm requires by nearly 50%.
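As background for the abstract, here is a minimal pure-Python sketch of an Isolation Forest as described by Liu, Ting and Zhou (2008). It is not the authors' implementation. Each tree recursively splits a random subsample of ψ points on a random feature at a random value, up to a height limit of ⌈log₂ ψ⌉. A point's anomaly score is s(x, ψ) = 2^(−E[h(x)] / c(ψ)), where h(x) is its path length and c(n) = 2H(n−1) − 2(n−1)/n is the average path length of an unsuccessful binary-search-tree lookup (H(i) ≈ ln i + γ). Points that are isolated after only a few splits score close to 1. The defaults `n_trees=100` and `sample_size=256` are the values Liu et al. recommend, so they correspond to the "originally suggested" parameters the paper goes on to tune. The two request features (URL length and character-level Shannon entropy), the helper names, and the sample URLs are hypothetical illustrations, because the abstract does not list the HTTP features the authors actually used.

```python
import math
import random
from collections import Counter

EULER_GAMMA = 0.5772156649015329  # used in the harmonic-number approximation


def shannon_entropy(text):
    """Shannon entropy (bits per character) of a string."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((k / n) * math.log2(k / n) for k in Counter(text).values())


def features(url):
    """Hypothetical feature vector for one HTTP request: URL length and entropy."""
    return (float(len(url)), shannon_entropy(url))


def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    return 1.0 if n == 2 else 0.0


def build_tree(points, depth, max_depth, rng):
    """Recursively isolate points with random axis-parallel splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    # Only split on features that still vary within this node.
    candidates = []
    for d in range(len(points[0])):
        lo, hi = min(p[d] for p in points), max(p[d] for p in points)
        if lo < hi:
            candidates.append((d, lo, hi))
    if not candidates:
        return {"size": len(points)}
    dim, lo, hi = rng.choice(candidates)
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for unresolved leaves."""
    if "dim" not in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


class IsolationForest:
    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees = n_trees          # t in Liu et al.
        self.sample_size = sample_size  # psi in Liu et al.
        self.rng = random.Random(seed)
        self.trees = []
        self.psi = 0

    def fit(self, data):
        self.psi = min(self.sample_size, len(data))
        if self.psi < 2:
            raise ValueError("need at least two points")
        max_depth = math.ceil(math.log2(self.psi))  # tree height limit
        self.trees = [
            build_tree(self.rng.sample(data, self.psi), 0, max_depth, self.rng)
            for _ in range(self.n_trees)
        ]
        return self

    def score(self, x):
        """Anomaly score in (0, 1]; values near 1 indicate likely anomalies."""
        mean_h = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / c(self.psi))


# Hypothetical traffic: ordinary product-page requests plus one injection attempt.
gen = random.Random(42)
normal_urls = ["/products/item?id=%d" % gen.randint(1, 9999) for _ in range(500)]
suspicious_url = "/search?q=%27%20UNION%20SELECT%20password%20FROM%20users--"
data = [features(u) for u in normal_urls] + [features(suspicious_url)]

forest = IsolationForest(n_trees=100, sample_size=256, seed=1).fit(data)
normal_scores = [forest.score(features(u)) for u in normal_urls]
anomaly_score = forest.score(features(suspicious_url))
print("mean normal score: %.3f" % (sum(normal_scores) / len(normal_scores)))
print("suspicious score:  %.3f" % anomaly_score)
```

Because the forest is fitted without labels, the long, high-entropy query stands out only because it is easy to isolate. This is the property that lets an unsupervised detector flag attack types it has never seen. In practice one would more likely use scikit-learn's `sklearn.ensemble.IsolationForest`, whose `n_estimators` and `max_samples` parameters play the roles of `n_trees` and `sample_size` here. The hand-rolled version above is only meant to make the scoring rule visible.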

  • Published in

    ACM Other conferences
    CompSysTech '17: Proceedings of the 18th International Conference on Computer Systems and Technologies
    June 2017
    358 pages
    ISBN: 9781450352345
    DOI: 10.1145/3134302

    Copyright © 2017 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    CompSysTech '17 paper acceptance rate: 42 of 107 submissions (39%). Overall acceptance rate: 241 of 492 submissions (49%).
