Skip to main content
Log in

Random resampling algorithms for addressing the imbalanced dataset classes in insider threat detection

  • Regular Contribution
  • Published:
International Journal of Information Security Aims and scope Submit manuscript

Abstract

Cybersecurity threats can be perpetrated by insiders or outsiders. The threats that could be carried out by insiders are far more serious due to their privileged access, which they may use to cause financial loss and reputation harm for an organization. Thus, insider threats represent a major cybersecurity challenge for private and government organizations. Researchers and cybersecurity practitioners have proposed different approaches for detecting and mitigating insider threats, but they face many challenges (e.g., dataset availability and the highly imbalanced classes of the available dataset). Because the shortcoming of an insider threat dataset, the benchmarking dataset given by The Computer Emergency Response Team (CERT) was used to validate the majority of the insider threat detection approaches. The CERT dataset of insider threat is extremely imbalanced, and hence, once utilized to validate an insider threat detection model, the detection results may be biased and inaccurate. Such imbalance issue of the CERT dataset is ignored by most existing approaches of insider threat detection. As result, effective model is required to detect insider data leakage incidents from an imbalanced dataset more precisely. In this paper an insider data leakage detection model is proposed to leverage various random sampling techniques and well-known machine learning algorithms to deal with the dataset's extremely imbalanced classes. We evaluate the model on CERT r4.2 insider threat dataset utilizing different sampling techniques, and then compare its performance with the baseline and existing work. The empirical results show that by resolving the imbalanced dataset issue, our model enhances the detection performance of insider data leakage events by surpassing existing approaches.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig.1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13

Similar content being viewed by others

Data availability

The dataset analyzed during the current study is publicly available.

References

  1. Silowash, G., Shimeall, T.J., Cappelli, D., Moore, A., Flynn, L., Trzeciak, R.: Common sense guide to mitigating threats. CERT Progr. 4, 1–144 (2012)

    Google Scholar 

  2. IBM, O. and: Cost of Insider Threats: Global Report, https://www.proofpoint.com/us/resources/threat-reports/2020-cost-of-insider-threats

  3. CSO, CERT Division of SEI-CMU, U.S. Secret Service, and K.: The 2018 U.S. state of cybercrime survey, https://www.idg.com/tools-for-marketers/2018-u-s-state-of-cybercrime/

  4. Partners, C.R.: 2019 insider threat report, https://crowdresearchpartners.com/insider-threat-report/

  5. Collins, M.: Common sense guide to mitigating insider threats. CARNEGIE-MELLON UNIV PITTSBURGH PA PITTSBURGH United States (2016)

  6. Azaria, A., Richardson, A., Kraus, S., Subrahmanian, V.S.: Behavioral analysis of insider threat: a survey and bootstrapped prediction in imbalanced data. IEEE Trans. Comput. Soc. Syst. 1, 135–155 (2014). https://doi.org/10.1109/TCSS.2014.2377811

    Article  Google Scholar 

  7. Homoliak, I., Toffalini, F., Guarnizo, J., Elovici, Y., Ochoa, M.: Insight into insiders and it: a survey of insider threat taxonomies, analysis, modeling, and countermeasures. ACM Comput. Surv. 52, 30 (2018)

    Google Scholar 

  8. Liu, L., De Vel, O., Han, Q.L., Zhang, J., Xiang, Y.: Detecting and Preventing Cyber Insider Threats: A Survey, https://ieeexplore.ieee.org/document/8278157/, (2018)

  9. Zeadally, S., Yu, B., Jeong, D.H., Liang, L.: Detecting insider threats solutions and trends. Inf. Secur. J. 21, 183–192 (2012). https://doi.org/10.1080/19393555.2011.654318

    Article  Google Scholar 

  10. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: A survey. ACM Comput. Surv. (CSUR) 41, 1–58 (2009)

    Article  Google Scholar 

  11. Salem, M. Ben, Hershkop, S., Stolfo, S.J.: A Survey of Insider Attack Detection Research. In: Insider Attack and Cyber Security. Springer US, Boston, MA (2008)

  12. Alsowail, R.A., Al-Shehari, T.: Empirical detection techniques of insider threat incidents. IEEE Access. 8, 78385–78402 (2020). https://doi.org/10.1109/ACCESS.2020.2989739

    Article  Google Scholar 

  13. Gayathri, R.G., Sajjanhar, A., Xiang, Y.: Image-based feature representation for insider threat classification. Appl. Sci. (2020). https://doi.org/10.3390/app10144945

    Article  Google Scholar 

  14. CERT and ExactData LLC: Insider Threat Test Dataset, https://resources.sei.cmu.edu/library/asset-view.cfm?assetid=508099

  15. Glasser, J., Lindauer, B.: Bridging the gap: A pragmatic approach to generating insider threat data. In: Proceedings - IEEE CS Security and Privacy Workshops, SPW 2013. pp. 98–104. IEEE (2013)

  16. Roberts, S.C., Holodnak, J.T., Nguyen, T., Yuditskaya, S., Milosavljevic, M., Streilein, W.W.: A Model-Based Approach to Predicting the Performance of Insider Threat Detection Systems. In: 2016 IEEE Security and Privacy Workshops (SPW). pp. 314–323. IEEE, San Jose, CA, USA (2016)

  17. Rashid, T., Agrafiotis, I., Nurse, J.R.C.: A new take on detecting insider threats: Exploring the use of Hidden Markov Models. In: MIST 2016 - Proceedings of the International Workshop on Managing Insider Security Threats, co-located with CCS 2016 (2016)

  18. Le, D.C., Zincir-Heywood, A.N.: Evaluating Insider Threat Detection Workflow Using Supervised and Unsupervised Learning. In: 2018 IEEE Security and Privacy Workshops (SPW). pp. 270–275. IEEE, San Francisco, CA, USA (2018)

  19. Le, D.C., Zincir-Heywood, N.: Exploring anomalous behaviour detection and classification for insider threat identification. Int. J. Netw. Manag. (2021). https://doi.org/10.1002/nem.2109

    Article  Google Scholar 

  20. Lo, O., Buchanan, W.J., Griffiths, P., Macfarlane, R.: Distance measurement methods for improved insider threat detection. Secur. Commun. Networks. (2018). https://doi.org/10.1155/2018/5906368

    Article  Google Scholar 

  21. Lin, L., Zhong, S., Jia, C., Chen, K.: Insider threat detection based on deep belief network feature representation. In: Proceedings - 2017 International Conference on Green Informatics, ICGI 2017 (2017)

  22. Le, D.C., Zincir-Heywood, A.N., Heywood, M.I.: Dynamic insider threat detection based on adaptable genetic programming. In: 2019 IEEE Symposium Series on Computational Intelligence, SSCI 2019 (2019)

  23. Lv, B., Wang, D., Wang, Y., Lv, Q., Lu, D.: A hybrid model based on multi-dimensional features for insider threat detection. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018)

  24. Hall, A.J., Pitropakis, N., Buchanan, W.J., Moradpoor, N.: Predicting malicious insider threat scenarios using organizational data and a heterogeneous stack-classifier. In: 2018 IEEE International Conference on Big Data (Big Data). pp. 5034–5039. IEEE (2018)

  25. Aldairi, M., Karimi, L., Joshi, J.: A trust aware unsupervised learning approach for insider threat detection. In: Proceedings - 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science, IRI 2019 (2019)

  26. Gamachchi, A., Boztas, S.: Insider threat detection through attributed graph clustering. In: 2017 IEEE Trustcom/BigDataSE/ICESS. pp. 112–119. IEEE, Sydney, NSW, Australia (2017)

  27. Yuan, F., Cao, Y., Shang, Y., Liu, Y., Tan, J., Fang, B.: Insider threat detection with deep neural network. In: Shi, Y., Haohuan, Fu., Tian, Y., Krzhizhanovskaya, V.V., Lees, M.H., Dongarra, J., Sloot, P.M.A. (eds.) Computational Science – ICCS 2018: 18th International Conference, Wuxi, China, June 11–13, 2018, Proceedings, Part I, pp. 43–54. Springer International Publishing, Cham (2018). https://doi.org/10.1007/978-3-319-93698-7_4

    Chapter  Google Scholar 

  28. Diop, A., Emad, N., Winter, T.: A parallel and scalable framework for insider threat detection. In: Proceedings - 2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics, HiPC 2020 (2020)

  29. Tabash, K.A., Happa, J.: Insider-threat detection using gaussian mixture models and sensitivity profiles. Comput. Secur. 77, 838–859 (2018)

    Article  Google Scholar 

  30. Le, D.C., Zincir-Heywood, N., Heywood, M.I.: Analyzing data granularity levels for insider threat detection using machine learning. IEEE Trans. Netw. Serv. Manag. 17, 30–44 (2020). https://doi.org/10.1109/TNSM.2020.2967721

    Article  Google Scholar 

  31. Yuan, F., Shang, Y., Liu, Y., Cao, Y., Tan, J.: Data augmentation for insider threat detection with GAN. In: 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI). pp. 632–638. IEEE (2020)

  32. Le, D.C., Zincir-Heywood, N.: Anomaly detection for insider threats using unsupervised ensembles. IEEE Trans. Netw. Serv. Manag. 18, 1152–1164 (2021). https://doi.org/10.1109/TNSM.2021.3071928

    Article  Google Scholar 

  33. Wang, J., Cai, L., Yu, A., Meng, D.: Embedding learning with heterogeneous event sequence for insider threat detection. In: 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI). pp. 947–954. IEEE (2019)

  34. Ye, X., Han, M.-M.: An improved feature extraction algorithm for insider threat using hidden Markov model on user behavior detection. Inf. Comput. Secur. (2020)

  35. Al-Mhiqani, M.N., Ahmed, R., Abidin, Z.Z., Isnin, S.N.: An integrated imbalanced learning and deep neural network model for insider threat detection. Int. J. Adv. Comput. Sci. Appl. 12, (2021)

  36. Alsowail, R.A., Al-Shehari, T.: A multi-tiered framework for insider threat prevention. Electronics 10, 1005 (2021). https://doi.org/10.3390/electronics10091005

    Article  Google Scholar 

  37. Nelli, F.: Machine learning with scikit-learn. Python Data Anal. 19, 237–264 (2015). https://doi.org/10.1007/978-1-4842-0958-5_8

    Article  Google Scholar 

  38. Su, L., Wu, S., Cao, D.: Windows-based analysis for HFS+ file system. Adv. Mater. Res. (2011). https://doi.org/10.4028/www.scientific.net/AMR.179-180.538

    Article  Google Scholar 

  39. Pereira, R.M., Costa, Y.M.G., Silla, C.N., Jr.: Toward hierarchical classification of imbalanced data using random resampling algorithms. Inf. Sci. (Ny) 578, 344–363 (2021). https://doi.org/10.1016/j.ins.2021.07.033

    Article  MathSciNet  Google Scholar 

  40. He, H., Ma, Y.: Imbalanced Learning: Foundations, Algorithms, and Applications. Wiley, New Jersey (2013)

    Book  MATH  Google Scholar 

  41. Branco, P., Torgo, L., Ribeiro, R.P.: A survey of predictive modeling on imbalanced domains. ACM Comput. Surv. 49, 1–50 (2016). https://doi.org/10.1145/2907070

    Article  Google Scholar 

  42. Alqahtani, Gumaei, Mathkour, Maher Ben Ismail: A Genetic-Based Extreme Gradient Boosting Model for Detecting Intrusions in Wireless Sensor Networks. Sensors. 19, 4383 (2019). https://doi.org/10.3390/s19204383

  43. Al-Rakhami, M., Gumaei, A., Alsanad, A., Alamri, A., Hassan, M.M.: An ensemble learning approach for accurate energy load prediction in residential buildings. IEEE Access. 7, 48328–48338 (2019). https://doi.org/10.1109/ACCESS.2019.2909470

    Article  Google Scholar 

  44. Farnaaz, N., Jabbar, M.A.: Random forest modeling for network intrusion detection system. Proced. Comput. Sci. 89, 213–217 (2016). https://doi.org/10.1016/j.procs.2016.06.047

    Article  Google Scholar 

  45. Jabbar, M.A., Deekshatulu, B.L., Chndra, P.: Alternating decision trees for early diagnosis of heart disease. In: International Conference on Circuits, Communication, Control and Computing. pp. 322–328. IEEE (2014)

  46. Ali, J., Khan, R., Ahmad, N., Maqsood, I.: Random forests and decision trees. Int. J. Comput. Sci. Issues. 9, (2012)

  47. Fawagreh, K., Gaber, M.M., Elyan, E.: Random forests: from early developments to recent advancements. Syst. Sci. Control Eng. 2, 602–609 (2014). https://doi.org/10.1080/21642583.2014.956265

    Article  Google Scholar 

  48. Sahu, S., Mehtre, B.M.: Network intrusion detection system using J48 Decision Tree. In: 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI). pp. 2023–2026. IEEE (2015)

  49. Ruggieri, S.: Efficient C4.5 [classification algorithm]. IEEE Trans. Knowl. Data. Eng. 14, 438–444 (2002)

    Article  Google Scholar 

  50. Chen, F., Ye, Z., Wang, C., Yan, L., Wang, R.: A Feature Selection Approach for Network Intrusion Detection Based on Tree-Seed Algorithm and K-Nearest Neighbor. In: 2018 IEEE 4th International Symposium on Wireless Systems within the International Conferences on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS-SWS). pp. 68–72. IEEE (2018)

  51. Buczak, A.L., Guven, E.: A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Commun. Surv. Tutorials. 18, 1153–1176 (2016). https://doi.org/10.1109/COMST.2015.2494502

    Article  Google Scholar 

  52. Jin, Y., Wang, H., Sun, C.: Introduction to Machine Learning. Presented at the (2021)

  53. Saito, T., Rehmsmeier, M.: The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10, e0118432 (2015). https://doi.org/10.1371/journal.pone.0118432

    Article  Google Scholar 

  54. Liu, Z., Bondell, H.D.: Binormal precision-recall curves for optimal classification of imbalanced data. Stat. Biosci. 11, 141–161 (2019). https://doi.org/10.1007/s12561-019-09231-9

    Article  Google Scholar 

  55. Paper, D.: Hands-on Scikit-Learn for Machine Learning Applications. Apress, Berkeley, CA (2020)

    Book  Google Scholar 

  56. Singh, M., Mehtre, B.M., Sangeetha, S.: Insider Threat Detection Based on User Behaviour Analysis. In: International Conference on Machine Learning, Image Processing, Network Security and Data Sciences. pp. 559–574. Springer (2020)

  57. Yuan, F., Shang, Y., Liu, Y., Cao, Y., Tan, J.: Attention-based LSTM for insider threat detection. In: International Conference on Applications and Techniques in Information Security. pp. 192–201. Springer (2019)

  58. Al-Shehari, T., Alsowail, R.A.: An insider data leakage detection using one-hot encoding, synthetic minority oversampling and machine learning techniques. Entropy 23, 1258 (2021). https://doi.org/10.3390/e23101258

    Article  Google Scholar 

Download references

Acknowledgements

The authors of this study extend their appreciation to the Researchers Supporting Project number (RSPD2023R544), King Saud University, Riyadh, Saudi Arabia.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Taher Al-Shehari.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Al-Shehari, T., Alsowail, R.A. Random resampling algorithms for addressing the imbalanced dataset classes in insider threat detection. Int. J. Inf. Secur. 22, 611–629 (2023). https://doi.org/10.1007/s10207-022-00651-1

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10207-022-00651-1

Keywords

Navigation