
FAIXID: A Framework for Enhancing AI Explainability of Intrusion Detection Results Using Data Cleaning Techniques

Published in: Journal of Network and Systems Management

Abstract

Organizations rely heavily on cyber defense technologies, including intrusion detection and prevention systems, to monitor and protect networks and devices from malicious activities. However, the large volumes of false alerts produced by such technologies make it difficult for cybersecurity analysts to isolate credible alerts from false positives for further investigation. In this article, we propose FAIXID, a framework that leverages Explainable Artificial Intelligence (XAI) and data cleaning methods to improve the explainability and understandability of intrusion detection alerts, which in turn helps cyber analysts make more informed decisions by quickly eliminating false positives. FAIXID comprises five functional modules: (1) the pre-modeling explainability module, which improves the quality of network traffic data through data cleaning; (2) the modeling module, which provides explanations of the AI models to help analysts make sense of the model internals; (3) the post-modeling explainability module, which provides additional explanations to enhance the understandability of the results produced by the AI models; (4) the attribution module, which selects the appropriate explanations for analysts according to their needs; and (5) the evaluation module, which evaluates the explanations and collects feedback from analysts. FAIXID has been implemented and evaluated in experiments with real-world datasets. The evaluation results demonstrate that combining data cleaning and AI explainability techniques yields quality explanations tailored to analysts' expertise and backgrounds.
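To make the division of labor among these five modules concrete, the sketch below wires them together in Python around a scikit-learn classifier, with SHAP standing in as one possible post-hoc attribution method. It is a minimal illustration of the pipeline shape only: the cleaning rules, the model choice, and the top-3 feature cutoff for less experienced analysts are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of FAIXID's five functional modules (see abstract).
# The module boundaries follow the paper's description; every concrete
# choice below (cleaning rules, random forest, SHAP, the top-3 cutoff)
# is an assumption made for illustration, not the authors' implementation.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier


def pre_modeling_explainability(df: pd.DataFrame) -> pd.DataFrame:
    """Module 1: improve the quality of raw network traffic data."""
    df = df.drop_duplicates()                       # drop duplicate flow records
    df = df.dropna(axis=1, how="all")               # drop columns with no values
    return df.fillna(df.median(numeric_only=True))  # impute remaining gaps


def modeling(X: pd.DataFrame, y: pd.Series) -> RandomForestClassifier:
    """Module 2: train a detector whose internals can be explained."""
    return RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)


def post_modeling_explainability(model, X: pd.DataFrame):
    """Module 3: per-alert feature attributions (here via SHAP)."""
    return shap.TreeExplainer(model).shap_values(X)


def attribution(alert_shap: np.ndarray, feature_names: list, expertise: str):
    """Module 4: tailor one alert's explanation to the analyst's background."""
    ranked = np.argsort(-np.abs(alert_shap))
    k = len(ranked) if expertise == "expert" else 3  # novices see top 3 features
    return [(feature_names[i], float(alert_shap[i])) for i in ranked[:k]]


def evaluation(explanation, analyst_feedback: dict) -> dict:
    """Module 5: pair each explanation with analyst feedback for refinement."""
    return {"explanation": explanation, **analyst_feedback}
```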


Notes

  1. https://www.defense.gov

  2. A type of fraud that occurs online in pay-per-click advertising.


Acknowledgements

Awny Alnusair was supported by the IU Kokomo summer faculty fellowship program.

Author information


Corresponding author

Correspondence to Awny Alnusair.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Liu, H., Zhong, C., Alnusair, A. et al. FAIXID: A Framework for Enhancing AI Explainability of Intrusion Detection Results Using Data Cleaning Techniques. J Netw Syst Manage 29, 40 (2021). https://doi.org/10.1007/s10922-021-09606-8

