The Analysis of Firewall Policy Through Machine Learning and Data Mining

Wireless Personal Communications

Abstract

Firewalls are primary components for ensuring network and information security. For this purpose, they are deployed in all commercial, governmental and military networks as well as other large-scale networks. The security policies of an institution are implemented as firewall rules. An anomaly in these rules may lead to serious security gaps. When the network is large and policies are complicated, manual cross-checking may be insufficient to detect anomalies. In this paper, an automated model based on machine learning and high performance computing methods is proposed for the detection of anomalies in a firewall rule repository. To achieve this, firewall logs are analysed and the extracted features are fed to a set of machine learning classification algorithms including Naive Bayes, kNN, Decision Table and HyperPipes. The F-measure, which combines precision and recall, is used for performance evaluation. In the experiments, kNN showed the best performance. A model based on the F-measure distribution was then proposed, and 93 firewall rules were analysed with it. The model predicted that 6 firewall rules cause anomalies. These problematic rules were checked against the security reports prepared by experts, and each of them was verified to be an anomaly. This paper shows that anomalies in firewall rules can be detected by automatically analysing large-scale log files with machine learning methods, which helps avoid security breaches, saves a substantial amount of expert effort and enables timely intervention.
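The classification and evaluation steps described in the abstract can be illustrated with a minimal sketch. It assumes each log record is reduced to a numeric feature vector and labelled with the ID of the rule that matched it, which is one plausible reading of the per-rule scores in the appendix. The abstract does not state the features, the value of k, or the tooling, so the toy feature tuples, their pairing with rule IDs, `k=3`, and the function names `knn_predict` and `per_rule_f_measure` are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
import math

# Hypothetical toy training set: (feature vector) -> rule ID.
# The three features (an interface code, a destination port, a protocol number)
# and their pairing with rule IDs are assumptions made for illustration only.
TRAIN = [
    ((1.0, 80.0, 6.0), 5),
    ((1.0, 80.0, 6.0), 5),
    ((1.0, 443.0, 6.0), 6),
    ((2.0, 443.0, 6.0), 6),
    ((1.0, 53.0, 17.0), 45),
    ((2.0, 53.0, 17.0), 45),
]


def knn_predict(train, x, k=3):
    """Return the majority rule ID among the k training points nearest to x."""
    # Euclidean distance; sorted() is stable, so ties keep training order.
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def per_rule_f_measure(y_true, y_pred):
    """Compute (precision, recall, F-measure) for each rule ID treated as a class."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for rule in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == rule and p == rule for t, p in pairs)
        fp = sum(t != rule and p == rule for t, p in pairs)
        fn = sum(t == rule and p != rule for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F-measure is the harmonic mean of precision and recall.
        if precision + recall:
            f = 2 * precision * recall / (precision + recall)
        else:
            f = 0.0
        scores[rule] = (precision, recall, f)
    return scores
```

In use, one would call `knn_predict` on held-out log records and pass the true and predicted rule IDs to `per_rule_f_measure`; a rule whose records are often confused with those of other rules ends up with a low F-measure.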

Author information

Corresponding author

Correspondence to Erkan Ozhan.

Appendix

F-measure values of firewall rules according to kNN classifier.

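| Rule ID | TP rate | FP rate | Precision | Recall | F-measure |
|---|---|---|---|---|---|
| 5 | 1 | 0 | 1 | 1 | 1 |
| 186 | 1 | 0 | 1 | 1 | 1 |
| 6 | 1 | 0 | 0.999 | 1 | 1 |
| 45 | 0.994 | 0 | 0.994 | 0.994 | 0.994 |
| 50 | 1 | 0 | 1 | 1 | 1 |
| 236 | 0.996 | 0.001 | 0.995 | 0.996 | 0.996 |
| 53 | 1 | 0 | 1 | 1 | 1 |
| 220 | 1 | 0 | 1 | 1 | 1 |
| 46 | 1 | 0 | 0.993 | 1 | 0.997 |
| 213 | 0.886 | 0 | 0.888 | 0.886 | 0.887 |
| 47 | 1 | 0 | 1 | 1 | 1 |
| 273 | 1 | 0 | 1 | 1 | 1 |
| 10 | 0.993 | 0 | 1 | 0.993 | 0.996 |
| 32 | 0.996 | 0 | 0.999 | 0.996 | 0.997 |
| 231 | 1 | 0 | 1 | 1 | 1 |
| 49 | 0.996 | 0 | 0.999 | 0.996 | 0.998 |
| 14 | 0.984 | 0 | 0.994 | 0.984 | 0.989 |
| 84 | 0.998 | 0 | 0.999 | 0.998 | 0.999 |
| 207 | 0.998 | 0 | 0.998 | 0.998 | 0.998 |
| 250 | 0.999 | 0 | 1 | 0.999 | 0.999 |
| 15 | 0.992 | 0 | 0.995 | 0.992 | 0.994 |
| 190 | 0.997 | 0 | 0.972 | 0.997 | 0.984 |
| 48 | 0.971 | 0 | 0.985 | 0.971 | 0.978 |
| 241 | 0.995 | 0 | 0.999 | 0.995 | 0.997 |
| 51 | 1 | 0 | 1 | 1 | 1 |
| 52 | 1 | 0 | 1 | 1 | 1 |
| 237 | 0.974 | 0 | 0.979 | 0.974 | 0.976 |
| 31 | 0.993 | 0 | 0.987 | 0.993 | 0.99 |
| 18 | 0.989 | 0 | 0.996 | 0.989 | 0.993 |
| 88 | 1 | 0 | 1 | 1 | 1 |
| 191 | 0.941 | 0 | 0.889 | 0.941 | 0.914 |
| 238 | 0.997 | 0 | 0.997 | 0.997 | 0.997 |
| 0 | 0.986 | 0 | 1 | 0.986 | 0.993 |
| 300 | 0.972 | 0 | 0.988 | 0.972 | 0.98 |
| 272 | 0.997 | 0 | 0.997 | 0.997 | 0.997 |
| 215 | 0.996 | 0 | 0.998 | 0.996 | 0.997 |
| 253 | 0.998 | 0 | 0.998 | 0.998 | 0.998 |
| 112 | 0.957 | 0 | 0.978 | 0.957 | 0.967 |
| 301 | 0.959 | 0 | 0.97 | 0.959 | 0.964 |
| 287 | 0.99 | 0 | 0.986 | 0.99 | 0.988 |
| 116 | 1 | 0 | 1 | 1 | 1 |
| 2 | 1 | 0 | 1 | 1 | 1 |
| 233 | 1 | 0 | 1 | 1 | 1 |
| 169 | 1 | 0 | 0.983 | 1 | 0.992 |
| 271 | 0.996 | 0 | 1 | 0.996 | 0.998 |
| 109 | 0.609 | 0 | 0.56 | 0.609 | 0.583 |
| 258 | 0.97 | 0 | 1 | 0.97 | 0.985 |
| 266 | 1 | 0 | 1 | 1 | 1 |
| 289 | 0.996 | 0 | 0.995 | 0.996 | 0.996 |
| 282 | 0.999 | 0 | 0.999 | 0.999 | 0.999 |
| 129 | 0.474 | 0 | 0.5 | 0.474 | 0.486 |
| 72 | 1 | 0 | 0.933 | 1 | 0.966 |
| 68 | 0.984 | 0 | 1 | 0.984 | 0.992 |
| 280 | 0.997 | 0 | 0.987 | 0.997 | 0.992 |
| 111 | 0.96 | 0 | 0.98 | 0.96 | 0.97 |
| 13 | 0.625 | 0 | 0.476 | 0.625 | 0.541 |
| 41 | 0.667 | 0 | 0.5 | 0.667 | 0.571 |
| 139 | 0.836 | 0 | 0.918 | 0.836 | 0.875 |
| 17 | 1 | 0 | 1 | 1 | 1 |
| 247 | 0.94 | 0 | 1 | 0.94 | 0.969 |
| 27 | 0.636 | 0 | 0.897 | 0.636 | 0.745 |
| 286 | 0.999 | 0 | 0.999 | 0.999 | 0.999 |
| 44 | 0.984 | 0 | 0.968 | 0.984 | 0.976 |
| 104 | 0.992 | 0 | 0.936 | 0.992 | 0.963 |
| 132 | 0.364 | 0 | 0.571 | 0.364 | 0.444 |
| 130 | 0.333 | 0 | 0.364 | 0.333 | 0.348 |
| 19 | 0.833 | 0 | 0.714 | 0.833 | 0.769 |
| 119 | 1 | 0 | 1 | 1 | 1 |
| 90 | 0.667 | 0 | 1 | 0.667 | 0.8 |
| 278 | 0.84 | 0 | 0.808 | 0.84 | 0.824 |
| 81 | 0.75 | 0 | 0.75 | 0.75 | 0.75 |
| 1 | 1 | 0 | 0.935 | 1 | 0.967 |
| 16 | 0.833 | 0 | 0.833 | 0.833 | 0.833 |
| 76 | 0.8 | 0 | 1 | 0.8 | 0.889 |
| 21 | 0.444 | 0 | 0.364 | 0.444 | 0.4 |
| 28 | 0.999 | 0 | 0.999 | 0.999 | 0.999 |
| 257 | 1 | 0 | 0.8 | 1 | 0.889 |
| 59 | 0.949 | 0 | 1 | 0.949 | 0.974 |
| 42 | 1 | 0 | 1 | 1 | 1 |
| 299 | 0.833 | 0 | 0.714 | 0.833 | 0.769 |
| 251 | 0.667 | 0 | 0.667 | 0.667 | 0.667 |
| 269 | 1 | 0 | 1 | 1 | 1 |
| 99 | 0.667 | 0 | 0.667 | 0.667 | 0.667 |
| 228 | 0 | 0 | 0 | 0 | 0 |
| 230 | 0 | 0 | 0 | 0 | 0 |
| 192 | 1 | 0 | 1 | 1 | 1 |
| 3 | 0 | 0 | 0 | 0 | 0 |
| 156 | 0 | 0 | 0 | 0 | 0 |
| 291 | 1 | 0 | 1 | 1 | 1 |
| 188 | 0.994 | 0 | 0.988 | 0.994 | 0.991 |
| 267 | 1 | 0 | 1 | 1 | 1 |
| 297 | 1 | 0 | 1 | 1 | 1 |
| 274 | 0.999 | 0 | 0.999 | 0.999 | 0.999 |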
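
The F-measure column can be cross-checked against the Precision and Recall columns, since the F-measure is their harmonic mean, F = 2PR / (P + R). The sketch below recomputes it for a handful of rows copied from the appendix and lists the rules that fall under a cut-off. The cut-off of 0.5 and the helper names `f_measure` and `flag_low_f` are assumptions made for illustration; the page does not state the criterion by which the F-measure distribution model selected its 6 anomalous rules, so this is not a reproduction of that model.

```python
# (rule_id, precision, recall, f_measure) copied from a few appendix rows.
ROWS = [
    (5, 1.0, 1.0, 1.0),
    (213, 0.888, 0.886, 0.887),
    (109, 0.56, 0.609, 0.583),
    (130, 0.364, 0.333, 0.348),
    (228, 0.0, 0.0, 0.0),
]


def f_measure(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def flag_low_f(rows, threshold):
    """Return the rule IDs whose reported F-measure is below the threshold."""
    return [rule for rule, _, _, f in rows if f < threshold]


if __name__ == "__main__":
    for rule, p, r, reported in ROWS:
        # Reported values are rounded to three decimals, so compare loosely.
        print(rule, round(f_measure(p, r), 3), reported)
    # Hypothetical cut-off of 0.5, chosen only for this illustration.
    print(flag_low_f(ROWS, threshold=0.5))  # prints [130, 228]
```

Because the published values are rounded to three decimals, the recomputed F-measure can differ from the reported one in the last digit.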

About this article

Cite this article

Ucar, E., Ozhan, E. The Analysis of Firewall Policy Through Machine Learning and Data Mining. Wireless Pers Commun 96, 2891–2909 (2017). https://doi.org/10.1007/s11277-017-4330-0

  • DOI: https://doi.org/10.1007/s11277-017-4330-0
