The Analysis of Firewall Policy Through Machine Learning and Data Mining

Ucar, Erdem; Ozhan, Erkan

doi:10.1007/s11277-017-4330-0

Published: 17 May 2017

Volume 96, pages 2891–2909, (2017)
Wireless Personal Communications

Abstract

Firewalls are primary components for ensuring network and information security. For this purpose, they are deployed in all commercial, governmental and military networks as well as other large-scale networks. The security policies of an institution are implemented as firewall rules. An anomaly in these rules may lead to serious security gaps. When the network is large and the policies are complicated, manual cross-checking may be insufficient to detect anomalies. In this paper, an automated model based on machine learning and high performance computing methods is proposed for the detection of anomalies in a firewall rule repository. To achieve this, firewall logs are analysed and the extracted features are fed to a set of machine learning classification algorithms including Naive Bayes, kNN, Decision Table and HyperPipes. F-measure, which combines precision and recall, is used for performance evaluation. In the experiments, kNN showed the best performance. A model based on the F-measure distribution was then devised, and 93 firewall rules were analysed with it. The model predicted that 6 firewall rules cause anomalies. These problematic rules were checked against the security reports prepared by experts, and each of them was verified to be an anomaly. This paper shows that anomalies in firewall rules can be detected by automatically analysing large-scale log files with machine learning methods, which helps avoid security breaches, saves a substantial amount of expert effort and enables timely intervention.
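The abstract describes the F-measure only as a combination of precision and recall. The following is a minimal sketch, assuming the balanced form (the harmonic mean, often written F1); the function name `f_measure` and the zero-denominator convention are assumptions made for illustration, not details given by the authors. The precision and recall pairs are copied from the appendix table, and the computed values agree with the F-measure column there.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Assumed convention for the undefined 0/0 case.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (rule ID, precision, recall) copied from the appendix table.
rows = [(213, 0.888, 0.886), (109, 0.56, 0.609), (90, 1.0, 0.667)]

for rule_id, precision, recall in rows:
    print(rule_id, round(f_measure(precision, recall), 3))
# prints:
# 213 0.887
# 109 0.583
# 90 0.8
```

Because the harmonic mean is dominated by the smaller of the two inputs, a rule scores well only when both precision and recall are high.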

Author information

Authors and Affiliations

Department of Computer Engineering, Faculty of Engineering, Trakya University, 22030, Edirne, Turkey
Erdem Ucar
Department of Computer Engineering, Faculty of Corlu Engineering, Namik Kemal University, Silahtaraga Mah. Unv. 1.Sok., 59860, Tekirdag, Turkey
Erkan Ozhan

Corresponding author

Correspondence to Erkan Ozhan.

Appendix

F-measure values of the firewall rules according to the kNN classifier.

Rule ID	TP rate	FP rate	Precision	Recall	F-measure
5	1	0	1	1	1
186	1	0	1	1	1
6	1	0	0.999	1	1
45	0.994	0	0.994	0.994	0.994
50	1	0	1	1	1
236	0.996	0.001	0.995	0.996	0.996
53	1	0	1	1	1
220	1	0	1	1	1
46	1	0	0.993	1	0.997
213	0.886	0	0.888	0.886	0.887
47	1	0	1	1	1
273	1	0	1	1	1
10	0.993	0	1	0.993	0.996
32	0.996	0	0.999	0.996	0.997
231	1	0	1	1	1
49	0.996	0	0.999	0.996	0.998
14	0.984	0	0.994	0.984	0.989
84	0.998	0	0.999	0.998	0.999
207	0.998	0	0.998	0.998	0.998
250	0.999	0	1	0.999	0.999
15	0.992	0	0.995	0.992	0.994
190	0.997	0	0.972	0.997	0.984
48	0.971	0	0.985	0.971	0.978
241	0.995	0	0.999	0.995	0.997
51	1	0	1	1	1
52	1	0	1	1	1
237	0.974	0	0.979	0.974	0.976
31	0.993	0	0.987	0.993	0.99
18	0.989	0	0.996	0.989	0.993
88	1	0	1	1	1
191	0.941	0	0.889	0.941	0.914
238	0.997	0	0.997	0.997	0.997
0	0.986	0	1	0.986	0.993
300	0.972	0	0.988	0.972	0.98
272	0.997	0	0.997	0.997	0.997
215	0.996	0	0.998	0.996	0.997
253	0.998	0	0.998	0.998	0.998
112	0.957	0	0.978	0.957	0.967
301	0.959	0	0.97	0.959	0.964
287	0.99	0	0.986	0.99	0.988
116	1	0	1	1	1
2	1	0	1	1	1
233	1	0	1	1	1
169	1	0	0.983	1	0.992
271	0.996	0	1	0.996	0.998
109	0.609	0	0.56	0.609	0.583
258	0.97	0	1	0.97	0.985
266	1	0	1	1	1
289	0.996	0	0.995	0.996	0.996
282	0.999	0	0.999	0.999	0.999
129	0.474	0	0.5	0.474	0.486
72	1	0	0.933	1	0.966
68	0.984	0	1	0.984	0.992
280	0.997	0	0.987	0.997	0.992
111	0.96	0	0.98	0.96	0.97
13	0.625	0	0.476	0.625	0.541
41	0.667	0	0.5	0.667	0.571
139	0.836	0	0.918	0.836	0.875
17	1	0	1	1	1
247	0.94	0	1	0.94	0.969
27	0.636	0	0.897	0.636	0.745
286	0.999	0	0.999	0.999	0.999
44	0.984	0	0.968	0.984	0.976
104	0.992	0	0.936	0.992	0.963
132	0.364	0	0.571	0.364	0.444
130	0.333	0	0.364	0.333	0.348
19	0.833	0	0.714	0.833	0.769
119	1	0	1	1	1
90	0.667	0	1	0.667	0.8
278	0.84	0	0.808	0.84	0.824
81	0.75	0	0.75	0.75	0.75
1	1	0	0.935	1	0.967
16	0.833	0	0.833	0.833	0.833
76	0.8	0	1	0.8	0.889
21	0.444	0	0.364	0.444	0.4
28	0.999	0	0.999	0.999	0.999
257	1	0	0.8	1	0.889
59	0.949	0	1	0.949	0.974
42	1	0	1	1	1
299	0.833	0	0.714	0.833	0.769
251	0.667	0	0.667	0.667	0.667
269	1	0	1	1	1
99	0.667	0	0.667	0.667	0.667
228	0	0	0	0	0
230	0	0	0	0	0
192	1	0	1	1	1
3	0	0	0	0	0
156	0	0	0	0	0
291	1	0	1	1	1
188	0.994	0	0.988	0.994	0.991
267	1	0	1	1	1
297	1	0	1	1	1
274	0.999	0	0.999	0.999	0.999
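The table above can be screened for rules with a low F-measure, which is the kind of signal the abstract says the model is based on. The sketch below is illustrative only: the source states neither the cutoff nor which six rules were flagged, so the threshold of 0.5 and the helper name `low_f_rules` are hypothetical. The sample rows are copied from the table in the same tab-separated column order.

```python
import csv
import io

# A few rows copied from the table above (tab-separated, same columns).
SAMPLE = "\n".join([
    "Rule ID\tTP rate\tFP rate\tPrecision\tRecall\tF-measure",
    "5\t1\t0\t1\t1\t1",
    "213\t0.886\t0\t0.888\t0.886\t0.887",
    "109\t0.609\t0\t0.56\t0.609\t0.583",
    "130\t0.333\t0\t0.364\t0.333\t0.348",
    "228\t0\t0\t0\t0\t0",
    "274\t0.999\t0\t0.999\t0.999\t0.999",
])


def low_f_rules(table_text: str, threshold: float) -> list[tuple[int, float]]:
    """Return (rule ID, F-measure) pairs below the threshold, lowest first."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    flagged = [
        (int(row["Rule ID"]), float(row["F-measure"]))
        for row in reader
        if float(row["F-measure"]) < threshold
    ]
    return sorted(flagged, key=lambda pair: pair[1])


# The 0.5 cutoff is a hypothetical value chosen for illustration.
print(low_f_rules(SAMPLE, threshold=0.5))
# prints: [(228, 0.0), (130, 0.348)]
```

A fixed cutoff is the simplest possible screen. A model built on the distribution of F-measure values, as the abstract describes, would derive its boundary from the data rather than from a hand-picked constant.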

About this article

Cite this article

Ucar, E., Ozhan, E. The Analysis of Firewall Policy Through Machine Learning and Data Mining. Wireless Pers Commun 96, 2891–2909 (2017). https://doi.org/10.1007/s11277-017-4330-0

Issue Date: September 2017
DOI: https://doi.org/10.1007/s11277-017-4330-0
