Abstract
Web logs play a crucial role in detecting web attacks. However, analyzing web logs is challenging because of their sheer volume. The objective of this research is to create a web log cleaning algorithm for web intrusion detection. A review of previous work showed that five major web log attributes are needed in a web log cleaning algorithm for intrusion detection, namely multimedia files, web robot requests, HTTP status code, HTTP method and other files. The enhanced algorithm is based on these five attributes along with a set of rules and conditions. Our experiment shows that the proposed algorithm cleans noisy data effectively, achieving a reduction of 40.41%, while maintaining readiness for web intrusion detection at a low false negative rate (0.00531). Future work may address the web intrusion detection mechanism.
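The abstract names the five attributes but does not state the rules that combine them, so the following Python sketch is only an illustration of how such a cleaning filter might look, not the authors' algorithm. It assumes Apache-style Common or Combined Log Format input. The extension lists, robot markers, and the rule "drop an entry only when it is a successful, query-free GET or HEAD for a static file or from a robot" are all hypothetical choices made for this example.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Hypothetical lists: the abstract does not give the paper's actual values.
MULTIMEDIA_EXT = {".jpg", ".jpeg", ".png", ".gif", ".ico", ".mp3", ".mp4", ".avi"}
OTHER_EXT = {".css", ".js", ".txt"}
ROBOT_MARKERS = ("bot", "crawler", "spider")
SAFE_METHODS = {"GET", "HEAD"}

# Common Log Format, with the optional referer and user-agent fields
# of the Combined Log Format.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def is_noise(line):
    """Return True if the entry can be dropped under the assumed rules."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return False  # keep anything we cannot parse; it may be malformed on purpose
    try:
        parts = urlsplit(match.group("url"))
    except ValueError:
        return False  # keep URLs that fail to parse
    method = match.group("method")
    status = int(match.group("status"))
    agent = (match.group("agent") or "").lower()
    path = parts.path.lower()

    # Assumed safeguard: unusual methods, error statuses, and query strings
    # are kept because they may carry attack evidence.
    if method not in SAFE_METHODS or status >= 400 or parts.query:
        return False
    # Web robot requests.
    if path == "/robots.txt" or any(marker in agent for marker in ROBOT_MARKERS):
        return True
    # Multimedia files and other static files.
    extension = posixpath.splitext(path)[1]
    return extension in MULTIMEDIA_EXT or extension in OTHER_EXT


def clean(lines):
    """Keep only the entries that are not classified as noise."""
    return [line for line in lines if not is_noise(line)]


def reduction_percentage(before, after):
    """Percentage of entries removed by cleaning."""
    return 100.0 * (before - after) / before if before else 0.0
```

Calling `clean(lines)` on a list of raw log lines returns the retained entries, and `reduction_percentage(len(lines), len(clean(lines)))` gives a figure comparable in kind to the reduction reported above. Measuring a false negative rate would additionally require labeled attack entries, which this sketch does not cover.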
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ong, Y.C., Ismail, Z. (2014). Enhanced Web Log Cleaning Algorithm for Web Intrusion Detection. In: Boonkrong, S., Unger, H., Meesad, P. (eds) Recent Advances in Information and Communication Technology. Advances in Intelligent Systems and Computing, vol 265. Springer, Cham. https://doi.org/10.1007/978-3-319-06538-0_31
DOI: https://doi.org/10.1007/978-3-319-06538-0_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06537-3
Online ISBN: 978-3-319-06538-0
eBook Packages: Engineering, Engineering (R0)