Abstract
Data analytics is increasingly applied to cybersecurity problems and has proved beneficial where the volume and heterogeneity of data make manual assessment by security specialists difficult. In real-world cybersecurity settings, obtaining annotated data is a well-known bottleneck for many supervised security analytics tasks. Because annotation is largely manual and demands considerable expert effort, large portions of big datasets are frequently left unlabeled. To address this constraint in an applied cybersecurity problem, phishing classification, we adopt a randomly ranked feature active learning strategy to build a semi-supervised solution. An initial classifier is trained on a small sample of labeled data and then iteratively updated by selecting, from a large pool of unlabeled data, only the relevant samples that are most likely to affect classifier performance quickly. Randomly ranked feature active learning shows considerable promise for achieving faster convergence in classification performance in a batch learning setting, requiring even less human annotation effort. Without requiring a large number of labeled training examples to be available during training, a useful feature rank update strategy paired with active learning yields good classification results for labeling phishing/malicious URLs.
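The iterative loop described in the abstract (train on a small labeled seed, query the most informative unlabeled URLs, retrain) can be illustrated with a minimal pool-based active learning sketch. This is not the chapter's method: the selection rule below is plain uncertainty sampling, used here as a stand-in for the randomly ranked feature strategy, whose details the abstract does not give. The lexical features, the tiny logistic regression, the toy URLs, and all function names are hypothetical and chosen only for illustration.

```python
import math

def url_features(url):
    """Hypothetical lexical features (not the chapter's feature set)."""
    return [
        1.0,                                   # bias term
        len(url) / 100.0,                      # scaled URL length
        url.count(".") / 10.0,                 # scaled number of dots
        url.count("-") / 10.0,                 # scaled number of hyphens
        sum(c.isdigit() for c in url) / 20.0,  # scaled number of digits
        1.0 if "@" in url else 0.0,            # '@' present in the URL
    ]

def predict_proba(w, x):
    """Logistic-regression probability that x is phishing."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    z = max(-30.0, min(30.0, z))               # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, epochs=200, lr=0.5):
    """Fit logistic-regression weights with stochastic gradient descent."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, t in zip(X, y):
            err = predict_proba(w, x) - t
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def fit(labeled):
    """Train on a list of (url, label) pairs."""
    return train([url_features(u) for u, _ in labeled],
                 [label for _, label in labeled])

def active_learning(seed, pool, oracle, rounds=2, batch_size=2):
    """Pool-based loop: query the most uncertain URLs, label, retrain."""
    labeled, pool = list(seed), list(pool)
    w = fit(labeled)
    for _ in range(rounds):
        if not pool:
            break
        # Most uncertain first: predicted probability closest to 0.5.
        pool.sort(key=lambda u: abs(predict_proba(w, url_features(u)) - 0.5))
        batch, pool = pool[:batch_size], pool[batch_size:]
        # The oracle stands in for the human annotator.
        labeled += [(u, oracle(u)) for u in batch]
        w = fit(labeled)
    return w, labeled, pool

# Invented toy data on reserved domains; 1 = phishing, 0 = benign.
TOY_LABELS = {
    "http://example.com/index.html": 0,
    "http://example.org/docs/help": 0,
    "http://example.net/about": 0,
    "http://example.edu/news/today": 0,
    "http://192.0.2.10/secure-login-update-account-1234": 1,
    "http://login.example.com.verify-user-9981.test/@confirm": 1,
    "http://paypa1-secure-update.example.test/account-77310": 1,
    "http://free-gift-card-2023.example.test/claim-55-now": 1,
}

if __name__ == "__main__":
    seed = [("http://example.com/index.html", 0),
            ("http://192.0.2.10/secure-login-update-account-1234", 1)]
    seen = {u for u, _ in seed}
    pool = [u for u in TOY_LABELS if u not in seen]
    w, labeled, remaining = active_learning(seed, pool, TOY_LABELS.get)
    print(len(labeled), "labeled;", len(remaining), "still unlabeled")
```

In this sketch only the queried URLs are ever shown to the annotator, so the labeling cost grows with `rounds * batch_size` rather than with the size of the pool. A different query rule, such as one driven by feature ranks, could be substituted by changing the sort key.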
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ponni, P., Prabha, D. (2023). Randomized Active Learning to Identify Phishing URL. In: Shaw, R.N., Paprzycki, M., Ghosh, A. (eds) Advanced Communication and Intelligent Systems. ICACIS 2022. Communications in Computer and Information Science, vol 1749. Springer, Cham. https://doi.org/10.1007/978-3-031-25088-0_47
DOI: https://doi.org/10.1007/978-3-031-25088-0_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25087-3
Online ISBN: 978-3-031-25088-0
eBook Packages: Computer Science, Computer Science (R0)