Abstract
Existing anti-phishing approaches rely on blacklists or on feature-based machine learning. Blacklist methods fail to detect new phishing attacks and produce a high false positive rate. Moreover, existing machine learning based methods extract features from third-party services, search engines, and similar external sources, which makes them complicated, slow, and unsuitable for real-time use. To address this problem, this paper presents a novel machine learning based anti-phishing approach that extracts features on the client side only. We examined the attributes of phishing and legitimate websites in depth and identified nineteen features that distinguish phishing websites from legitimate ones. These nineteen features are extracted from the URL and source code of the website and do not depend on any third party, which makes the proposed approach fast, reliable, and intelligent. Compared with other methods, the proposed approach detects phishing websites with relatively high accuracy: it achieved a 99.39% true positive rate and 99.09% overall detection accuracy.
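To make the idea of client-side feature extraction concrete, the following is a minimal Python sketch that derives a handful of features from a URL and a page's HTML source without contacting any third-party service. The abstract does not enumerate the nineteen features, so the feature names used here (`url_length`, `dots_in_host`, `host_is_ip`, `external_link_ratio`, and so on) are illustrative assumptions, not the authors' feature set. The helper `detection_metrics` shows how a true positive rate and an overall accuracy are conventionally computed from confusion-matrix counts.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import ipaddress


class _LinkCollector(HTMLParser):
    """Collects <a href> targets and notes whether a password input exists."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.has_password_field = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href") is not None:
            self.hrefs.append(attrs["href"])
        elif tag == "input" and (attrs.get("type") or "").lower() == "password":
            self.has_password_field = True


def _host_is_ip(host):
    """Return True if the host is a literal IP address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(url, html):
    """Illustrative (hypothetical) features from the URL and source code only."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    collector = _LinkCollector()
    collector.feed(html)

    external = 0  # links whose host differs from the page's host
    empty = 0     # placeholder links such as "#" or "javascript:..."
    for href in collector.hrefs:
        target = href.strip()
        if target in ("", "#") or target.lower().startswith("javascript:"):
            empty += 1
            continue
        try:
            link_host = urlparse(target).hostname
        except ValueError:
            continue  # malformed link; skipped in this sketch
        if link_host and link_host != host:
            external += 1

    total = len(collector.hrefs)
    return {
        "url_length": len(url),
        "dots_in_host": host.count("."),
        "host_is_ip": _host_is_ip(host),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "has_password_field": collector.has_password_field,
        "external_link_ratio": external / total if total else 0.0,
        "empty_link_ratio": empty / total if total else 0.0,
    }


def detection_metrics(tp, fn, tn, fp):
    """True positive rate and overall accuracy from confusion-matrix counts."""
    tpr = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return tpr, accuracy
```

A feature dictionary of this kind would then be passed to a trained classifier. Which classifier to use, and how the features are weighted, is outside what this sketch assumes.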
Acknowledgements
This research work is supported by the Sir Visvesvaraya Young Faculty Research Fellowship Grant from the Ministry of Electronics & Information Technology (MeitY), Government of India.
Cite this article
Jain, A.K., Gupta, B.B. Towards detection of phishing websites on client-side using machine learning based approach. Telecommun Syst 68, 687–700 (2018). https://doi.org/10.1007/s11235-017-0414-0
DOI: https://doi.org/10.1007/s11235-017-0414-0