Towards detection of phishing websites on client-side using machine learning based approach

Telecommunication Systems

Abstract

Existing anti-phishing approaches use either blacklists or feature-based machine learning techniques. Blacklist methods fail to detect new phishing attacks and produce a high false positive rate. Moreover, existing machine learning based methods extract features from third-party services, search engines, and similar external sources. As a result, they are complicated, slow, and unsuitable for real-time environments. To solve this problem, this paper presents a novel machine learning based anti-phishing approach that extracts features on the client side only. We examined the attributes of phishing and legitimate websites in depth and identified nineteen outstanding features that distinguish phishing websites from legitimate ones. These nineteen features are extracted from the URL and source code of the website and do not depend on any third party, which makes the proposed approach fast, reliable, and intelligent. Compared to other methods, the proposed approach detects phishing websites with relatively high accuracy: it achieved a true positive rate of 99.39% and an overall detection accuracy of 99.09%.
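To illustrate what extracting features "from the URL and source code" without any third-party lookup can look like, here is a minimal Python sketch. The abstract does not list the nineteen features, so the eight features below (and the names `extract_features` and `_LinkCollector`) are hypothetical examples chosen for illustration, not the authors' feature set. The sketch uses only the Python standard library and needs nothing beyond the URL string and the page's HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import re


class _LinkCollector(HTMLParser):
    """Collects anchor href values and counts password inputs (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.password_inputs = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href") is not None:
            self.hrefs.append(attrs["href"])
        elif tag == "input" and (attrs.get("type") or "").lower() == "password":
            self.password_inputs += 1


def extract_features(url, html):
    """Return a dict of hypothetical client-side features for one page.

    Everything is computed from the URL and the HTML source alone;
    no search engine, WHOIS, or blacklist service is queried.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""

    collector = _LinkCollector()
    collector.feed(html)
    hrefs = collector.hrefs
    total = len(hrefs)

    # Links whose host differs from the page's own host.
    external = 0
    for href in hrefs:
        link_host = urlparse(href).hostname
        if link_host and link_host != host:
            external += 1

    # Links that lead nowhere ("" or "#").
    empty = sum(1 for h in hrefs if h.strip() in ("", "#"))

    return {
        "url_length": len(url),
        "dots_in_host": host.count("."),
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "has_password_field": collector.password_inputs > 0,
        "external_link_ratio": external / total if total else 0.0,
        "empty_link_ratio": empty / total if total else 0.0,
    }
```

A feature dictionary of this kind could then be converted to a numeric vector and passed to any standard classifier; which classifier the paper uses is not stated in the abstract.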

Acknowledgements

This research work is supported by the Sir Visvesvaraya Young Faculty Research Fellowship Grant from the Ministry of Electronics & Information Technology (MeitY), Government of India.

Author information

Corresponding author

Correspondence to B. B. Gupta.

About this article

Cite this article

Jain, A.K., Gupta, B.B. Towards detection of phishing websites on client-side using machine learning based approach. Telecommun Syst 68, 687–700 (2018). https://doi.org/10.1007/s11235-017-0414-0

  • DOI: https://doi.org/10.1007/s11235-017-0414-0
