Towards detection of phishing websites on client-side using machine learning based approach

Jain, Ankit Kumar; Gupta, B. B.

doi:10.1007/s11235-017-0414-0

Published: 26 December 2017

Telecommunication Systems, Volume 68, pages 687–700 (2018)

Ankit Kumar Jain¹ &
B. B. Gupta¹

Abstract

Existing anti-phishing approaches rely on either blacklists or feature-based machine learning techniques. Blacklist methods fail to detect new phishing attacks and produce a high false positive rate. Moreover, existing machine learning based methods extract features from third-party services, search engines, and similar external sources, which makes them complicated, slow, and unsuitable for real-time environments. To solve this problem, this paper presents a novel machine learning based anti-phishing approach that extracts features on the client side only. We examined the attributes of phishing and legitimate websites in depth and identified nineteen outstanding features that distinguish phishing websites from legitimate ones. These nineteen features are extracted from the URL and source code of the website and do not depend on any third party, which makes the proposed approach fast, reliable, and intelligent. Compared to other methods, the proposed approach achieves relatively high accuracy in detecting phishing websites, with a 99.39% true positive rate and 99.09% overall detection accuracy.
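As a rough illustration of what extracting features "from the URL and source code" without any third-party lookup can look like, the following minimal Python sketch computes a handful of features using only the standard library. The abstract does not enumerate the nineteen features, so every feature name here (`url_length`, `external_link_ratio`, and so on) and the helper `extract_features` are hypothetical examples chosen for illustration, not the authors' feature set.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _LinkCollector(HTMLParser):
    """Collects href targets of <a> tags and action targets of <form> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.form_actions = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href") is not None:
            self.hrefs.append(attrs["href"])
        elif tag == "form":
            self.form_actions.append(attrs.get("action") or "")


def extract_features(url, html):
    """Return illustrative client-side features (assumed, not the paper's nineteen)."""
    host = urlparse(url).hostname or ""
    collector = _LinkCollector()
    collector.feed(html)

    # A target is external when it names a host different from the page's own.
    # Relative links and fragments have no host and count as internal.
    def is_external(target):
        target_host = urlparse(target).hostname
        return target_host is not None and target_host != host

    hrefs = collector.hrefs
    external = sum(1 for h in hrefs if is_external(h))
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "has_at_symbol": "@" in url,
        "uses_https": url.lower().startswith("https://"),
        "host_is_ipv4": host.count(".") == 3 and host.replace(".", "").isdigit(),
        "external_link_ratio": external / len(hrefs) if hrefs else 0.0,
        "has_external_form_action": any(
            is_external(a) for a in collector.form_actions
        ),
    }
```

A feature dictionary of this kind would then be converted to a numeric vector and passed to a classifier. Because everything is computed from the URL string and the page's own HTML, no network request to a search engine or other external service is needed at detection time.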

Acknowledgements

This research work is supported by the Sir Visvesvaraya Young Faculty Research Fellowship Grant from the Ministry of Electronics & Information Technology (MeitY), Government of India.

Author information

Authors and Affiliations

National Institute of Technology Kurukshetra, Kurukshetra, India
Ankit Kumar Jain & B. B. Gupta

Corresponding author

Correspondence to B. B. Gupta.

About this article

Cite this article

Jain, A.K., Gupta, B.B. Towards detection of phishing websites on client-side using machine learning based approach. Telecommun Syst 68, 687–700 (2018). https://doi.org/10.1007/s11235-017-0414-0

Published: 26 December 2017
Issue Date: August 2018
DOI: https://doi.org/10.1007/s11235-017-0414-0
