NLP Based Phishing Attack Detection from URLs

Buber, Ebubekir; Diri, Banu; Sahingoz, Ozgur Koray

doi:10.1007/978-3-319-76348-4_59


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 736)

Included in the following conference series:

International Conference on Intelligent Systems Design and Applications


Abstract

In recent years, phishing has become an increasing threat in cyberspace, especially with the growing use of messaging and social networks. In a traditional phishing attack, users are lured to a bogus website that is carefully designed to look exactly like a well-known banking, e-commerce, or social networking site, so that the attacker can obtain personal information such as credit card numbers, usernames, and passwords, or even money. Many phishers launch their attacks through emails that direct recipients to the target website. Inexperienced users (and even experienced ones) may visit these fake websites and share their sensitive information. According to a phishing attack analysis of 45 countries in the last quarter of 2016, China, Turkey, and Taiwan were the countries most plagued by malware, with rates of 47.09%, 42.88%, and 38.98%, respectively. Detecting phishing attacks is a challenging problem because they are considered semantics-based attacks, which mainly exploit computer users' vulnerabilities. In this paper, a phishing detection system is proposed that detects this type of attack by using machine learning algorithms and by detecting visual similarities with the help of natural language processing techniques. Many tests were applied to the proposed system, and the experimental results showed that the Random Forest algorithm performs very well, with a success rate of 97.2%.
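The abstract describes extracting NLP-based features from URLs and feeding them to machine learning classifiers. The sketch below illustrates the general idea of turning a URL into a feature vector by splitting it into "raw words" and computing simple lexical statistics. It is not the authors' feature set: the function names `split_words` and `url_features`, the chosen features, and the `SUSPICIOUS_KEYWORDS` list are all hypothetical, chosen only for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list for illustration only; not taken from the paper.
SUSPICIOUS_KEYWORDS = ("login", "secure", "account", "verify", "update", "bank")


def split_words(text: str) -> list[str]:
    """Split a URL component into lowercase 'raw words' on any non-alphanumeric run."""
    return [w for w in re.split(r"[^a-z0-9]+", text.lower()) if w]


def url_features(url: str) -> dict[str, float]:
    """Extract a small, hypothetical set of lexical (NLP-style) features from a URL."""
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    rest = parsed.path + ("?" + parsed.query if parsed.query else "")

    # Raw words from the host and from the path/query parts of the URL.
    words = split_words(host) + split_words(rest)
    lengths = [len(w) for w in words] or [0]  # avoid max()/min() on an empty list

    return {
        "url_length": float(len(url)),
        "host_length": float(len(host)),
        "dot_count": float(url.count(".")),
        "hyphen_count": float(url.count("-")),
        "digit_count": float(sum(c.isdigit() for c in url)),
        "has_at_sign": float("@" in url),
        "word_count": float(len(words)),
        "avg_word_length": sum(lengths) / len(lengths),
        "longest_word": float(max(lengths)),
        "shortest_word": float(min(lengths)),
        "keyword_hits": float(sum(w in SUSPICIOUS_KEYWORDS for w in words)),
    }


if __name__ == "__main__":
    for key, value in url_features("http://secure-login.example.com/account/verify?id=123").items():
        print(f"{key}: {value}")
```

For the sample URL above, the host and path split into eight raw words (`secure`, `login`, `example`, `com`, `account`, `verify`, `id`, `123`), four of which appear in the illustrative keyword list. Vectors like these could then be passed to any classifier, for example a Random Forest such as scikit-learn's `RandomForestClassifier`. That pairing is an assumption for illustration and does not reproduce the paper's exact pipeline or its reported results.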



Acknowledgement

Thanks to Normshield Inc., BGA Security, SinaraLabs and Roksit for contributing to the development of this work.

Author information

Authors and Affiliations

Computer Engineering Department, Yildiz Technical University, Istanbul, Turkey
Ebubekir Buber & Banu Diri
Computer Engineering Department, Istanbul Kultur University, 34158, Istanbul, Turkey
Ozgur Koray Sahingoz


Corresponding author

Correspondence to Ozgur Koray Sahingoz.

Editor information

Editors and Affiliations

Machine Intelligence Research Labs, Auburn, Washington, USA
Ajith Abraham
Department of Computer Science, South Asian University, Chanakyapuri, Delhi, India
Pranab Kr. Muhuri
Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Durian Tunggal, Melaka, Malaysia
Azah Kamilah Muda
Machine Intelligence Research Labs, Auburn, Washington, USA
Niketa Gandhi

About this paper

Cite this paper

Buber, E., Diri, B., Sahingoz, O.K. (2018). NLP Based Phishing Attack Detection from URLs. In: Abraham, A., Muhuri, P., Muda, A., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2017. Advances in Intelligent Systems and Computing, vol 736. Springer, Cham. https://doi.org/10.1007/978-3-319-76348-4_59

DOI: https://doi.org/10.1007/978-3-319-76348-4_59
Published: 22 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76347-7
Online ISBN: 978-3-319-76348-4
eBook Packages: Engineering, Engineering (R0)