Phishing Detection with Popular Search Engines: Simple and Effective

Huh, Jun Ho; Kim, Hyoungshick

doi:10.1007/978-3-642-27901-0_15

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 6888)

Included in the following conference series:

International Symposium on Foundations and Practice of Security

Abstract

We propose a new phishing detection heuristic based on the search results returned by popular web search engines such as Google, Bing and Yahoo. The full URL of the website a user intends to access is used as the search string, and the number of results returned and the ranking of the website are used for classification. Most of the time, legitimate websites return a large number of results and are ranked first, whereas phishing websites return no results and/or are not ranked at all.
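The two features described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `extract_features`, the idea of matching results by host name, and the convention that rank 0 means "not ranked" are all assumptions, and the search engine call itself is left out (the caller is assumed to supply the reported result count and the ordered result URLs).

```python
from urllib.parse import urlparse


def extract_features(query_url, total_results, result_urls):
    """Return (total_results, rank) for a queried URL.

    query_url     -- the full URL that was used as the search string
    total_results -- result count reported by the search engine (assumed input)
    result_urls   -- ordered list of result URLs (assumed input)

    rank is the 1-based position of the first result on the same host as
    query_url, or 0 when the site does not appear in the results at all.
    Matching by host name is an assumption made for this sketch.
    """
    host = urlparse(query_url).netloc.lower()
    rank = 0
    for position, result in enumerate(result_urls, start=1):
        if urlparse(result).netloc.lower() == host:
            rank = position
            break
    return total_results, rank


if __name__ == "__main__":
    # Hypothetical inputs: a well-indexed site and an unindexed one.
    print(extract_features(
        "https://www.example.com/login",
        1200,
        ["https://www.example.com/", "https://other.example.org/"],
    ))
    print(extract_features("http://unindexed.example.net/secure", 0, []))
```

A well-indexed site yields a large count and a rank of 1, while an unindexed page yields a zero count and a rank of 0, which is the contrast the heuristic relies on.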

To demonstrate the effectiveness of our approach, we experimented with four well-known classification algorithms – Linear Discriminant Analysis, Naïve Bayesian, K-Nearest Neighbour, and Support Vector Machine – and observed their performance. The K-Nearest Neighbour algorithm performed best, achieving a true positive rate of 98% and false positive and false negative rates of 2%. We used new legitimate websites and phishing websites as our dataset to show that our approach works well even on newly launched websites/webpages; such websites are often misclassified by existing blacklisting and whitelisting approaches.
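As a rough illustration of the classification step, the sketch below applies a plain K-Nearest Neighbour vote to (number of results, rank) pairs and computes the three rates quoted above. Everything specific here is an assumption: the toy training points are invented for illustration and are not the paper's dataset, phishing is treated as the positive class, the result count is log-scaled before measuring distance, and k = 3 is arbitrary.

```python
import math
from collections import Counter


def to_point(num_results, rank):
    # Log-scale the result count so very large counts do not dominate the distance.
    return (math.log10(num_results + 1), float(rank))


def knn_predict(train, query, k=3):
    """Majority vote among the k nearest training points.

    train -- list of ((num_results, rank), label); label 1 = phishing, 0 = legitimate
    query -- (num_results, rank) for the website being checked
    """
    q = to_point(*query)
    nearest = sorted(train, key=lambda item: math.dist(to_point(*item[0]), q))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def rates(y_true, y_pred):
    """True positive, false positive and false negative rates (positive = phishing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {"tpr": tp / (tp + fn), "fpr": fp / (fp + tn), "fnr": fn / (tp + fn)}


# Invented toy data: legitimate sites have many results and a top rank,
# phishing sites have few or no results and no rank (rank 0).
TRAIN = [
    ((1_200_000, 1), 0),
    ((85_000, 1), 0),
    ((4_300, 2), 0),
    ((0, 0), 1),
    ((3, 0), 1),
    ((0, 0), 1),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, (50_000, 1)))  # many results, ranked first
    print(knn_predict(TRAIN, (1, 0)))       # almost no results, not ranked
    print(rates([1, 1, 0, 0], [1, 0, 0, 1]))
```

With phishing as the positive class, the true positive rate and the false negative rate sum to one, which matches the 98% and 2% figures reported for K-Nearest Neighbour.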

Author information

Authors and Affiliations

Information Trust Institute, University of Illinois at Urbana-Champaign, USA
Jun Ho Huh
Computer Laboratory, University of Cambridge, UK
Hyoungshick Kim

Editor information

Editors and Affiliations

TELECOM-Bretagne, Campus de Rennes, 2, rue de la Châtaigneraie, 35512, Cesson Sévigné Cedex, France
Joaquin Garcia-Alfaro
Université Joseph Fourier, Laboratoire Verimag, Centre Equation, 2 avenue de Vignate, 38610, Gières, France
Pascal Lafourcade

About this paper

Cite this paper

Huh, J.H., Kim, H. (2012). Phishing Detection with Popular Search Engines: Simple and Effective. In: Garcia-Alfaro, J., Lafourcade, P. (eds) Foundations and Practice of Security. FPS 2011. Lecture Notes in Computer Science, vol 6888. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27901-0_15

DOI: https://doi.org/10.1007/978-3-642-27901-0_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27900-3
Online ISBN: 978-3-642-27901-0
eBook Packages: Computer Science, Computer Science (R0)