Two-stage ELM for phishing Web pages detection using hybrid features

Zhang, Wei; Jiang, Qingshan; Chen, Lifei; Li, Chengming

doi:10.1007/s11280-016-0418-9

Published: 29 September 2016

World Wide Web, Volume 20, pages 797–813 (2017)

Wei Zhang¹,², Qingshan Jiang¹,², Lifei Chen³ & Chengming Li¹

Abstract

A growing volume of phishing attacks is encountered every day, driven by the high financial returns available to attackers. Recently, there has been significant interest in applying machine learning to phishing Web page detection. Unlike previous work, this paper uses the predicted labels of textual contents as part of the feature set and proposes a novel framework for phishing Web page detection based on hybrid features: URL-based, Web-based, rule-based and textual content-based features. We realize this framework with an efficient two-stage extreme learning machine (ELM). The first stage constructs classification models on the textual contents of Web pages using ELM. In this stage, Optical Character Recognition (OCR) serves as an auxiliary tool for extracting textual contents from Web pages rendered as images. In the second stage, a classification model on the hybrid features is built from a linear-combination ensemble of ELMs (LC-ELMs), with the combination weights calculated by the generalized inverse. Experimental results indicate that the proposed framework is promising for detecting phishing Web pages.
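The two-stage idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the full text is not part of this preview, so the sigmoid activation, the ±1 label encoding, the hidden-layer sizes, the synthetic data, and every name (`ELM`, `fit_linear_combination`, `X_text`, `X_other`) are assumptions made for illustration only. The parts taken from the abstract are the overall shape: a stage-one ELM whose predicted label becomes an extra feature, and a stage-two ensemble whose combination weights come from the generalized inverse.

```python
import numpy as np


class ELM:
    """Basic ELM: fixed random hidden layer, least-squares output weights."""

    def __init__(self, n_hidden, rng):
        self.n_hidden = n_hidden
        self.rng = rng

    def _hidden(self, X):
        # Sigmoid activations of the random hidden layer (activation is an assumption).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Output weights solved with the Moore-Penrose generalized inverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


def fit_linear_combination(models, X, y):
    """Combination weights for an ensemble, solved with the generalized inverse."""
    P = np.column_stack([m.predict(X) for m in models])
    return np.linalg.pinv(P) @ y


# Synthetic stand-in data (hypothetical): text features and other features.
rng = np.random.default_rng(0)
X_text = rng.normal(size=(200, 5))
X_other = rng.normal(size=(200, 3))
y = np.where(X_text[:, 0] > 0, 1.0, -1.0)  # +1 = phishing, -1 = legitimate

# Stage 1: ELM on textual features; its predicted label becomes a feature.
text_model = ELM(n_hidden=20, rng=rng).fit(X_text, y)
text_label = np.sign(text_model.predict(X_text))
X_hybrid = np.column_stack([X_other, text_label])

# Stage 2: ensemble of ELMs on hybrid features, combined linearly.
ensemble = [ELM(n_hidden=10, rng=rng).fit(X_hybrid, y) for _ in range(3)]
weights = fit_linear_combination(ensemble, X_hybrid, y)
P = np.column_stack([m.predict(X_hybrid) for m in ensemble])
pred = np.sign(P @ weights)

print("training accuracy:", (pred == y).mean())
```

In this sketch the same data is used for fitting and evaluation, so the printed figure only shows that the pipeline runs end to end; a real evaluation would hold out test pages and would derive `X_text` from page text (including OCR output) rather than random numbers.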

Acknowledgements

This research was supported by the Special Fund on Guangdong Province Chinese Academy of Sciences Comprehensive Strategic Cooperation (No. 2013B091300019), the Shenzhen Fundamental Research Foundation (Nos. CXZZ20150813155917544 and JCYJ20150630114942277), and the Guangdong National Natural Science Foundation of China (No. U1401258).

Author information

Authors and Affiliations

Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Wei Zhang, Qingshan Jiang & Chengming Li
Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, China
Wei Zhang & Qingshan Jiang
Fujian Normal University, Fuzhou, China
Lifei Chen

Corresponding author

Correspondence to Qingshan Jiang.

About this article

Cite this article

Zhang, W., Jiang, Q., Chen, L. et al. Two-stage ELM for phishing Web pages detection using hybrid features. World Wide Web 20, 797–813 (2017). https://doi.org/10.1007/s11280-016-0418-9

Received: 13 December 2015
Revised: 07 July 2016
Accepted: 31 July 2016
Published: 29 September 2016
Issue Date: July 2017
DOI: https://doi.org/10.1007/s11280-016-0418-9
